继续啃算法，今天来看并查集。并查集是一种数据结构，可以跟踪一组元素，它们分布在几个不相交（非重叠）子集合中。它也被称为不相交集数据结构。这是什么意思呢？例如，并查集数据结构可以计算当前输入分成几个不相交的组织，两个子节点是否位于同一组织内。

画成图就是这样的：右图的 union 表示两个集合合并后，更改一个节点的父节点(且左子树的父节点都要更改)；左图中的 find 表示绿色节点在找根节点是谁，并修改这条路上所有节点的根节点。

社区聚类(基础应用)

When register on a social network, you are always asked to specify your hobbies in order to find some potential friends with the same hobbies. A social cluster is a set of people who have some of their hobbies in common. You are supposed to find all the clusters.

Each input file contains one test case. For each test case, the first line contains a positive integer $N (≤1000)$, the total number of people in a social network. Hence the people are numbered from $1$ to $N$. Then $N$ lines follow, each gives the hobby list of a person in the format:$K_i:h_i[1],h_i[2]…h_i[K_i]$, where $K_i (>0)$ is the number of hobbies, and $h_i[j]$ is the index of the $j$-th hobby, which is an integer in $[1, 1000]$.

For each case, print in one line the total number of clusters in the network. Then in the second line, print the numbers of people in the clusters in non-increasing order. The numbers must be separated by exactly one space, and there must be no extra space at the end of the line.

Sample Output:

1
2

3
4 3 1

题目意思很简单，每个人都有一些爱好，如果他们的爱好有相同的，就将这些人合并在一起。最后统计有几个社区，每个社区的人数是多少。直接来看代码就好，我会在一些容易迷惑的地方加上注释。

#include <cstdio>
#include <vector>
#include <algorithm>
using namespace std;
vector<int> father, isRoot;
int cmp1(int a, int b){return a > b;}

// 查找父节点，并更改相关节点的父节点
int findFather(int x) {
    int a = x;
    while(x != father[x])
        x = father[x];
    // 找到根节点后，把该集合内的节点的根节点都改为 x  
    while(a != father[a]) {
        int z = a;
        a = father[a];
        father[z] = x;
    }
    return x;
}

// 更改父节点
void Union(int a, int b) {
    int faA = findFather(a);
    int faB = findFather(b);
    if(faA != faB) father[faA] = faB;
}

int main() {
    int n, k, t, cnt = 0;
    int course[1001] = {0};
    scanf("%d", &n);
    father.resize(n + 1);
    isRoot.resize(n + 1);
    // 初始化，每个人的父节点都是自己
    for(int i = 1; i <= n; i++)
        father[i] = i;
    for(int i = 1; i <= n; i++) {
        scanf("%d:", &k);
        for(int j = 0; j < k; j++) {
            scanf("%d", &t);
            // 如果当前爱好没有被分配，那么当前爱好就属于当前的社区
            // course[4] = 2，那么之后所有的 course[4] 都是 2，是合并的前提
            if(course[t] == 0)
                course[t] = i;
            // 在输入样例中，第二个人和第四个人的爱好相同
            // 第二个人时 course[4] = 2，第四个人时 course[4] = 2
            // 那么合并 Union(4, 2)，得到的结果是 father[4] = 2
            // 如此，第四个人和第二个人就属于一类了
            Union(i, findFather(course[t]));
        }
    }
    // 因为要求出每个社区的人数，所以寻找每个节点的父亲，
    // 并使父节点编号对应的数值++
    for(int i = 1; i <= n; i++)
        isRoot[findFather(i)]++;
    // 那么，不为 0 的数量就是社团的数量
    for(int i = 1; i <= n; i++) {
        if(isRoot[i] != 0) cnt++;
    }
    printf("%d\n", cnt);
    // 且，若数值表示该社区内有多少人
    sort(isRoot.begin(), isRoot.end(), cmp1);
    for(int i = 0; i < cnt; i++) {
        printf("%d", isRoot[i]);
        if(i != cnt - 1) printf(" ");
    }
    return 0;
}

复杂一些的题目

This time, you are supposed to help us collect the data for family-owned property. Given each person’s family members, and the estate（房产）info under his/her own name, we need to know the size of each family, and the average area and number of sets of their real estate.

Each input file contains one test case. For each case, the first line gives a positive integer $N(≤1000)$. Then $N$ lines follow, each gives the infomation of a person who owns estate in the format:

ID Father Mother $k$ $Child_1$ $Child_2$ … $Child_k$ $M_\text{estate}$ Area

where ID is a unique 4-digit identification number for each person; Father and Mother are the ID’s of this person’s parents (if a parent has passed away, -1 will be given instead); $k(0≤k≤5)$ is the number of children of this person; $Child_i$’s are the ID‘s of his/her children; $M_\text{estate}$ is the total number of sets of the real estate under his/her name; and Area is the total area of his/her estate.

For each case, first print in a line the number of families (all the people that are related directly or indirectly are considered in the same family). Then output the family info in the format:

ID M $\text{AVG}_\text{sets}$ $\text{AVG}_\text{area}$

where ID is the smallest ID in the family; M is the total number of family members; $\text{AVG}_\text{sets}$ is the average number of sets of their real estate; and $\text{AVG}_\text{area}$ is the average area. The average numbers must be accurate up to 3 decimal places. The families must be given in descending order of their average areas, and in ascending order of the ID’s if there is a tie.

10
6666 5551 5552 1 7777 1 100
1234 5678 9012 1 0002 2 300
8888 -1 -1 0 1 1000
2468 0001 0004 1 2222 1 500
7777 6666 -1 0 2 300
3721 -1 -1 1 2333 2 150
9012 -1 -1 3 1236 1235 1234 1 100
1235 5678 9012 0 1 50
2222 1236 2468 2 6661 6662 1 300
2333 -1 3721 3 6661 6662 6663 1 100

3
8888 1 1.000 1000.000
0001 15 0.600 100.000
5551 4 0.750 100.000

道理还是一样的，就是数据操作上复杂了一点，代码如下：

#include <cstdio>
#include <algorithm>

using namespace std;

struct DATA {
    int id, fid, mid, num, area;
    int cid[10];
}data[1005];

struct node {
    int id, people;
    double num, area;
    bool flag = false;
}ans[10000];

int father[10000];
bool visit[10000];

int find(int x) {
	int a = x;
    while(x != father[x])
        x = father[x];
    while(a != father[a]){
    	int z = a;
    	a = father[a];
    	father[z] = x;
	}
    return x;
}

// 合并规则是：输出最小的ID 
void Union(int a, int b) {
    int faA = find(a);
    int faB = find(b);
    if(faA > faB)
        father[faA] = faB;
    else if(faA < faB)
        father[faB] = faA;
}

int cmp1(node a, node b) {
    if(a.area != b.area)
        return a.area > b.area;
    else
        return a.id < b.id;
}

int main() {
    int n, k, cnt = 0;
    scanf("%d", &n);
    for(int i = 0; i < 10000; i++)
        father[i] = i;
    for(int i = 0; i < n; i++) {
        scanf("%d %d %d %d", &data[i].id, &data[i].fid, &data[i].mid, &k);
        visit[data[i].id] = true;
        // 合并双亲的 id
        if(data[i].fid != -1) {
            visit[data[i].fid] = true;
            Union(data[i].fid, data[i].id);
        }
        if(data[i].mid != -1) {
            visit[data[i].mid] = true;
            Union(data[i].mid, data[i].id);
        }
        // 合并孩子的 id
        for(int j = 0; j < k; j++) {
            scanf("%d", &data[i].cid[j]);
            visit[data[i].cid[j]] = true;
            Union(data[i].cid[j], data[i].id);
        }
        scanf("%d %d", &data[i].num, &data[i].area);
    }
    // 每个根节点代表的家庭
    for(int i = 0; i < n; i++) {
        int id = find(data[i].id);
        // 这个家的 id、人数、区域、flag用于计数一共有几个家庭
        ans[id].id = id;
        ans[id].num += data[i].num;
        ans[id].area += data[i].area;
        ans[id].flag = true;
    }
    for(int i = 0; i < 10000; i++) {
        // 计算每个社区的人数
        if(visit[i])
            ans[find(i)].people++;
        // 计算有几个社区
        if(ans[i].flag)
            cnt++;
    }
    for(int i = 0; i < 10000; i++) {
        if(ans[i].flag) {
            ans[i].num = (double)(ans[i].num * 1.0 / ans[i].people);
            ans[i].area = (double)(ans[i].area * 1.0 / ans[i].people);
        }
    }
    sort(ans, ans + 10000, cmp1);
    printf("%d\n", cnt);
    for(int i = 0; i < cnt; i++)
        printf("%04d %d %.3f %.3f\n", ans[i].id, ans[i].people, ans[i].num, ans[i].area);
    return 0;
}

990. 等式方程的可满足性

这个是我目前见过的，最好的并查集的应用。

给定一个由表示变量之间关系的字符串方程组成的数组，每个字符串方程 equations[i] 的长度为 4，并采用两种不同的形式之一："a==b" 或 "a!=b"。在这里，a 和 b 是小写字母（不一定不同），表示单字母变量名。

只有当可以将整数分配给变量名，以便满足所有给定的方程时才返回 true，否则返回 false。示例：

1
2
3

输入：["a==b","b!=a"]
输出：false
解释：如果我们指定，a = 1 且 b = 1，那么可以满足第一个方程，但无法满足第二个方程。没有办法分配变量同时满足这两个方程。

我们把所有的等式用并查集这种结构归为一个组织，有同一个父亲
之后，针对所有的不等式，判断不等式两边的父亲是否相同，如果相同，则一定为 false

class Solution {
public:
    vector<int> root;
    vector<int> eq;
    vector<int> neq;
    int find(int x) {
        while (x != root[x]) {
            root[x] = root[root[x]];
            x = root[x];
        }
        return x;
    }
    void _union(int c1, int c2) {
        auto f1 = find(c1);
        auto f2 = find(c2);
        if (f1 != f2) {
            root[f1] = f2;
        }
    }
    bool equationsPossible(vector<string>& equations) {
        root.resize(26);
        for (int i = 0; i < 26; i++) {
            root[i] = i;
        }
        for (int i = 0; i < equations.size(); i++) {
            auto equal = equations[i][1];
            if (equal == '=') {
                eq.push_back(i);
            }
            if (equal == '!') {
                neq.push_back(i);
            }
        }
        for (auto i : eq) {
            char a = equations[i][0];
            char b = equations[i][3];
            _union(a - 'a', b - 'a');
        }
        for (auto i : neq) {
            char a = equations[i][0];
            char b = equations[i][3];
            if (find(a - 'a') == find(b - 'a'))
                return false;
        }
        return true;
    }
};

Just for Life.

算法系列：并查集

社区聚类(基础应用)

Sample Output:

复杂一些的题目

990. 等式方程的可满足性