Abstract:
Motivated by the interests of social network analysis and network-based recommendation systems, we consider a semi-supervised community detection problem, where the goal is to estimate the community label of a new node by leveraging on the network structure and partially observed community labels of existing nodes. We model the network with a degree-corrected stochastic block model, which allows for severe degree heterogeneity and potentially non-assortative communities. We propose a fast algorithm that computes a `structural similarity metric' between the new node and each of the $K$ communities, aggregating information in labeled and unlabeled data. The estimated label of the new node is equal to the value of $k$ that maximizes this similarity metric. Our method is computationally fast and compares favorably with existing semi-supervised algorithms on numerical performance. In theory, we derive explicit bounds for the misclassification error and show the efficiency of our method by comparing it with an ideal classifier. To our best knowledge, our results provide the first semi-supervised community detection algorithm with theoretical guarantees.