Skip to yearly menu bar Skip to main content


Virtual presentation / poster accept

Statistical Guarantees for Consensus Clustering

Zhixin Zhou · Gautam Dudeja · Arash Amini

Keywords: [ semidefinite relaxation ] [ consensus clustering ] [ unsupervised label aggregation ] [ Frechet mean ] [ barycenter problem ] [ spectral clustering ] [ Theory ]


Abstract: Consider the problem of clustering $n$ objects. One can apply multiple algorithms to produce $N$ potentially different clustersings of the same objects, that is, partitions of the $n$ objects into $K$ groups. Even a single randomized algorithm can output different clusterings. This often happens when one samples from the posterior of a Bayesian model, or runs multiple MCMC chains from random initializations. A natural task is then to form a consensus among these different clusterings. The challenge in an unsupervised setting is that the optimal matching between clusters of different inputs is unknown. We model this problem as finding a barycenter (also known as Fr\'{e}chet mean) relative to the misclassification rate. We show that by lifting the problem to the space of association matrices, one can derive aggregation algorithms that circumvent the knowledge of the optimal matchings. We analyze the statistical performance of aggregation algorithms under a stochastic label perturbation model, and show that a $K$-means type algorithm followed by a local refinement step can achieve near optimal performance, with a rate that decays exponentially fast in $N$. Numerical experiments show the effectiveness of the proposed methods.

Chat is not available.