[Japanese|English]
K-Means is a well-known clustering method for multidimensional data. However, since it assigns each data point to the nearest representative point, it is difficult to extract clusters with complex shapes. To address this issue, K-Multiple-Means was proposed. This approach can effectively extract complex-shaped clusters by grouping multiple sub-clusters. However, it suffers from the problem of high computational cost for clustering. We have developed an approach that enables fast clustering with K-Multiple-Means, even for large-scale datasets.

K-Multiple-Means performs sub-cluster grouping by repeatedly computing distances between cluster centers and data points and then partitioning a bipartite graph. However, since this requires computing SVD of the matrix representing the bipartite graph, the computational cost becomes high.
We accelerates clustering by exploiting the connected components of the graph between the centers to construct a small block matrix, and then efficiently computing the SVD from the matrix.

Our approach can perform clustering up to more than 230 times faster than the original K-Multiple-Means. For example, while the original method required more than 10 days to complete clustering on a dataset, our approach can accomplish it in about one hour.

Our approach accelerates K-Multiple-Means without sacrificing the quality of clustering results. K-Multiple-Means can be applied to a variety of tasks, including blind source separation, image segmentation, and network load balancing. Our approach is expected to serve as a breakthrough that enables these applications to be executed efficiently even on large-scale datasets.
Yasuhiro Fujiwara
Recognition Research Group, Media Information Laboratory, NTT Communication Science Laboratories