Skip to main content
SHARE
Publication

Designing a parallel Feel-the-Way clustering algorithm on HPC systems...

by Weijian Zheng, Dali Wang, Fengguang Song
Publication Type
Journal
Journal Name
The International Journal of High Performance Computing Applications
Publication Date
Page Numbers
154 to 169
Volume
35
Issue
2

This paper introduces a new parallel clustering algorithm, named Feel-the-Way clustering algorithm, that provides better or equivalent convergence rate than the traditional clustering methods by optimizing the synchronization and communication costs. Our algorithm design centers on how to optimize three factors simultaneously: reduced synchronizations, improved convergence rate, and retained same or comparable optimization cost. To compare the optimization cost, we use the Sum of Square Error (SSE) cost as the metric, which is the sum of the square distance between each data point and its assigned clusters. Compared with the traditional MPI k-means algorithm, the new Feel-the-Way algorithm requires less communications among participating processes. As for the convergence rate, the new algorithm requires fewer number of iterations to converge. As for the optimization cost, it obtains the SSE costs that are close to the k-means algorithm. In the paper, we first design the full-step Feel-the-Way k-means clustering algorithm that can significantly reduce the number of iterations that are required by the original k-means clustering method. Next, we improve the performance of the full-step algorithm by adopting an optimized sampling-based approach, named reassignment-history-aware sampling. Our experimental results show that the optimized sampling-based Feel-the-Way method is significantly faster than the widely used k-means clustering method, and can provide comparable optimization costs. More extensive experiments with several synthetic datasets and real-world datasets (e.g., MNIST, CIFAR-10, ENRON, and PLACES-2) show that the new parallel algorithm can outperform the open source MPI k-means library by up to 110% on a high-performance computing system using 4,096 CPU cores. In addition, the new algorithm can take up to 51% fewer iterations to converge than the k-means clustering algorithm.