Abstract

The traditional clustering algorithm, K-means, is well known for its simplicity and low time complexity. However, its usability is limited because the clustering result depends heavily on user-defined parameters, namely the initial centroid seeds and the number of clusters (k). A new clustering algorithm, K-means+, is proposed to extend K-means. K-means+ automatically determines a semi-optimal number of clusters according to the statistical properties of the data. Moreover, the choice of initial centroid seeds is not critical to its clustering results. Experimental results on the Iris and KDD-99 data sets illustrate the robustness of K-means+, especially for large amounts of data in high-dimensional spaces.
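The following is a minimal sketch of the dependence on initial seeds described above. It implements standard K-means (Lloyd's algorithm), not K-means+, whose details are not given here. The function name `kmeans`, the four-point toy data set, and both seed choices are illustrative assumptions. With k = 2, one pair of seeds converges to a good partition (sum of squared errors, SSE, of 1.0). The other pair gets stuck in a poor local optimum (SSE of 16.0).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to p (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def kmeans(points: List[Point], initial_centroids: List[Point], max_iter: int = 100):
    """Plain Lloyd's K-means; k is fixed by len(initial_centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean(members) if members else c)
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    # Final labels and SSE are computed against the final centroids.
    labels = [nearest(p, centroids) for p in points]
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
centroids_a, labels_a, sse_a = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])
centroids_b, labels_b, sse_b = kmeans(points, [(0.0, 0.0), (0.0, 1.0)])
print(labels_a, sse_a)  # [0, 0, 1, 1] 1.0
print(labels_b, sse_b)  # [0, 1, 0, 1] 16.0
```

The same data and the same k produce different partitions that differ sixteenfold in SSE, and the only change is the initial seeds. This is the kind of sensitivity that K-means+ is proposed to reduce.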