Multiobjective data clustering

Conventional clustering algorithms optimize a single criterion, which may not match the diverse shapes of the underlying clusters. We propose a clustering approach that uses multiple clustering objective functions simultaneously. The proposed multiobjective clustering is a two-step process: clusters are first detected by a set of candidate objective functions and then integrated into the target partition. A key ingredient of the approach is a cluster goodness function that evaluates the utility of multiple clusters using re-sampling techniques. The multiobjective clustering is obtained as the solution to a discrete optimization problem in the space of clusters. At the meta-level, our algorithm incorporates conflict resolution techniques along with the natural data constraints. An empirical study on a number of artificial and real-world data sets demonstrates that multiobjective data clustering leads to valid and robust data partitions.
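The abstract does not define the cluster goodness function, so the following is only a minimal sketch of one way a re-sampling based score for a single cluster could look. It assumes that goodness is measured as the average best Jaccard overlap between the cluster and the clusters found on random subsamples; this measure, and the names `cluster_stability`, `jaccard`, and `split_at_largest_gap` (a toy 1-D clustering routine), are illustrative assumptions rather than the authors' method.

```python
import random


def jaccard(a, b):
    """Jaccard overlap between two sets of point indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_stability(points, cluster, cluster_fn, n_resamples=50, frac=0.8, seed=0):
    """Hypothetical goodness score: how well `cluster` (a set of indices into
    `points`) is recovered when `cluster_fn` is re-run on random subsamples."""
    rng = random.Random(seed)
    n = len(points)
    size = max(2, int(frac * n))
    scores = []
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)  # subsample without replacement
        labels = cluster_fn([points[i] for i in idx])
        # Group the sampled original indices by their new cluster label.
        groups = {}
        for i, lab in zip(idx, labels):
            groups.setdefault(lab, set()).add(i)
        target = cluster & set(idx)  # the part of the cluster that was sampled
        if not target:
            continue
        scores.append(max(jaccard(target, g) for g in groups.values()))
    return sum(scores) / len(scores) if scores else 0.0


def split_at_largest_gap(values):
    """Toy 1-D clustering (assumption): two clusters, cut at the largest gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = 0 if rank <= cut else 1
    return labels


points = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
good = cluster_stability(points, {0, 1, 2, 3}, split_at_largest_gap)
bad = cluster_stability(points, {0, 1, 4, 5}, split_at_largest_gap)
print(good, bad)
```

In this toy setting the compact cluster `{0, 1, 2, 3}` is recovered on every subsample and scores higher than the mixed cluster `{0, 1, 4, 5}`. Under the same assumptions, a score of this kind could be computed for clusters produced by several objective functions so that they become comparable before integration into one partition.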
