An Improved and More Scalable Evolutionary Approach to Multiobjective Clustering

The multiobjective realization of the data clustering problem has shown great promise in recent years, yielding clear conceptual advantages over the more conventional, single-objective approach. Evolutionary algorithms have contributed substantially to the development of this increasingly active research area. Nevertheless, the unprecedented volumes of data now widely encountered pose significant challenges and highlight the need for more effective and scalable tools for exploratory data analysis. This paper proposes an improved version of the multiobjective clustering with automatic $k$-determination algorithm. The new algorithm improves on its predecessor in several respects, but the key changes are an efficient, specialized initialization routine and two alternative reduced-length representations. These design components exploit information from the minimum spanning tree and redefine the problem in terms of the most relevant subset of its edges. The paper shows that both the new initialization routine and the new solution representations not only help to decrease the computational overhead, but also entail a significant reduction of the search space, thereby enhancing the convergence capabilities and overall effectiveness of the method. These results suggest that the proposed algorithm will offer significant advantages in the realm of “big data” analytics and applications.
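
The idea of redefining the problem over a subset of minimum spanning tree (MST) edges can be illustrated with a minimal sketch. It is not the encoding used in the paper: the abstract does not say how edge relevance is measured, so the sketch simply assumes that the longest MST edges are the relevant ones. The function names `minimum_spanning_tree` and `decode`, and the parameter `num_relevant`, are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, u, v) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known cost of attaching each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def decode(n, mst_edges, num_relevant, genotype):
    """Turn a binary genotype over the `num_relevant` longest MST edges into labels.

    ASSUMPTION: relevance is taken to mean edge length, purely for illustration.
    genotype[i] == 1 removes the i-th relevant edge; all other MST edges are kept.
    """
    ranked = sorted(mst_edges, reverse=True)
    relevant, fixed = ranked[:num_relevant], ranked[num_relevant:]
    kept = fixed + [e for e, g in zip(relevant, genotype) if g == 0]
    root = list(range(n))   # union-find forest over the data points

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]   # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
mst = minimum_spanning_tree(points)
print(decode(len(points), mst, 2, [1, 1]))  # remove both relevant edges
print(decode(len(points), mst, 2, [1, 0]))  # remove only the longest edge
```

In this sketch the genotype has `num_relevant` positions rather than one per data point, so a search over it explores at most `2 ** num_relevant` partitions. This is the kind of search-space reduction the abstract refers to, under the stated assumption about relevance.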
