Evolutionary Principal Direction Divisive Partitioning

Data clustering has a long history, and a large body of research has been devoted to developing clustering algorithms, yet significant challenges remain. One of the most important is dealing with high-dimensional datasets. The class of clustering algorithms that utilises information from Principal Component Analysis has proven very successful on such datasets. In contrast to previous approaches employing principal components, this paper proposes a technique that uses a quality criterion to select the most important dimension (projection). This criterion permits us to formulate the problem as an optimisation task over the space of projections. However, in high-dimensional spaces this problem is hard to solve and analytic solutions are not available. Thus, we tackle it through the use of an evolutionary algorithm. The experimental results indicate that the proposed techniques are effective in both simulated and real data scenarios.
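
The idea of searching the space of projections with an evolutionary algorithm can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the method of the paper: the abstract names neither the quality criterion nor the evolutionary algorithm, so the sketch uses a stand-in criterion (the between-group share of variance after splitting the projected data at its mean) and a basic differential evolution loop. The names `project`, `split_quality` and `evolve_direction` are hypothetical.

```python
import math
import random


def project(data, w):
    """Project each row of data onto the unit vector along w."""
    norm = math.sqrt(sum(x * x for x in w)) or 1.0
    return [sum(xi * wi for xi, wi in zip(row, w)) / norm for row in data]


def split_quality(proj):
    """Stand-in criterion (an assumption, not the paper's): split the
    projections at their mean and return the between-group share of the
    total variance. Higher values suggest a cleaner two-way split."""
    mean = sum(proj) / len(proj)
    left = [p for p in proj if p <= mean]
    right = [p for p in proj if p > mean]
    total = sum((p - mean) ** 2 for p in proj)
    if not left or not right or total == 0.0:
        return 0.0
    ml = sum(left) / len(left)
    mr = sum(right) / len(right)
    between = len(left) * (ml - mean) ** 2 + len(right) * (mr - mean) ** 2
    return between / total


def evolve_direction(data, pop_size=20, generations=60, f=0.5, cr=0.9, seed=0):
    """Search for a projection direction that maximises split_quality,
    using differential evolution (rand/1/bin) as the example optimiser."""
    rng = random.Random(seed)
    dim = len(data[0])
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    fit = [split_quality(project(data, w)) for w in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # coordinate always taken from the mutant
            trial = [
                pop[a][k] + f * (pop[b][k] - pop[c][k])
                if (rng.random() < cr or k == j_rand)
                else pop[i][k]
                for k in range(dim)
            ]
            trial_fit = split_quality(project(data, trial))
            # Greedy selection: keep the trial if it is at least as good.
            if trial_fit >= fit[i]:
                pop[i], fit[i] = trial, trial_fit
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

Calling `evolve_direction(data)` on a list of equal-length numeric rows returns a direction and its criterion value; splitting the data at the mean of its projections onto that direction then gives one divisive step. In this sketch the direction is normalised inside `project`, so the search runs over unconstrained vectors, which is one simple way to handle the unit-length constraint.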
