Overview on techniques in cluster analysis.

Clustering is the classification of patterns into groups, which may be carried out in an unsupervised, semi-supervised, or supervised setting. The clustering problem has been addressed in many contexts and disciplines. Cluster analysis encompasses a range of methods and algorithms for grouping similar objects into categories. In this chapter, we describe a number of these methods and algorithms within a stepwise framework. A typical cluster analysis proceeds through five steps in sequence: pattern representation, choice of the similarity measure, choice of the clustering algorithm, assessment of the output, and representation of the clusters.
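The five steps named above can be illustrated with a minimal sketch in plain Python. Every concrete choice here is an assumption made for illustration, not necessarily the chapter's own selection: z-score standardization for pattern representation, Euclidean distance as the similarity measure, agglomerative single-linkage as the algorithm, mean silhouette width for assessment, and per-cluster centroids as the cluster representation. The function names and the toy data are hypothetical.

```python
import math

def standardize(rows):
    """Step 1 (pattern representation): scale each feature to zero mean, unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def euclidean(a, b):
    """Step 2 (similarity measure): Euclidean distance between two patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points, k, dist=euclidean):
    """Step 3 (clustering algorithm): merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def mean_silhouette(points, clusters, dist=euclidean):
    """Step 4 (assessment): mean silhouette width; needs at least two clusters."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    scores = []
    for i in range(len(points)):
        own = clusters[label[i]]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist(points[i], points[j]) for j in other) / len(other)
                for c, other in enumerate(clusters) if c != label[i])
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)

def summarize(rows, clusters):
    """Step 5 (cluster representation): centroid of each cluster in original units."""
    dims = range(len(rows[0]))
    return [[sum(rows[i][d] for i in members) / len(members) for d in dims]
            for members in clusters]

# Toy data (made up for illustration): two visually separate groups in 2-D.
data = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
points = standardize(data)
clusters = single_linkage(points, k=2)
score = mean_silhouette(points, clusters)
centroids = summarize(data, clusters)
print(clusters, round(score, 3), centroids)
```

Each step is a separate function so that any one choice (for example, a correlation-based distance or a partitional algorithm) could be swapped in without touching the others, which mirrors the stepwise framing of the text.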
