Unsupervised Data Mining: Introduction

This chapter focuses on cluster analysis in the context of unsupervised data mining. Various facets of cluster analysis, including proximity measures, are discussed in detail. Techniques for determining the natural number of clusters are described. Finally, techniques for assessing cluster accuracy and reproducibility are detailed. The techniques introduced in this chapter are expanded upon in the following chapters.
