Simultaneous model-based clustering and visualization in the Fisher discriminative subspace

Clustering in high-dimensional spaces is nowadays a recurrent problem in many scientific domains, but it remains difficult both in terms of clustering accuracy and in terms of interpreting the results. This paper presents a discriminative latent mixture (DLM) model which fits the data in a latent orthonormal discriminative subspace whose intrinsic dimension is lower than that of the original space. By constraining model parameters within and between groups, a family of 12 parsimonious DLM models is obtained, which can fit a variety of situations. An estimation algorithm, called the Fisher-EM algorithm, is also proposed for estimating both the mixture parameters and the discriminative subspace. Experiments on simulated and real datasets highlight the good performance of the proposed approach compared with existing clustering methods, while providing a useful representation of the clustered data. The method is also applied to the clustering of mass spectrometry data.
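The abstract describes Fisher-EM as alternating between estimating a discriminative subspace and fitting the mixture inside it. That alternation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several simplifying assumptions. The F-step takes the top-d generalized eigenvectors of the soft between-cluster scatter against the soft within-cluster scatter, then orthonormalizes them with a QR decomposition. Each cluster gets a full covariance inside the subspace. The E-step scores clusters using only the projected data, dropping the residual term outside the subspace that the DLM models include. The subspace dimension defaults to K - 1, because the between-cluster scatter of K means has rank at most K - 1. All function names are hypothetical.

```python
# Hypothetical sketch of a Fisher-EM-style loop; not the authors' implementation.
import numpy as np


def soft_scatter(X, T):
    """Soft within-cluster (Sw) and between-cluster (Sb) scatter matrices.

    T is an n x K matrix of posterior membership probabilities."""
    n, p = X.shape
    Nk = T.sum(axis=0) + 1e-12                 # effective cluster sizes
    mu = (T.T @ X) / Nk[:, None]               # K x p soft cluster means
    Sw = np.zeros((p, p))
    for k in range(T.shape[1]):
        D = X - mu[k]
        Sw += (T[:, k, None] * D).T @ D
    Db = mu - X.mean(axis=0)
    Sb = (Nk[:, None] * Db).T @ Db
    return Sw / n, Sb / n


def f_step(X, T, d, ridge=1e-6):
    """Orthonormal basis U (p x d) spanning the top-d Fisher directions.

    Solves Sb v = lambda Sw v by whitening with the Cholesky factor of Sw,
    then orthonormalizes with QR (a simplification, assumed here)."""
    Sw, Sb = soft_scatter(X, T)
    L = np.linalg.cholesky(Sw + ridge * np.eye(X.shape[1]))
    Linv = np.linalg.inv(L)
    _, W = np.linalg.eigh(Linv @ Sb @ Linv.T)  # eigenvalues in ascending order
    V = Linv.T @ W[:, ::-1][:, :d]             # top-d generalized eigenvectors
    U, _ = np.linalg.qr(V)
    return U


def m_step(Y, T, reg=1e-6):
    """Mixture proportions, means and full covariances in the latent space."""
    n, d = Y.shape
    Nk = T.sum(axis=0) + 1e-12
    mu = (T.T @ Y) / Nk[:, None]
    S = [((T[:, k, None] * (Y - mu[k])).T @ (Y - mu[k])) / Nk[k] + reg * np.eye(d)
         for k in range(T.shape[1])]
    return Nk / n, mu, S


def gauss_logpdf(Y, m, S):
    """Multivariate normal log-density of each row of Y."""
    L = np.linalg.cholesky(S)
    Z = np.linalg.solve(L, (Y - m).T)          # whitened residuals, d x n
    return (-0.5 * (Z ** 2).sum(axis=0) - np.log(np.diag(L)).sum()
            - 0.5 * Y.shape[1] * np.log(2 * np.pi))


def e_step(Y, pi, mu, S):
    """Posterior membership probabilities, computed stably in log space."""
    logp = np.column_stack([np.log(pi[k]) + gauss_logpdf(Y, mu[k], S[k])
                            for k in range(len(pi))])
    logp -= logp.max(axis=1, keepdims=True)
    T = np.exp(logp)
    return T / T.sum(axis=1, keepdims=True)


def init_partition(X, K, rng):
    """Hard initial partition from greedy farthest-point centers."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, K):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d2.argmax()))
    labels = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return np.eye(K)[labels]


def fisher_em_sketch(X, K, d=None, n_iter=30, seed=0):
    """Alternate F-step (subspace), M-step and E-step; return labels and U."""
    rng = np.random.default_rng(seed)
    d = K - 1 if d is None else d              # rank(Sb) is at most K - 1
    T = init_partition(X, K, rng)
    for _ in range(n_iter):
        U = f_step(X, T, d)
        Y = X @ U                              # project onto the subspace
        pi, mu, S = m_step(Y, T)
        T = e_step(Y, pi, mu, S)
    return T.argmax(axis=1), U
```

Consider two Gaussian clusters in 10 dimensions whose means differ only along the first coordinate. On such data, `fisher_em_sketch(X, K=2)` should recover the partition and return a one-column `U` close to the first coordinate axis, up to sign. The sketch does not attempt the 12 parsimonious DLM variants, which constrain the within-subspace covariances and the residual variance across clusters.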
