A common factor-analytic model for classification

In this era of data explosion, much research has been directed at filtering and extracting useful information from extremely large datasets. We focus on discriminant analysis of high-dimensional data, where the number of variables p is very large relative to the number of observations n. Mixture discriminant analysis offers an effective parametric approach in which each class-conditional density is modeled by a mixture of common factor analyzers. Using common factor loadings across the mixture components substantially reduces the number of parameters to be estimated. Even so, the number of variables must first be reduced to a more manageable level. We therefore consider the problem of dimension reduction for high-dimensional data and propose a factor-analytic model with common factor loadings for classification. We apply the model to a breast cancer study involving microarray gene-expression data and show that this parametric approach can select informative genes that improve the prediction of disease outcome.
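The parameter saving from common factor loadings can be made concrete by counting free parameters. The following is a minimal sketch that assumes the usual mixture of common factor analyzers (MCFA) formulation. In that formulation, all g components share one p x q loading matrix A and one diagonal noise matrix D, so component i has covariance Sigma_i = A Omega_i A^T + D. That structure and the counting formulas are assumptions, not details taken from the text above. The sketch compares MCFA with a mixture of factor analyzers (MFA) that has component-specific loadings and noise, and with an unrestricted normal mixture. All function names are hypothetical, and the sizes in the usage line are illustrative only.

```python
import numpy as np


def n_params_full(g: int, p: int) -> int:
    """Normal mixture with unrestricted component covariances."""
    # (g-1) mixing proportions, g mean vectors, g symmetric p x p covariances.
    return (g - 1) + g * p + g * p * (p + 1) // 2


def n_params_mfa(g: int, p: int, q: int) -> int:
    """Mixture of factor analyzers with component-specific loadings and noise."""
    # Each component: mean (p), diagonal noise (p), loadings (p*q) minus
    # q(q-1)/2 for the rotational non-identifiability of the loadings.
    return (g - 1) + 2 * g * p + g * (p * q - q * (q - 1) // 2)


def n_params_mcfa(g: int, p: int, q: int) -> int:
    """Mixture of common factor analyzers (assumed form, see lead-in)."""
    # Per component: factor mean xi_i (q) and symmetric factor covariance
    # Omega_i (q(q+1)/2). Shared: loadings A (p*q) and diagonal D (p).
    # Subtract q*q because A can be replaced by A @ M for any nonsingular
    # q x q matrix M (with xi_i and Omega_i adjusted) without changing the fit.
    return (g - 1) + g * q + g * q * (q + 1) // 2 + p * q + p - q * q


def mcfa_covariances(A: np.ndarray, omegas: list, d_diag: np.ndarray) -> list:
    """Component covariances Sigma_i = A Omega_i A^T + D, with A and D shared."""
    D = np.diag(d_diag)
    return [A @ omega @ A.T + D for omega in omegas]


if __name__ == "__main__":
    g, p, q = 2, 1000, 5  # illustrative sizes, not from the breast cancer study
    print(n_params_full(g, p), n_params_mfa(g, p, q), n_params_mcfa(g, p, q))
    # 1003001 13981 6016
```

Under these assumed formulas, the shared loadings cut the count from about a million to about 6,000 when p = 1000. However, the p*q + p term still grows linearly with p. This is consistent with the point that the number of variables must first be reduced before such a model is fitted.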
