A clustering approach to interpretable principal components

A new method for constructing interpretable principal components is proposed. The method first clusters the variables and then constructs interpretable (sparse) components from the correlation matrices of the resulting clusters. For the first step, a new weighted-variances method for clustering variables is proposed. It reflects the aim of the problem: the interpretable components should maximize the explained variance and thus provide sparse dimension reduction. An important feature of the new clustering procedure is that the optimal number of clusters (and hence of components) can be determined in a non-subjective manner. The new method is illustrated on well-known simulated and real data sets. It clearly outperforms many existing methods for sparse principal component analysis in terms of both explained variance and sparseness.
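The weighted-variances clustering criterion and the rule for choosing the number of clusters are not reproduced here. The Python sketch below covers only the second step, under stated assumptions. The partition of the variables is supplied by hand. Each component is taken to be the leading eigenvector of its cluster's correlation submatrix, padded with zeros outside the cluster. The helper names `cluster_components` and `adjusted_variance` are hypothetical. Components from different clusters can be correlated, so their total explained variance is scored with the adjusted variance of Zou, Hastie and Tibshirani (2006). That score is an assumption and may differ from the measure used in the paper.

```python
import numpy as np

def cluster_components(corr, clusters):
    """Sparse loadings with one component per cluster of variables.

    corr     -- p x p correlation matrix of the variables
    clusters -- list of lists of variable indices (a partition of 0..p-1),
                assumed to come from some variable-clustering step
    Returns a p x k loading matrix whose column j is zero outside cluster j.
    Assumption: each component is the leading eigenvector of its cluster.
    """
    p = corr.shape[0]
    loadings = np.zeros((p, len(clusters)))
    for j, idx in enumerate(clusters):
        sub = corr[np.ix_(idx, idx)]      # correlation submatrix of cluster j
        _, vecs = np.linalg.eigh(sub)     # eigenvalues come in ascending order
        v = vecs[:, -1]                   # eigenvector of the largest eigenvalue
        if v.sum() < 0:                   # fix the arbitrary eigenvector sign
            v = -v
        loadings[idx, j] = v
    return loadings

def adjusted_variance(corr, loadings):
    """Variance explained by possibly correlated components.

    Uses the adjusted variance of Zou, Hastie and Tibshirani (2006): the
    sum of squared diagonal entries of R, where R^T R is the covariance
    matrix of the component scores.
    """
    gram = loadings.T @ corr @ loadings   # covariance of the component scores
    low = np.linalg.cholesky(gram)        # lower L with L L^T = gram, so R = L^T
    return float(np.sum(np.diag(low) ** 2))

# Toy example: two blocks of three variables, within-block correlation 0.8.
block = np.full((3, 3), 0.8) + 0.2 * np.eye(3)
corr = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
L = cluster_components(corr, [[0, 1, 2], [3, 4, 5]])
print(adjusted_variance(corr, L) / corr.shape[0])  # fraction of total variance
```

In this idealized block-correlated example, each cluster's component equals an ordinary principal component. The two sparse components therefore explain 5.2 of the total variance of 6 (about 86.7%), the same as the first two unrestricted principal components. When the blocks are correlated with each other, the adjusted variance falls below the plain sum of the component variances, so overlap between components is not counted twice.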
