Combining principles with pragmatism, a new approach and accompanying algorithm are presented for a longstanding problem in applied statistics: the interpretation of principal components. Following Rousson and Gasser [53 (2004) 539–555], ‘the ultimate goal is not to propose a method that leads automatically to a unique solution, but rather to develop tools for assisting the user in his or her choice of an interpretable solution’.
Accordingly, our approach is essentially exploratory. Calling a vector ‘simple’ if it has small integer elements, it poses the open question: ‘What sets of simply interpretable orthogonal axes (if any) are angle-close to the principal components of interest?’ Its answer is presented in summary form as an automated visual display of the solutions found, ordered by overall measures of simplicity, accuracy and star quality, from which the user may choose. Here, ‘star quality’ refers to striking overall patterns in the sets of axes found, which deserve to be drawn especially to the user’s attention precisely because they have emerged from the data, rather than being imposed on it by (implicitly) adopting a model. Indeed, other things being equal, explicit models can be checked by seeing whether their fits occur in our exploratory analysis, as we illustrate. By requiring orthogonality, the attractive visualization and dimension-reduction features of principal component analysis are retained.
Exact implementation of this principled approach is shown to provide an exhaustive set of solutions, but is combinatorially hard. Pragmatically, we provide an efficient, approximate algorithm. Throughout, worked examples show how this new tool adds to the applied statistician’s armoury, effectively combining simplicity, retention of optimality and computational efficiency, while complementing existing methods. Examples are also given where simple structure in the population principal components is recovered using only information from the sample. Further developments are briefly indicated.
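The notions of ‘simplicity’ and ‘angle-closeness’ above can be illustrated with a minimal brute-force sketch. This is not the paper’s algorithm (which handles whole sets of orthogonal axes and replaces exhaustive search with an efficient approximation); it merely enumerates, for a single principal component, all vectors with integer entries in a small range and ranks them by angle to that component. The function and parameter names are hypothetical.

```python
import itertools

import numpy as np


def simple_approximations(pc, max_entry=2, top_k=3):
    """Rank small-integer vectors by their angle to a given loading vector.

    A vector is 'simple' if its elements are small integers (here, in
    {-max_entry, ..., max_entry}); it is 'angle-close' if the angle it
    makes with the principal component pc is small.  Returns the top_k
    (angle_in_degrees, integer_vector) pairs, smallest angle first.
    """
    pc = np.asarray(pc, dtype=float)
    pc = pc / np.linalg.norm(pc)
    scored = []
    for v in itertools.product(range(-max_entry, max_entry + 1), repeat=len(pc)):
        v = np.array(v, dtype=float)
        norm = np.linalg.norm(v)
        if norm == 0:
            continue  # skip the zero vector: it defines no axis
        # Absolute cosine: an axis and its negation are the same interpretation.
        cos = abs(v @ pc) / norm
        angle = np.degrees(np.arccos(min(cos, 1.0)))
        scored.append((angle, v.astype(int)))
    scored.sort(key=lambda t: t[0])  # key avoids comparing arrays on ties
    return scored[:top_k]


# A component with roughly equal loadings is angle-close to a constant
# integer vector such as (1, 1, 1, 1).
pc = np.array([0.52, 0.48, 0.51, 0.49])
for angle, v in simple_approximations(pc):
    print(f"{v}  angle = {angle:.2f} deg")
```

The exhaustive loop visits (2·max_entry + 1)^p candidates, which makes the combinatorial hardness of the exact approach concrete: it is feasible only for a handful of variables and small integer ranges, hence the need for the approximate algorithm.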
[1] Michael I. Jordan et al. A direct formulation for sparse PCA using semidefinite programming. NIPS, 2004.
[2] Alessio Farcomeni et al. An exact approach to sparse principal component analysis. Comput. Stat., 2009.
[3] Tamara G. Kolda et al. A semidiscrete matrix decomposition for latent semantic indexing information retrieval. TOIS, 1998.
[4] I. Jolliffe et al. A modified principal component technique based on the LASSO. 2003.
[5] I. Jolliffe. Principal Component Analysis. 2002.
[6] S. Vines. Simple principal components. 2000.
[7] R. Tibshirani et al. Sparse principal component analysis. 2006.
[8] Rasmus Larsen et al. Sparse principal component analysis in medical shape modeling. SPIE Medical Imaging, 2006.
[9] H. Sebastian Seung et al. Learning the parts of objects by non-negative matrix factorization. Nature, 1999.
[10] Runze Li et al. Some methods for generating both an NT-net and the uniform distribution on a Stiefel manifold and their applications. 1997.
[11] Linjuan Sun et al. Simple principal components. 2006.
[12] Daniel Gervini et al. Criteria for evaluating dimension-reducing components for multivariate data. 2004.
[13] H. Chipman et al. Interpretable dimension reduction. 2005.
[14] Ian T. Jolliffe et al. DALASS: variable selection in discriminant analysis via the LASSO. Comput. Stat. Data Anal., 2007.
[15] L. Lazzeroni. Plaid models for gene expression data. 2000.
[16] J. N. R. Jeffers et al. Two case studies in the application of principal component analysis. 1967.
[17] Theo Gasser et al. Simple component analysis. 2004.
[18] Trevor Park. A penalized likelihood approach to rotation of principal components. 2005.
[19] J. Blum et al. Ultrasound in Obstetrics and Gynecology, 2nd ed. 1985.