A Least-Squares Framework for Component Analysis

Over the last century, Component Analysis (CA) methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Canonical Correlation Analysis (CCA), Locality Preserving Projections (LPP), and Spectral Clustering (SC) have been used extensively as a feature extraction step for modeling, classification, visualization, and clustering. CA techniques are appealing because many of them can be formulated as eigen-problems, offering great potential for learning linear and nonlinear representations of data in closed form. However, the eigen-formulation often conceals important analytic and computational drawbacks of CA techniques, such as the need to solve generalized eigen-problems with rank-deficient matrices (e.g., the small sample size problem), the lack of an intuitive interpretation of normalization factors, and the difficulty of understanding commonalities and differences between CA methods. This paper proposes a unified least-squares framework to formulate many CA methods. We show how PCA, LDA, CCA, LPP, SC, and their kernel and regularized extensions correspond to particular instances of least-squares weighted kernel reduced rank regression (LS-WKRRR). The LS-WKRRR formulation of CA methods has several benefits: it 1) provides a clean connection between many CA techniques and an intuitive framework for understanding normalization factors; 2) yields efficient numerical schemes for solving CA techniques; 3) overcomes the small sample size problem; and 4) provides a framework for easily extending CA methods. We derive weighted generalizations of PCA, LDA, SC, and CCA, as well as several new CA techniques.
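To make the eigen-problem versus least-squares contrast concrete for the simplest case, the sketch below recovers the PCA subspace in two ways: from the top eigenvectors of D Dᵀ, and by minimizing the reconstruction error ‖D − BC‖²_F with alternating least squares. This is a generic illustration under our own assumptions and notation (D is a d × n centered data matrix, B a d × k basis, C the k × n coefficients), not the paper's LS-WKRRR algorithm, which additionally involves weights, kernels, and a regression target. The function names `pca_eig`, `pca_als`, and `projector` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_eig(D, k):
    """Principal subspace via the eigen-formulation: top-k eigenvectors of D @ D.T."""
    evals, evecs = np.linalg.eigh(D @ D.T)      # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    return evecs[:, order[:k]]

def pca_als(D, k, iters=500, seed=0):
    """Principal subspace via the least-squares formulation
    min_{B, C} ||D - B @ C||_F^2, solved by alternating least squares."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((D.shape[0], k))
    for _ in range(iters):
        # Fix the basis B (d x k) and solve for the coefficients C (k x n).
        C = np.linalg.lstsq(B, D, rcond=None)[0]
        # Fix C and solve B @ C = D for B, i.e. C.T @ B.T = D.T.
        B = np.linalg.lstsq(C.T, D.T, rcond=None)[0].T
    # The least-squares optimum fixes only span(B), so return an orthonormal basis.
    Q, _ = np.linalg.qr(B)
    return Q

def projector(U):
    """Orthogonal projector onto span(U); compares subspaces independently of basis."""
    return U @ U.T

# Hypothetical synthetic data: 10 variables with decreasing spread, 200 samples.
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 200)) * np.arange(10, 0, -1)[:, None]
D = X - X.mean(axis=1, keepdims=True)           # center each variable

k = 3
U_eig = pca_eig(D, k)
U_als = pca_als(D, k)
same_subspace = np.allclose(projector(U_eig), projector(U_als), atol=1e-6)

# Eckart-Young: the minimal residual equals the sum of the discarded eigenvalues.
residual = np.linalg.norm(D - U_als @ (U_als.T @ D)) ** 2
discarded = np.sort(np.linalg.eigvalsh(D @ D.T))[: D.shape[0] - k].sum()
residual_matches = bool(np.isclose(residual, discarded))
print(same_subspace, residual_matches)
```

One way to read the design: each alternating sweep maps span(B) to span(D Dᵀ B), so it behaves like one step of subspace (power) iteration, and its convergence speed depends on the gap between the k-th and (k+1)-th eigenvalues. The least-squares form only ever solves systems with k unknown columns, which is the kind of computational flexibility the abstract attributes to the least-squares view.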
