Regularized Discriminant Analysis, Ridge Regression and Beyond

Fisher linear discriminant analysis (FDA) and its kernel extension, kernel discriminant analysis (KDA), are well-known methods that consider dimensionality reduction and classification jointly. Although they are widely deployed in practical problems, unresolved issues remain concerning their efficient implementation and their relationship with least mean squares procedures. In this paper we address these issues within the framework of regularized estimation. Our approach leads to a flexible and efficient implementation of both FDA and KDA. We also uncover a general relationship between regularized discriminant analysis and ridge regression. This relationship yields variations on conventional FDA based on the pseudoinverse, as well as a direct equivalence to an ordinary least squares estimator.
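To make the stated connection concrete, the following is a minimal sketch, not the algorithm developed in the paper, of the classical two-class instance of this relationship: ridge regression of centred features on plus/minus-one class-indicator targets yields a coefficient vector proportional to the regularized Fisher direction (S_w + lambda*I)^{-1}(mu_1 - mu_0). The synthetic data, the target coding, and the regularization value are illustrative assumptions.

```python
# A minimal sketch, not the paper's algorithm: it illustrates the classical
# two-class link between ridge regression on class-indicator targets and
# regularized Fisher discriminant analysis. The synthetic data, the +/-1
# target coding, and the regularization value are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes of equal size in p dimensions.
n, p = 100, 10
X0 = rng.normal(0.0, 1.0, size=(n, p))
X1 = rng.normal(1.0, 1.0, size=(n, p))
X = np.vstack([X0, X1])
y = np.concatenate([-np.ones(n), np.ones(n)])  # centred +/-1 indicator targets

Xc = X - X.mean(axis=0)                        # centre the features
lam = 1e-2                                     # ridge / regularization parameter

# Ridge regression of the indicator targets on the centred features.
w_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ y)

# Regularized Fisher direction: (S_w + lam*I)^{-1} (mu_1 - mu_0),
# with S_w the pooled within-class scatter matrix.
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
w_fda = np.linalg.solve(Sw + lam * np.eye(p), mu1 - mu0)

# The two directions are parallel (cosine similarity ~ 1), so they induce
# the same one-dimensional projection and the same classification rule.
cos = w_ridge @ w_fda / (np.linalg.norm(w_ridge) * np.linalg.norm(w_fda))
print(f"cosine similarity between directions: {cos:.6f}")
```

In this two-class setting the proportionality is exact (by the Sherman-Morrison identity applied to the between-class rank-one term), so the printed cosine similarity is 1 up to floating-point error; the multi-class and kernel generalizations of this equivalence are the subject of the paper itself.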
