Eigen-analysis of kernel operators for nonlinear dimension reduction and discrimination

There has been growing interest in kernel methods for classification, clustering, and dimension reduction. For example, kernel linear discriminant analysis (LDA), spectral clustering, and kernel principal component analysis (PCA) are widely used in statistical learning and data mining applications. The empirical success of kernel methods is generally attributed to the nonlinear feature mapping induced by the kernel, which in turn determines a low-dimensional data embedding. It is therefore important to understand the effect of a kernel and its associated kernel parameter(s) on the embedding in relation to the data distribution. In this dissertation, we examine the geometry of the nonlinear embeddings given by kernel PCA and kernel LDA through spectral analysis of the corresponding kernel operators. In particular, we carry out an eigen-analysis of the polynomial kernel operator associated with a data distribution and investigate the effect of the polynomial degree on the data embedding. We also investigate the effect of centering kernels on the spectral properties of both polynomial and Gaussian kernel operators. In addition, we extend the framework of the eigen-analysis of kernel PCA to kernel LDA by considering between-class and within-class variation operators for polynomial kernels. The results provide both insight into the geometry of nonlinear data embeddings given by kernel methods and practical guidelines for choosing an appropriate degree for dimension reduction and discrimination with polynomial kernels.
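
As a concrete illustration of the kind of embedding studied here, the following is a minimal sketch (not the dissertation's code) of kernel PCA with a polynomial kernel: the embedding is read off the leading eigenvectors of the Gram matrix, with the polynomial degree and the optional kernel centering as the two ingredients discussed above. All function names, parameter values, and the toy data are illustrative assumptions.

```python
# Minimal sketch of polynomial-kernel PCA via eigen-decomposition of the
# (optionally centered) Gram matrix. Illustrative only; uses NumPy.
import numpy as np

def polynomial_kernel(X, degree=2, coef0=1.0):
    """Gram matrix K with entries (x_i' x_j + coef0)^degree."""
    return (X @ X.T + coef0) ** degree

def center_kernel(K):
    """Double-center K, i.e. apply (I - 11'/n) K (I - 11'/n)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernel_pca(X, degree=2, n_components=2, center=True):
    """Embed X using the leading eigenvectors of the polynomial Gram matrix."""
    K = polynomial_kernel(X, degree=degree)
    if center:
        K = center_kernel(K)
    eigval, eigvec = np.linalg.eigh(K)              # ascending eigenvalues
    idx = np.argsort(eigval)[::-1][:n_components]   # take the largest ones
    lam, V = eigval[idx], eigvec[:, idx]
    # Coordinates of the training points in feature space are sqrt(lambda_k) v_k;
    # clip tiny eigenvalues to avoid numerical negatives.
    return V * np.sqrt(np.maximum(lam, 1e-12))

# Toy example: a noisy circle, where a degree-2 kernel exposes the radial structure.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((200, 2))
Z = kernel_pca(X, degree=2, n_components=2)
print(Z.shape)  # (200, 2)
```

Changing `degree` or toggling `center` in this sketch changes which eigenvectors dominate, which is the qualitative effect the eigen-analysis in the dissertation characterizes.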
