Learning the Nonlinear Geometry of High-Dimensional Data: Models and Algorithms

Modern information processing relies on the axiom that high-dimensional data lie near low-dimensional geometric structures. This paper revisits the problem of data-driven learning of these geometric structures and puts forth two new nonlinear geometric models for data describing “related” objects or phenomena. The first model straddles the two extremes of the subspace model and the union-of-subspaces model, and is termed the metric-constrained union-of-subspaces (MC-UoS) model. The second model, suited for data drawn from a mixture of nonlinear manifolds, generalizes the kernel subspace model and is termed the metric-constrained kernel union-of-subspaces (MC-KUoS) model. The main contributions of this paper are as follows. First, it motivates and formalizes the problems of MC-UoS and MC-KUoS learning. Second, it presents algorithms that efficiently learn an MC-UoS or an MC-KUoS underlying the data of interest. Third, it extends these algorithms to the case in which parts of the data are missing. Finally, it reports the outcomes of a series of numerical experiments on both synthetic and real data that demonstrate the superiority of the proposed geometric models and learning algorithms over existing approaches in the literature. These experiments also help clarify the connections between this work and the literature on subspace clustering and kernel k-means clustering.
