Learning Gradients: Predictive Models that Infer Geometry and Statistical Dependence

We address the problems of dimension reduction and inference of statistical dependence within the modeling framework of learning gradients. The proposed models apply both in Euclidean spaces and in the manifold setting. The central quantity in this approach is an estimate of the gradient of the regression or classification function. Two quadratic forms are constructed from the gradient estimates: the gradient outer product and gradient-based diffusion maps. The first can be used for supervised dimension reduction on manifolds and for inferring a graphical model that encodes the dependencies predictive of a response variable. The second can be used for nonlinear projections that incorporate both the geometric structure of the manifold and the variation of the response variable on the manifold. We relate the gradient outer product to standard statistical quantities such as covariances and provide a simple and precise comparison of a variety of supervised dimensionality reduction methods. We provide rates of convergence both for inference of informative directions and for inference of a graphical model of variable dependencies.
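To make the gradient outer product concrete, here is a minimal Python/NumPy sketch written under explicit assumptions. It does not reproduce the paper's gradient estimator. Instead it substitutes a plain locally weighted linear fit (Gaussian weights with a hypothetical `bandwidth` parameter and a small hypothetical `ridge` term for numerical stability) to estimate the gradient g_i at each sample x_i. It then forms the empirical gradient outer product Gamma = (1/n) sum_i g_i g_i^T and takes its leading eigenvectors as estimated informative directions. The function names `local_linear_gradients`, `gradient_outer_product`, and `informative_directions` are hypothetical and introduced only for illustration.

```python
import numpy as np


def local_linear_gradients(X, y, bandwidth=1.0, ridge=1e-3):
    """Estimate the gradient of the regression function at every sample.

    Assumption: a plain locally weighted linear fit stands in for the paper's
    gradient estimator. At each x_i it minimizes
        sum_j w_ij * (y_j - a - b^T (x_j - x_i))^2 + ridge * (a^2 + ||b||^2)
    with Gaussian weights w_ij = exp(-||x_j - x_i||^2 / (2 * bandwidth^2)),
    and returns the slope b as the gradient estimate at x_i.
    """
    n, p = X.shape
    grads = np.empty((n, p))
    for i in range(n):
        D = X - X[i]                                   # displacements x_j - x_i
        w = np.exp(-np.sum(D**2, axis=1) / (2.0 * bandwidth**2))
        A = np.hstack([np.ones((n, 1)), D])            # intercept + linear terms
        AtW = A.T * w                                  # A^T diag(w), via broadcasting
        coef = np.linalg.solve(AtW @ A + ridge * np.eye(p + 1), AtW @ y)
        grads[i] = coef[1:]                            # drop the intercept
    return grads


def gradient_outer_product(grads):
    """Empirical gradient outer product (1/n) * sum_i g_i g_i^T (p x p, PSD)."""
    return grads.T @ grads / grads.shape[0]


def informative_directions(gop, k):
    """Leading k eigenpairs of the GOP, largest eigenvalue first."""
    vals, vecs = np.linalg.eigh(gop)                   # eigh returns ascending order
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 300, 5
    X = rng.normal(size=(n, p))
    beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
    # Single-index model: y varies only along beta, so grad f(x) = cos(beta^T x) * beta.
    y = np.sin(X @ beta) + 0.05 * rng.normal(size=n)
    G = local_linear_gradients(X, y, bandwidth=1.0)
    vals, V = informative_directions(gradient_outer_product(G), k=1)
    # |<v_1, beta>| near 1 indicates the informative direction was recovered.
    print("alignment:", abs(V[:, 0] @ beta))
```

In this synthetic single-index example every true gradient points along beta, so the population gradient outer product is proportional to beta beta^T. Its leading eigenvector should therefore align with beta up to sign and estimation error. The ridge term mainly stabilizes the fit at sparsely sampled points, where the local weights are nearly zero.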
