Learning the Differential Correlation Matrix of a Smooth Function From Point Samples

Consider an open set $\mathbb{D}\subseteq\mathbb{R}^n$, equipped with a probability measure $\mu$. An important characteristic of a smooth function $f:\mathbb{D}\rightarrow\mathbb{R}$ is its \emph{differential correlation matrix} $\Sigma_{\mu}:=\int \nabla f(x) (\nabla f(x))^* \mu(dx) \in\mathbb{R}^{n\times n}$, where $\nabla f(x)\in\mathbb{R}^n$ is the gradient of $f(\cdot)$ at $x\in\mathbb{D}$. For instance, the span of the leading $r$ eigenvectors of $\Sigma_{\mu}$ forms an \emph{active subspace} of $f(\cdot)$, thereby extending the concept of \emph{principal component analysis} to the problem of \emph{ridge approximation}. In this work, we propose a simple algorithm for estimating $\Sigma_{\mu}$ from point values of $f(\cdot)$ \emph{without} imposing any structural assumptions on $f(\cdot)$. Theoretical guarantees for this algorithm are provided with the aid of the same technical tools that have proved valuable in the context of covariance matrix estimation from partial measurements.
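To make the definition concrete, the following is a minimal sketch of how $\Sigma_{\mu}$ and its active subspace can be estimated numerically. The abstract does not specify the paper's algorithm, so this sketch makes an illustrative assumption: gradients are approximated by central finite differences from point values of $f(\cdot)$, and $\Sigma_{\mu}$ is formed as a Monte Carlo average of the outer products $\nabla f(x)(\nabla f(x))^*$. The function names (`estimate_sigma`) and parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def estimate_sigma(f, sample_x, num_samples, h=1e-5):
    """Monte Carlo estimate of Sigma_mu = E_mu[grad f(x) grad f(x)^T].

    Gradients are approximated by central finite differences from point
    values of f -- an illustrative choice, not the paper's method.
    """
    x0 = sample_x()
    n = x0.shape[0]
    sigma = np.zeros((n, n))
    for _ in range(num_samples):
        x = sample_x()                      # draw x ~ mu
        g = np.empty(n)
        for i in range(n):                  # central difference per coordinate
            e = np.zeros(n)
            e[i] = h
            g[i] = (f(x + e) - f(x - e)) / (2 * h)
        sigma += np.outer(g, g)             # accumulate rank-one terms
    return sigma / num_samples

# Example: f(x) = sin(a^T x) is a ridge function, so its active
# subspace is span{a} and Sigma_mu has rank one.
rng = np.random.default_rng(0)
n = 10
a = rng.standard_normal(n)
a /= np.linalg.norm(a)
f = lambda x: np.sin(a @ x)
sigma_hat = estimate_sigma(f, lambda: rng.standard_normal(n),
                           num_samples=2000)

# The span of the leading r eigenvectors approximates the active subspace.
eigvals, eigvecs = np.linalg.eigh(sigma_hat)  # ascending eigenvalues
u = eigvecs[:, -1]                            # leading eigenvector (r = 1)
print(abs(u @ a))                             # close to 1: span{a} recovered
```

For a general $f(\cdot)$ one would choose $r$ by inspecting the decay of the eigenvalues of the estimate, keeping the eigenvectors above the spectral gap as the active subspace.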
