Supervised Dimensionality Reduction via Distance Correlation Maximization

We propose a novel formulation for supervised dimensionality reduction based on a nonlinear dependency criterion called statistical distance correlation (Székely et al., 2007). The objective makes no distributional assumptions on the regression variables and no assumptions about the regression model. Our formulation learns a low-dimensional feature representation $\mathbf{z}$ that maximizes the squared sum of distance correlations between the low-dimensional features $\mathbf{z}$ and the response $y$, and between the features $\mathbf{z}$ and the covariates $\mathbf{x}$. We optimize this objective with a novel algorithm built on the Generalized Majorization-Minimization method of Parizi et al. (2015). Empirical results on multiple datasets show that the approach outperforms several relevant state-of-the-art supervised dimensionality reduction methods.
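To make the dependency criterion concrete, the following is a minimal NumPy sketch of the sample (squared) distance correlation of Székely et al. (2007), computed from double-centered pairwise Euclidean distance matrices. The function names, and the reading of the objective as the sum of the two squared distance correlations, are assumptions made for illustration; the sketch does not include the optimization algorithm and is not the authors' implementation.

```python
import numpy as np


def _centered_distances(v):
    """Pairwise Euclidean distance matrix of the rows of v, double-centered."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]  # treat a 1-D array as n samples of one variable
    diff = v[:, None, :] - v[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def squared_distance_correlation(x, y):
    """Sample squared distance correlation between paired samples x and y."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()    # squared sample distance covariance
    dvar_x = (a * a).mean()   # squared sample distance variance of x
    dvar_y = (b * b).mean()   # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    return 0.0 if denom == 0 else float(dcov2 / denom)


def objective(z, x, y):
    """Assumed reading of the objective: dependence of z on y plus dependence of z on x."""
    return squared_distance_correlation(z, y) + squared_distance_correlation(z, x)
```

Here `z`, `x`, and `y` are arrays with one row per sample. The pairwise computation uses memory quadratic in the number of samples, so this form is only practical for moderately sized datasets.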

[1] Victor J. Yohai, et al. The sliced inverse regression algorithm as a maximum likelihood procedure, 2009.

[2] Adolfo Martínez Usó, et al. UJIIndoorLoc: A new multi-building and multi-floor database for WLAN fingerprint-based indoor localization problems, 2014, 2014 International Conference on Indoor Positioning and Indoor Navigation (IPIN).

[3] I. Stancu-Minasian. Nonlinear Fractional Programming, 1997.

[4] Maria L. Rizzo, et al. Brownian distance covariance, 2009, 1010.0297.

[5] Maria L. Rizzo, et al. On the uniqueness of distance covariance, 2012.

[6] Gang Niu, et al. Sufficient Component Analysis for Supervised Dimension Reduction, 2011, 1103.4998.

[7] G. Wahba, et al. Using distance covariance for improved variable selection with application to learning genetic risk models, 2015, Statistics in medicine.

[8] Ker-Chau Li, et al. Sliced Inverse Regression for Dimension Reduction, 1991.

[9] R. Dennis Cook, et al. Partial central subspace and sliced average variance estimation, 2009.

[10] Xiangrong Yin, et al. Sufficient Dimension Reduction via Distance Covariance, 2016.

[11] D. Rubinfeld, et al. Hedonic housing prices and the demand for clean air, 1978.

[12] S. Schaible. Minimization of ratios, 1976.

[13] Maria L. Rizzo, et al. Measuring and testing dependence by correlation of distances, 2007, 0803.4101.

[14] R. Dennis Cook, et al. Diagnostic studies in sufficient dimension reduction, 2015.

[15] J. Kiefer, et al. Sequential minimax search for a maximum, 1953.

[16] Hans-Peter Kriegel, et al. 2D Image Registration in CT Images Using Radial Image Descriptors, 2011, MICCAI.

[17] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.

[18] Michael C. Hout, et al. Multidimensional Scaling, 2003, Encyclopedic Dictionary of Archaeology.

[19] Patrick J. F. Groenen, et al. Modern Multidimensional Scaling: Theory and Applications, 2003.

[20] Heng-Hui Lue, et al. Sliced inverse regression for multivariate response regression, 2009.

[21] Noam Slonim, et al. The Information Bottleneck: Theory and Applications, 2006.

[22] Gábor J. Székely, et al. The distance correlation t-test of independence in high dimension, 2013, J. Multivar. Anal.

[23] Antonio Cuevas, et al. Variable selection in functional data classification: a maxima-hunting proposal, 2013, 1309.6697.

[24] K. Lange. The MM Algorithm, 2013.

[25] R. Cook, et al. Likelihood-Based Sufficient Dimension Reduction, 2009.

[26] Krisztian Buza, et al. Feedback Prediction for Blogs, 2012, GfKl.

[27] Shotaro Akaho, et al. Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold, 2005, Neurocomputing.

[28] Naftali Tishby, et al. The information bottleneck method, 2000, ArXiv.

[29] W. Torgerson. Multidimensional scaling: I. Theory and method, 1952.

[30] Gal Chechik, et al. Information Bottleneck for Gaussian Variables, 2003, J. Mach. Learn. Res.

[31] Fang Zhou, et al. Predicting the Geographical Origin of Music, 2014, 2014 IEEE International Conference on Data Mining.

[32] Masao Fukushima, et al. Quadratic Fractional Programming Problems with Quadratic Constraints, 2008.

[33] Rauf Izmailov, et al. Constructive setting for problems of density ratio estimation, 2015, Stat. Anal. Data Min.

[34] R. Cook. Graphics for regressions with a binary response, 1996.

[35] Runze Li, et al. Feature Screening via Distance Correlation Learning, 2012, Journal of the American Statistical Association.

[36] Takafumi Kanamori, et al. Density Ratio Estimation in Machine Learning, 2012.

[37] R. Dennis Cook, et al. Marginal tests with sliced average variance estimation, 2007.

[38] R. Tapia, et al. On Convergence of Minimization Methods: Attraction, Repulsion, and Selection, 2000.

[39] D. Hunter, et al. Optimization Transfer Using Surrogate Objective Functions, 2000.

[40] K. Fukumizu, et al. Gradient-Based Kernel Dimension Reduction for Regression, 2014.

[41] Masashi Sugiyama, et al. Sufficient Dimension Reduction via Squared-Loss Mutual Information Estimation, 2010, Neural Computation.