Maximal Correlation Regression

In this paper, we propose a novel regression analysis approach, called maximal correlation regression, that exploits ideas from the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation. We show that in supervised learning problems, the optimal weights in maximal correlation regression can be expressed analytically in terms of the HGR maximal correlation functions, which provides theoretical insight into our approach. In addition, we apply maximal correlation regression to deep learning, proposing efficient training algorithms for learning the weights in hidden layers. Furthermore, we show that maximal correlation regression is closely connected to several existing approaches in information theory and machine learning, including the universal feature selection problem, linear discriminant analysis, and softmax regression. Finally, experiments on real datasets demonstrate that our approach achieves performance comparable to the widely used softmax-regression-based method.
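For readers unfamiliar with the HGR maximal correlation, the following is a minimal sketch of how it can be computed for two discrete random variables with a known joint distribution. It relies on the standard characterization of the HGR maximal correlation as the second-largest singular value of the matrix with entries P(x, y) / sqrt(P(x) P(y)). The function name `hgr_maximal_correlation` and the example distribution are illustrative assumptions, not taken from the paper, and this is not the training algorithm the paper proposes.

```python
import numpy as np


def hgr_maximal_correlation(joint):
    """Sketch: HGR maximal correlation of two discrete variables.

    `joint` is a 2-D array of nonnegative weights proportional to P(x, y).
    Assumes every row and column has positive total mass.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()          # normalize to a joint pmf
    px = joint.sum(axis=1)               # marginal of X (rows)
    py = joint.sum(axis=0)               # marginal of Y (columns)
    # Normalized matrix B[x, y] = P(x, y) / sqrt(P(x) P(y)).
    b = joint / np.sqrt(np.outer(px, py))
    # Singular values are returned in descending order; the largest is
    # always 1, and the second-largest is the HGR maximal correlation.
    singular_values = np.linalg.svd(b, compute_uv=False)
    return singular_values[1]


# Illustrative joint pmf: uniform binary X observed through a binary
# symmetric channel with crossover probability 0.1.
example_joint = np.array([[0.45, 0.05],
                          [0.05, 0.45]])
rho = hgr_maximal_correlation(example_joint)
```

For this example the normalized matrix is [[0.9, 0.1], [0.1, 0.9]], whose singular values are 1 and 0.8, so `rho` is approximately 0.8. An independent pair gives a value near 0 and a deterministic one-to-one relationship gives 1.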
