Multi-view Regression Via Canonical Correlation Analysis

In the multi-view regression problem, the input variable (a real vector) is partitioned into two views, and it is assumed that either view alone is sufficient to make accurate predictions; this is essentially a significantly weaker version of the co-training assumption, applied to regression. We provide a semi-supervised algorithm that first uses unlabeled data to learn a norm (or, equivalently, a kernel) and then runs ridge regression with this induced norm on the labeled data to produce the predictor. The unlabeled data is used via canonical correlation analysis (CCA, which is closely related to PCA for two random variables) to derive an appropriate norm over functions. We characterize the intrinsic dimensionality of the resulting ridge regression problem by a rather simple expression in the correlation coefficients provided by CCA. Interestingly, the norm used by the ridge regression algorithm is derived from CCA, unlike in standard kernel methods, where a special norm is assumed a priori (i.e., a particular Hilbert space is fixed in advance). We discuss how this result shows that unlabeled data can decrease the sample complexity.
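To make the pipeline concrete, here is a minimal numpy sketch of the two-stage procedure: CCA is computed on the two unlabeled views via an SVD of the whitened cross-covariance, and ridge regression on the labeled view-1 data then penalizes each canonical direction more heavily the lower its canonical correlation. The function names, the regularization constants, and the penalty weighting (1 - lam) / (lam + eps) are illustrative assumptions for this sketch, not the paper's exact canonical norm.

```python
import numpy as np

def inv_sqrt(S, reg=1e-6):
    """Inverse matrix square root of a symmetric PSD matrix,
    with a small ridge added for numerical stability."""
    vals, vecs = np.linalg.eigh(S + reg * np.eye(S.shape[0]))
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca(X1, X2, reg=1e-6):
    """CCA via SVD of the whitened cross-covariance of the two views.
    Returns view-1 canonical directions (columns of A) and the
    canonical correlations lam, both learned from unlabeled data."""
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    S11 = X1.T @ X1 / n
    S22 = X2.T @ X2 / n
    S12 = X1.T @ X2 / n
    W1, W2 = inv_sqrt(S11, reg), inv_sqrt(S22, reg)
    U, lam, _ = np.linalg.svd(W1 @ S12 @ W2)
    k = min(X1.shape[1], X2.shape[1])
    return W1 @ U[:, :k], np.clip(lam[:k], 0.0, 1.0)

def cca_ridge_fit(X1_lab, y, A, lam, eps=1e-3):
    """Ridge regression on labeled view-1 data in the CCA basis.
    The per-direction penalty (1 - lam_i) / (lam_i + eps) shrinks
    low-correlation directions hard -- an illustrative choice of
    weighting, standing in for the paper's CCA-induced norm."""
    Z = (X1_lab - X1_lab.mean(axis=0)) @ A  # labeled data in CCA coordinates
    D = np.diag((1.0 - lam) / (lam + eps))
    return np.linalg.solve(Z.T @ Z + len(y) * D, Z.T @ y)

# Usage: learn the norm from unlabeled views, then fit on labeled data.
# A, lam = cca(X1_unlabeled, X2_unlabeled)
# w = cca_ridge_fit(X1_labeled, y_labeled, A, lam)
# y_hat = (X1_test - X1_labeled.mean(axis=0)) @ A @ w
```

The design choice mirrors the abstract: directions with canonical correlation near 1 (on which the views agree) are barely penalized, while nearly uncorrelated directions are suppressed, so the effective dimensionality of the regression is governed by the CCA correlation coefficients.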
