Multi-View Dimensionality Reduction via Canonical Correlation Analysis

We analyze the multi-view regression problem, where we have two views X = (X^(1), X^(2)) of the input data and a target variable Y of interest. We provide sufficient conditions under which we can reduce the dimensionality of X (via a projection) without losing predictive power for Y. Crucially, this projection can be computed via Canonical Correlation Analysis (CCA) on the unlabeled data alone. The algorithmic template is as follows: with unlabeled data, perform CCA and construct a certain projection; with the labeled data, do least squares regression in this lower-dimensional space. We show how, under certain natural assumptions, the number of labeled samples can be significantly reduced in comparison to the single-view setting; in particular, we show that this dimensionality reduction does not lose predictive power for Y (thus it introduces only a little bias but can drastically reduce the variance). We explore two separate assumptions under which this is possible and show how, under either assumption alone, dimensionality reduction can reduce the labeled sample complexity. The two assumptions we consider are a conditional independence assumption and a redundancy assumption. The typical conditional independence assumption is that, conditioned on Y, the views X^(1) and X^(2) are independent; we relax this to: conditioned on some hidden state H, the views X^(1) and X^(2) are independent. Under the redundancy assumption, we show that the best predictor from each view is roughly as good as the best predictor using both views.
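
The two-stage template described above lends itself to a short sketch. The following is a minimal illustration using scikit-learn's CCA and LinearRegression on synthetic two-view data; the data generator, dimensions, and variable names are assumptions chosen for illustration, not the paper's actual setup or experiments.

```python
# Minimal sketch of the template: CCA on unlabeled two-view data to obtain a
# low-dimensional projection, then least squares on labeled data in that space.
# The synthetic generator below (shared hidden state H driving both views and
# the target) is an illustrative assumption, not the authors' construction.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
d1, d2, k = 50, 40, 5                      # view dimensions and CCA dimension

# Fixed mixing weights so labeled and unlabeled samples share one distribution.
W1 = rng.normal(size=(k, d1))
W2 = rng.normal(size=(k, d2))
w_y = rng.normal(size=k)

def make_two_view_data(n):
    """Both views and the target are driven by a shared hidden state H."""
    H = rng.normal(size=(n, k))
    X1 = H @ W1 + 0.1 * rng.normal(size=(n, d1))
    X2 = H @ W2 + 0.1 * rng.normal(size=(n, d2))
    y = H @ w_y + 0.1 * rng.normal(size=n)
    return X1, X2, y

# Stage 1: CCA on plentiful unlabeled two-view data yields the projection.
X1_u, X2_u, _ = make_two_view_data(5000)
cca = CCA(n_components=k).fit(X1_u, X2_u)

# Stage 2: ordinary least squares on a small labeled sample, regressing Y on
# the k-dimensional CCA projection of view 1 instead of all d1 raw features.
X1_l, _, y_l = make_two_view_data(100)
reg = LinearRegression().fit(cca.transform(X1_l), y_l)

# Held-out check that the projected predictor retains predictive power.
X1_t, _, y_t = make_two_view_data(1000)
print("test R^2 in the CCA-projected space:", reg.score(cca.transform(X1_t), y_t))
```

In this sketch only view 1 is projected and used for regression; the analysis applies symmetrically to either view, and the key point is that the projection itself is estimated entirely from unlabeled data.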
