A Survey on Multi-view Learning

In recent years, many methods have been proposed for learning from multi-view data by exploiting the diversity of different views. These views may be obtained from multiple sources or from different feature subsets. To organize and highlight the similarities and differences among multi-view learning approaches, we review a number of representative multi-view learning algorithms in different areas and classify them into three groups: 1) co-training, 2) multiple kernel learning, and 3) subspace learning. Co-training style algorithms train alternately to maximize the mutual agreement on two distinct views of the data; multiple kernel learning algorithms exploit kernels that naturally correspond to different views and combine them either linearly or non-linearly to improve learning performance; and subspace learning algorithms aim to obtain a latent subspace shared by multiple views by assuming that the input views are generated from this latent subspace. Although the approaches to integrating multiple views vary significantly, they mainly exploit either the consensus principle or the complementary principle to ensure the success of multi-view learning. Since access to multiple views is the foundation of multi-view learning, beyond studying how to learn a model from multiple views, it is also valuable to study how to construct multiple views and how to evaluate them. Overall, by exploiting the consistency and complementary properties of different views, multi-view learning is more effective, more promising, and generalizes better than single-view learning.
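
As a rough illustration of the linear case of multiple kernel learning described above, the following sketch builds one Gaussian kernel per view and forms a convex combination of them. It is a minimal example under stated assumptions, not an algorithm from the survey: the two random views, the `gamma` values, the helper names `rbf_kernel` and `combine_kernels`, and the fixed weights are all hypothetical, and an actual multiple kernel learning method would learn the weights from data rather than fix them by hand.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def combine_kernels(kernels, weights):
    """Convex combination of per-view kernel matrices.

    Weights are fixed by the caller here (an assumption for illustration);
    they are normalized so that they sum to one.
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))


# Two hypothetical views of the same six samples, with different dimensions.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(6, 3))
view_b = rng.normal(size=(6, 5))

K_a = rbf_kernel(view_a, gamma=0.5)
K_b = rbf_kernel(view_b, gamma=0.1)

# The combined matrix is again a valid kernel over the six samples.
K = combine_kernels([K_a, K_b], weights=[0.7, 0.3])
```

Because a non-negative weighted sum of positive semidefinite matrices is itself positive semidefinite, `K` can be passed to any kernel method that accepts a precomputed kernel matrix.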
