Large-Margin Predictive Latent Subspace Learning for Multiview Data Analysis

Learning salient representations of multiview data is an essential step in many applications such as image classification, retrieval, and annotation. Standard predictive methods, such as support vector machines, often directly use all the features available without taking into consideration the presence of distinct views and the resultant view dependencies, coherence, and complementarity that offer key insights to the semantics of the data, and are therefore offering weak performance and are incapable of supporting view-level analysis. This paper presents a statistical method to learn a predictive subspace representation underlying multiple views, leveraging both multiview dependencies and availability of supervising side-information. Our approach is based on a multiview latent subspace Markov network (MN) which fulfills a weak conditional independence assumption that multiview observations and response variables are conditionally independent given a set of latent variables. To learn the latent subspace MN, we develop a large-margin approach which jointly maximizes data likelihood and minimizes a prediction loss on training data. Learning and inference are efficiently done with a contrastive divergence method. Finally, we extensively evaluate the large-margin latent MN on real image and hotel review datasets for classification, regression, image annotation, and retrieval. Our results demonstrate that the large-margin approach can achieve significant improvements in terms of prediction performance and discovering predictive latent subspace representations.

[1]  Christopher Joseph Pal,et al.  Multi-Conditional Learning: Generative/Discriminative Training for Clustering and Classification , 2006, AAAI.

[2]  Luc Van Gool,et al.  Integrating multiple model views for object recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[3]  Ning Chen,et al.  Predictive Subspace Learning for Multi-view Data: a Large Margin Approach , 2010, NIPS.

[4]  Chong Wang,et al.  Simultaneous image classification and annotation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[7]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[8]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[9]  Yoshua Bengio,et al.  Classification using discriminative restricted Boltzmann machines , 2008, ICML '08.

[10]  Yoav Freund,et al.  An Adaptive Version of the Boost by Majority Algorithm , 1999, COLT '99.

[11]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[12]  Luc Van Gool,et al.  Towards Multi-View Object Class Detection , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Yiming Yang,et al.  Flexible latent variable models for multi-task learning , 2008, Machine Learning.

[14]  Selim Aksoy,et al.  Scene Classification Using Bag-of-Regions Representations , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[16]  Shotaro Akaho,et al.  A kernel method for canonical correlation analysis , 2006, ArXiv.

[17]  Trevor Darrell,et al.  Multi-View Learning in the Presence of View Disagreement , 2008, UAI 2008.

[18]  G. Michailidis,et al.  On multi-view learning with additive models , 2009, 0906.1117.

[19]  Fei-Fei Li,et al.  Large Margin Learning of Upstream Scene Understanding Models , 2010, NIPS.

[20]  Eric P. Xing,et al.  MedLDA: maximum margin supervised topic models for regression and classification , 2009, ICML '09.

[21]  Michael I. Jordan Graphical Models , 1998 .

[22]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[23]  Dean P. Foster Multi-View Dimensionality Reduction via Canonical Correlation Multi-View Dimensionality Reduction via Canonical Correlation Analysis Analysis Multi-View Dimensionality Reduction via Canonical Correlation Analysis Multi-View Dimensionality Reduction via Canonical Correlation Analysis Multi-View Dimen , 2008 .

[24]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[25]  Geoffrey E. Hinton,et al.  Exponential Family Harmoniums with an Application to Information Retrieval , 2004, NIPS.

[26]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[27]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[28]  David M. Blei,et al.  Supervised Topic Models , 2007, NIPS.

[29]  Eric P. Xing,et al.  Conditional Topic Random Fields , 2010, ICML.

[30]  Tom Diethe,et al.  Multiview Fisher Discriminant Analysis , 2008 .

[31]  Tong Zhang,et al.  Two-view feature generation model for semi-supervised learning , 2007, ICML '07.

[32]  Ben Taskar,et al.  Multi-View Learning over Structured and Non-Identical Outputs , 2008, UAI.

[33]  Jason Weston,et al.  Large scale image annotation: learning to rank with joint word-image embeddings , 2010, Machine Learning.

[34]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[35]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Eric P. Xing,et al.  Harmonium Models for Semantic Video Representation and Classification , 2007, SDM.

[37]  Ulf Brefeld,et al.  Co-EM support vector learning , 2004, ICML.

[38]  Geoffrey E. Hinton,et al.  A New Learning Algorithm for Mean Field Boltzmann Machines , 2002, ICANN.

[39]  Bo Zhang,et al.  Partially Observed Maximum Entropy Discrimination Markov Networks , 2008, NIPS.

[40]  Geoffrey E. Hinton,et al.  Replicated Softmax: an Undirected Topic Model , 2009, NIPS.

[41]  Sham M. Kakade,et al.  Multi-view Regression Via Canonical Correlation Analysis , 2007, COLT.

[42]  Sham M. Kakade,et al.  Multi-view clustering via canonical correlation analysis , 2009, ICML '09.

[43]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[44]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[45]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[46]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[47]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[48]  Hanna M. Wallach,et al.  Topic modeling: beyond bag-of-words , 2006, ICML.

[49]  Michael I. Jordan,et al.  A generalized mean field algorithm for variational inference in exponential families , 2002, UAI.

[50]  Antonio Torralba,et al.  Sharing Visual Features for Multiclass and Multiview Object Detection , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Michael I. Jordan,et al.  Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces , 2004, J. Mach. Learn. Res..

[52]  Rong Yan,et al.  Mining Associated Text and Images with Dual-Wing Harmoniums , 2005, UAI.