Deep Multi-View Learning via Task-Optimal CCA

Canonical Correlation Analysis (CCA) is widely used for multimodal data analysis and, more recently, for discriminative tasks such as multi-view learning; however, it makes no use of class labels. Recent CCA methods have begun to address this weakness, but they are limited: they either do not optimize the CCA projection and its discriminative power simultaneously, or they are restricted to linear projections. We address these deficiencies by simultaneously optimizing a CCA-based objective and a task objective in an end-to-end manner. Together, these two objectives learn a non-linear CCA projection to a shared latent space in which the views are highly correlated and the classes are well separated. Our method shows a significant improvement over the previous state of the art (including deep supervised approaches) for cross-view classification, regularization with a second view, and semi-supervised learning on real data.
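
As a rough illustration of the two ingredients named above, the following NumPy sketch computes a classical linear CCA projection and then evaluates a combined value made of a correlation term and a classification term. It is a minimal sketch under stated assumptions, not the method of this paper: the projections are linear rather than deep, nothing is trained end-to-end, and the names `linear_cca`, `joint_loss`, the classifier weights `w`, the weight `lam`, and the regularizer `reg` are all hypothetical choices made for the example.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def linear_cca(x, y, k, reg=1e-6):
    """Classical linear CCA via whitening followed by an SVD.

    x: (n, dx) and y: (n, dy) are paired views of the same n samples.
    Returns projections a (dx, k), b (dy, k) and the top-k canonical
    correlations. `reg` is a small ridge term (an assumption of this sketch).
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    sxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    syy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    sxy = x.T @ y / (n - 1)
    wx, wy = inv_sqrt(sxx), inv_sqrt(syy)
    # Singular values of the whitened cross-covariance are the correlations.
    u, s, vt = np.linalg.svd(wx @ sxy @ wy)
    return wx @ u[:, :k], wy @ vt.T[:, :k], s[:k]


def joint_loss(x, y, labels, a, b, w, lam=1.0):
    """Illustrative combined value: correlation term + lam * task term.

    Correlation term: mean squared distance between the projected views.
    For whitened projections this is roughly 2k - 2 * sum(correlations),
    so a smaller distance corresponds to a higher correlation.
    Task term: softmax cross-entropy of a linear classifier `w` (k, classes)
    applied to the projection of the first view.
    """
    px = (x - x.mean(axis=0)) @ a
    py = (y - y.mean(axis=0)) @ b
    corr_term = np.mean(np.sum((px - py) ** 2, axis=1))
    logits = px @ w
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    task_term = -np.mean(log_probs[np.arange(len(labels)), labels])
    return corr_term + lam * task_term
```

In an end-to-end setting, the projections and the classifier would be parameters of a network and a value of this kind would be minimized by gradient descent; here `a` and `b` come from the closed-form CCA solution and `w` is simply supplied by the caller.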

[1]  George Lee,et al.  Supervised multi-view canonical correlation analysis: fused multimodal prediction of disease diagnosis and prognosis , 2014, Medical Imaging.

[2]  Jack L. Gallant,et al.  Pyrcca: Regularized Kernel Canonical Correlation Analysis in Python and Its Applications to Neuroimaging , 2015, Front. Neuroinform..

[3]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[4]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[5]  Matthijs Douze,et al.  Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.

[6]  Rama Chellappa,et al.  Joint Sparse Representation for Robust Multimodal Biometrics Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[8]  Xing Xu,et al.  Coupled dictionary learning and feature mapping for cross-modal retrieval , 2015, 2015 IEEE International Conference on Multimedia and Expo (ICME).

[9]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[10]  A. Murat Tekalp,et al.  Audiovisual Synchronization and Fusion Using Canonical Correlation Analysis , 2007, IEEE Transactions on Multimedia.

[11]  Gerhard Widmer,et al.  Deep Linear Discriminant Analysis , 2015, ICLR.

[12]  Asok Ray,et al.  Multimodal Task-Driven Dictionary Learning for Image Classification , 2015, IEEE Transactions on Image Processing.

[13]  Ishwar K. Sethi,et al.  Multimedia content processing through cross-modal association , 2003, MULTIMEDIA '03.

[14]  Jim Jing-Yan Wang,et al.  Joint learning of cross-modal classifier and factor analysis for multimedia data classification , 2015, Neural Computing and Applications.

[15]  Lawrence Carin,et al.  Bayesian joint analysis of heterogeneous genomics data , 2014, Bioinform..

[16]  Hugo Larochelle,et al.  Correlational Neural Networks , 2015, Neural Computation.

[17]  K. Strimmer,et al.  Optimal Whitening and Decorrelation , 2015, 1512.00809.

[18]  Lei Huang,et al.  Decorrelated Batch Normalization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Haibo Wang,et al.  Supervised Multi-View Canonical Correlation Analysis (sMVCCA): Integrating Histologic and Proteomic Features for Predicting Recurrent Prostate Cancer , 2015, IEEE Transactions on Medical Imaging.

[20]  Zhiyuan Hu,et al.  Racial Differences in PAM50 Subtypes in the Carolina Breast Cancer Study , 2018, Journal of the National Cancer Institute.

[21]  H. T. Kung,et al.  Multimodal sparse representation learning and applications , 2015, Journal of AI Humanities.

[22]  Balasubramanian Raman,et al.  Common Representation Learning Using Step-Based Correlation Multi-modal CNN , 2017, 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR).

[23]  Jeff A. Bilmes,et al.  Deep Canonical Correlation Analysis , 2013, ICML.

[24]  Jeff A. Bilmes,et al.  Unsupervised learning of acoustic features via deep canonical correlation analysis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[25]  Nathan Srebro,et al.  Stochastic optimization for deep CCA via nonlinear orthogonal iterations , 2015, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[26]  Eric F Lock,et al.  JOINT AND INDIVIDUAL VARIATION EXPLAINED (JIVE) FOR INTEGRATED ANALYSIS OF MULTIPLE DATA TYPES. , 2011, The annals of applied statistics.

[27]  J. S. Marron,et al.  Angle-based joint and individual variation explained , 2017, J. Multivar. Anal..

[28]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[29]  Raman Arora,et al.  Kernel CCA for multi-view learning of acoustic features using articulatory measurements , 2012, MLSLP.

[30]  Tao Xiang,et al.  Scalable and Effective Deep CCA via Soft Decorrelation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Carlo Luschi,et al.  Revisiting Small Batch Training for Deep Neural Networks , 2018, ArXiv.

[32]  A. Nobel,et al.  Supervised risk predictor of breast cancer based on intrinsic subtypes. , 2009, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.

[33]  Jeff A. Bilmes,et al.  On Deep Multi-View Representation Learning , 2015, ICML.

[34]  Tijl De Bie,et al.  Eigenproblems in Pattern Recognition , 2005 .

[35]  Shiguang Shan,et al.  Multi-View Discriminant Analysis , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Gerhard Widmer,et al.  End-to-end cross-modality retrieval with CCA projections and pairwise ranking loss , 2017, International Journal of Multimedia Information Retrieval.