Deep Canonical Correlation Analysis

We introduce Deep Canonical Correlation Analysis (DCCA), a method to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated. Parameters of both transformations are jointly learned to maximize the (regularized) total correlation. It can be viewed as a nonlinear extension of the linear method canonical correlation analysis (CCA). It is an alternative to the nonparametric method kernel canonical correlation analysis (KCCA) for learning correlated nonlinear transformations. Unlike KCCA, DCCA does not require an inner product, and has the advantages of a parametric method: training time scales well with data size and the training data need not be referenced when computing the representations of unseen instances. In experiments on two real-world datasets, we find that DCCA learns representations with significantly higher correlation than those learned by CCA and KCCA. We also introduce a novel non-saturating sigmoid function based on the cube root that may be useful more generally in feedforward neural networks.
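
The correlation objective sketched in the abstract is CCA applied to the top-layer outputs of the two networks: the sum of canonical correlations between the two (regularized) views. The snippet below is a minimal illustration of that computation, not the authors' implementation; the function name, the ridge parameter `r`, and the use of NumPy are assumptions made for the example.

```python
import numpy as np

def total_correlation(H1, H2, r=1e-4):
    """Sum of canonical correlations between two views.

    H1, H2 : (n_samples, d) arrays holding the top-layer outputs of the
             two networks for the same n_samples instances.
    r      : small ridge added to the covariance estimates, standing in for
             the "regularized" correlation mentioned in the abstract.
    """
    n, d = H1.shape
    H1c = H1 - H1.mean(axis=0)          # center each view
    H2c = H2 - H2.mean(axis=0)

    S11 = H1c.T @ H1c / (n - 1) + r * np.eye(d)   # regularized within-view covariance
    S22 = H2c.T @ H2c / (n - 1) + r * np.eye(d)
    S12 = H1c.T @ H2c / (n - 1)                    # cross-view covariance

    def inv_sqrt(S):
        # S^{-1/2} for a symmetric positive-definite matrix via eigendecomposition.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    # The canonical correlations are the singular values of T; their sum
    # (the trace norm of T) is the quantity the network parameters maximize.
    return np.linalg.svd(T, compute_uv=False).sum()

# Illustrative check: two nearly identical 10-dimensional views give a value
# close to d = 10, since every canonical correlation is close to 1.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
print(total_correlation(X, X + 0.1 * rng.standard_normal((1000, 10))))
```

Training then amounts to backpropagating the gradient of this scalar with respect to the two networks' outputs through both networks jointly.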

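The abstract characterizes the new activation only as a non-saturating sigmoid "based on the cube root". As an illustration (an assumption, not necessarily the paper's exact definition), one function with the stated properties is the inverse of g(y) = y^3/3 + y: its derivative is bell-shaped like a sigmoid's, it behaves like the identity near zero, and it grows like the cube root for large inputs, so it never saturates. A sketch, with hypothetical names:

```python
import numpy as np

def cuberoot_sigmoid(x, iters=20):
    """Solve s**3 / 3 + s = x for s with Newton's method (vectorized).

    The large-|x| asymptote cbrt(3x) is a good starting guess, and the
    derivative s**2 + 1 >= 1 keeps the iteration stable everywhere.
    """
    s = np.cbrt(3.0 * x)
    for _ in range(iters):
        s = s - (s**3 / 3.0 + s - x) / (s**2 + 1.0)
    return s

def cuberoot_sigmoid_grad(x):
    """ds/dx = 1 / (s**2 + 1), by the inverse-function rule."""
    s = cuberoot_sigmoid(x)
    return 1.0 / (s**2 + 1.0)

# s(0) = 0 with unit slope; for large inputs s(x) tracks cbrt(3x) and the
# gradient decays polynomially rather than vanishing exponentially.
print(cuberoot_sigmoid(np.array([0.0, 1.0, 100.0])))
print(cuberoot_sigmoid_grad(np.array([0.0, 1.0, 100.0])))
```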