Nonlinear Multiview Analysis: Identifiability and Neural Network-Assisted Implementation

Multiview analysis aims to extract shared latent components from data samples acquired in different domains, e.g., image, text, and audio. Classic multiview analysis, e.g., canonical correlation analysis (CCA), tackles this problem by matching the linearly transformed views in a certain latent domain. More recently, powerful nonlinear learning tools such as kernel methods and neural networks have been employed to enhance classic CCA. However, unlike linear CCA, whose theoretical aspects are well understood, nonlinear CCA approaches are largely intuition-driven. In particular, it is unclear under what conditions the shared latent components across the views can be identified, even though identifiability plays an essential role in many applications. In this work, we revisit nonlinear multiview analysis and address both the theoretical and computational aspects. Our work leverages a useful nonlinear model, namely the post-nonlinear model, from the nonlinear mixture separation literature. Combining this model with multiview data, we take a nonlinear multiview mixture learning viewpoint, which is a natural extension of the classic generative models underlying linear CCA. From there, we derive a learning criterion. We show that, under reasonable conditions, minimizing this criterion identifies the shared latent components up to certain ambiguities. Our derivation and formulation also offer new insights into, and interpretations of, existing deep neural network-based CCA formulations. On the computational side, we propose an effective algorithm with simple and scalable update rules. A series of simulations and real-data experiments corroborates our theoretical analysis.
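To make the matching viewpoint concrete: under the post-nonlinear model referenced above, each view is generated as x_q = g_q(A_q s), where s collects the shared latent components, A_q is a view-specific mixing matrix, and g_q applies invertible element-wise nonlinearities. Below is a minimal PyTorch sketch of a deep-CCA-style two-view matching criterion in this spirit: each view is passed through its own neural network, the outputs are aligned in a shared latent space, and a covariance regularizer discourages collapsed solutions. The network architecture, loss weights, and regularizer are illustrative assumptions for exposition, not the paper's exact criterion or algorithm.

import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    # Maps one view to a K-dimensional latent representation.
    def __init__(self, in_dim, latent_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

def matching_loss(z1, z2, reg_weight=1.0):
    # Align the two encoded views; push each latent covariance toward the
    # identity to rule out the trivial (collapsed) solution.
    fit = ((z1 - z2) ** 2).mean()

    def cov_penalty(z):
        zc = z - z.mean(dim=0, keepdim=True)
        cov = zc.T @ zc / (z.shape[0] - 1)
        return ((cov - torch.eye(z.shape[1])) ** 2).sum()

    return fit + reg_weight * (cov_penalty(z1) + cov_penalty(z2))

if __name__ == "__main__":
    # Toy usage with synthetic data; all dimensions are placeholders.
    N, D1, D2, K = 512, 10, 12, 3
    x1, x2 = torch.randn(N, D1), torch.randn(N, D2)
    enc1, enc2 = ViewEncoder(D1, K), ViewEncoder(D2, K)
    opt = torch.optim.Adam(list(enc1.parameters()) + list(enc2.parameters()), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = matching_loss(enc1(x1), enc2(x2))
        loss.backward()
        opt.step()

In this toy setup the two encoders play the role of learned inverses of the view-specific nonlinear distortions; the paper's contribution is to establish when such a matching criterion identifies the shared components up to ambiguities.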
