Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA

Nonlinear independent component analysis (ICA) provides an appealing framework for unsupervised feature learning, but the models proposed so far are not identifiable. Here, we first propose a new, intuitive principle of unsupervised deep learning from time series which uses the nonstationary structure of the data. Our learning principle, time-contrastive learning (TCL), finds a representation which allows optimal discrimination of time segments (windows). Surprisingly, we show how TCL can be related to a nonlinear ICA model, when ICA is redefined to include temporal nonstationarities. In particular, we show that TCL combined with linear ICA estimates the nonlinear ICA model up to point-wise transformations of the sources, and that this solution is unique; this provides the first identifiability result for nonlinear ICA which is rigorous, constructive, and very general.
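
As a rough illustration of the TCL procedure described above, the sketch below segments a synthetic nonstationary time series, trains a small MLP with a multinomial logistic-regression head to discriminate which segment each observation came from, and then applies linear ICA (FastICA) to the learned features. The data generation, network architecture, and training settings are illustrative assumptions, not the authors' exact setup.

```python
# Minimal TCL sketch: segment discrimination + linear ICA on learned features.
# All specifics (MLP size, optimizer, mixing function) are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Synthetic nonstationary sources: the variance changes from segment to segment.
n_segments, seg_len, n_sources = 20, 200, 3
scales = rng.uniform(0.2, 2.0, size=(n_segments, n_sources))
sources = np.concatenate(
    [rng.laplace(size=(seg_len, n_sources)) * s for s in scales])
labels = np.repeat(np.arange(n_segments), seg_len)  # segment index per sample

# Observations: a (mildly) nonlinear mixture of the sources.
A = rng.normal(size=(n_sources, n_sources))
x = np.tanh(sources @ A.T)

# Feature extractor h(x) plus a multinomial logistic-regression head
# trained to discriminate the time segments.
net = nn.Sequential(
    nn.Linear(n_sources, 32), nn.ReLU(),
    nn.Linear(32, n_sources))            # learned features h(x)
head = nn.Linear(n_sources, n_segments)  # segment classifier
opt = torch.optim.Adam(list(net.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.tensor(x, dtype=torch.float32)
y = torch.tensor(labels, dtype=torch.long)
for _ in range(2000):
    opt.zero_grad()
    loss = loss_fn(head(net(X)), y)
    loss.backward()
    opt.step()

# Linear ICA on the learned features; by the TCL result, this should recover
# the sources up to point-wise transformations and permutation.
features = net(X).detach().numpy()
estimated = FastICA(n_components=n_sources, whiten="unit-variance").fit_transform(features)
```

In line with the identifiability result stated above, the estimated components are expected to match the true sources only up to point-wise transformations (e.g. squaring) and permutation, so a direct correlation with the raw sources would understate the recovery quality.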
