NONLINEAR ICA USING VOLUME-PRESERVING TRANSFORMATIONS

Nonlinear ICA is a fundamental problem in machine learning, aiming to identify the underlying independent components (sources) from data that is assumed to be a nonlinear function (the mixing function) of these sources. Recent works prove that if the sources have particular structure (e.g., temporal structure), they are theoretically identifiable even if the mixing function is arbitrary. However, in many cases such restrictions on the sources are difficult to satisfy or even verify, which inhibits the applicability of the proposed methods. Unlike these works, we propose a general framework for nonlinear ICA in which the mixing function is assumed to be a volume-preserving transformation, while the conditions on the sources can be much looser. We provide an insightful proof of the identifiability of the proposed framework. We implement the framework with volume-preserving flow-based models, and verify our theory by experiments on artificial data and synthesized images. Moreover, results on real-world images indicate that our framework can disentangle interpretable features.

Our identifiability theorem rests on conditions such as the existence of two distinct auxiliary values u^(1) and u^(2), and the overlap between samples drawn from p(s | u^(1)) and p(s | u^(2)). In the verifying experiment for each condition, we synthesize an artificial dataset that violates it and find that the sources can still be successfully identified. This indicates that a stronger identifiability theorem with fewer conditions may exist, which is appealing for further exploration. We also empirically show that a volume-preserving flow-based model using two classes significantly outperforms the state-of-the-art nonlinear ICA method (iVAE), and that it can learn interpretable attributes from real-world datasets: on real-world images, three of the learned components control brightness, hair (bangs), and grin, respectively. These results further suggest that our framework can identify the true sources from real-world data, and hence they support our theory and demonstrate the advantages and applicability of the proposed framework.
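For concreteness, a volume-preserving flow is an invertible network f whose Jacobian satisfies |det J_f(x)| = 1, so the change of variables log p_x(x) = log p_s(f(x)) + log |det J_f(x)| collapses to log p_x(x) = log p_s(f(x)), and no log-determinant term needs to be computed or optimized. A minimal sketch of the classic building block with this property, the additive coupling layer of NICE [18] (also the basis of GIN [4]), is given below in PyTorch. The class name, hidden width, and the network chosen for t are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """NICE-style additive coupling layer: y1 = x1, y2 = x2 + t(x1).

    The Jacobian is triangular with ones on the diagonal, so
    |det J| = 1 and the transformation is volume-preserving.
    """

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.d = dim // 2
        # t can be an arbitrary network; it is never inverted itself.
        self.t = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(),
            nn.Linear(hidden, dim - self.d),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x[:, :self.d], x[:, self.d:]
        return torch.cat([x1, x2 + self.t(x1)], dim=1)

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        y1, y2 = y[:, :self.d], y[:, self.d:]
        return torch.cat([y1, y2 - self.t(y1)], dim=1)
```

Stacking several such layers, with the two halves of the input swapped between layers, yields an expressive invertible map whose Jacobian determinant is exactly 1 everywhere, which is what the volume-preserving mixing assumption requires of the model.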

[1] Micah Goldblum, et al. The Intrinsic Dimension of Images and Its Impact on Learning, 2021, ICLR.

[2] N. Mitra, et al. StyleFlow: Attribute-conditioned Exploration of StyleGAN-Generated Images using Conditional Continuous Normalizing Flows, 2020, ACM Trans. Graph.

[3] Ben Poole, et al. Weakly-Supervised Disentanglement Without Compromises, 2020, ICML.

[4] Ullrich Köthe, et al. Disentanglement by Nonlinear ICA with General Incompressible-flow Networks (GIN), 2020, ICLR.

[5] Aapo Hyvärinen, et al. Variational Autoencoders and Nonlinear ICA: A Unifying Framework, 2019, AISTATS.

[6] Alexander Lerchner, et al. Spatial Broadcast Decoder: A Simple Architecture for Learning Disentangled Representations in VAEs, 2019, arXiv.

[7] Timo Aila, et al. A Style-Based Generator Architecture for Generative Adversarial Networks, 2019, CVPR.

[8] Bernhard Schölkopf, et al. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations, 2018, ICML.

[9] Aapo Hyvärinen, et al. Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning, 2018, AISTATS.

[10] Bernhard Schölkopf, et al. Elements of Causal Inference: Foundations and Learning Algorithms, 2017.

[11] Aapo Hyvärinen, et al. Nonlinear ICA of Temporally Dependent Stationary Sources, 2017, AISTATS.

[12] Zhansheng Duan, et al. Principal Component Analysis Networks and Algorithms, 2017.

[13] Samy Bengio, et al. Density estimation using Real NVP, 2016, ICLR.

[14] Aapo Hyvärinen, et al. Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA, 2016, NIPS.

[15] Joshua B. Tenenbaum, et al. Building machines that learn and think like people, 2016, Behavioral and Brain Sciences.

[16] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[17] Xiaogang Wang, et al. Deep Learning Face Attributes in the Wild, 2015, ICCV.

[18] Yoshua Bengio, et al. NICE: Non-linear Independent Components Estimation, 2014, ICLR.

[19] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.

[20] Laurenz Wiskott, et al. An extension of slow feature analysis for nonlinear blind source separation, 2014, J. Mach. Learn. Res.

[21] Pascal Vincent, et al. Representation Learning: A Review and New Perspectives, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Pascal Vincent, et al. The Manifold Tangent Classifier, 2011, NIPS.

[23] Hariharan Narayanan, et al. Sample Complexity of Testing the Manifold Hypothesis, 2010, NIPS.

[24] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[25] Lawrence Cayton, et al. Algorithms for manifold learning, 2005.

[26] Motoaki Kawanabe, et al. Kernel-Based Nonlinear Blind Source Separation, 2003, Neural Computation.

[27] Christian Jutten, et al. Source separation in post-nonlinear mixtures, 1999, IEEE Trans. Signal Process.

[28] Aapo Hyvärinen, et al. Nonlinear independent component analysis: Existence and uniqueness results, 1999, Neural Networks.

[29] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[30] Pierre Comon, et al. Independent component analysis, A new concept?, 1994, Signal Process.

[31] Abbas El Gamal, et al. An information-theoretic proof of Hadamard's inequality, 1983, IEEE Trans. Inf. Theory.