Stochastic algorithms with descent guarantees for ICA

Independent component analysis (ICA) is a widespread data exploration technique, where observed signals are modeled as linear mixtures of independent components. From a machine learning point of view, it amounts to a matrix factorization problem with a statistical independence criterion. Infomax is one of the most widely used ICA algorithms. It is based on a loss function which is a non-convex log-likelihood. We develop a new majorization-minimization framework adapted to this loss function. We derive an online algorithm for the streaming setting and an incremental algorithm for the finite-sum setting, with the following benefits. First, unlike most algorithms found in the literature, the proposed methods do not rely on any critical hyper-parameter such as a step size, nor do they require a line-search technique. Second, the algorithm for the finite-sum setting, although stochastic, guarantees a decrease of the loss function at each iteration. Experiments demonstrate progress on the state of the art for large-scale datasets, without the need for any manual parameter tuning.
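
For reference, the Infomax / maximum-likelihood ICA objective is commonly written as follows (the notation here is the standard convention, not reproduced from the paper): given observations x_1, ..., x_n in R^p and an unmixing matrix W,

    L(W) = -\log |\det W| + \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} G\big([W x_i]_j\big),

where G = -log p_s is the assumed negative log-density of the sources. Majorization-minimization replaces L at each step with a surrogate that upper-bounds it and is tight at the current iterate; minimizing the surrogate therefore cannot increase L, which is the mechanism behind the per-iteration descent guarantee mentioned above.

A minimal, self-contained sketch of this loss, assuming the common super-Gaussian choice G(y) = log cosh(y) (this particular density is illustrative, not taken from the paper):

    import numpy as np

    def infomax_loss(W, X):
        # W: (p, p) unmixing matrix, X: (p, n) observed signals (n samples).
        # Negative log-likelihood of the ICA model with G(y) = log cosh(y).
        _, logdet = np.linalg.slogdet(W)   # log |det W|
        Y = W @ X                          # estimated sources, shape (p, n)
        return -logdet + np.mean(np.sum(np.log(np.cosh(Y)), axis=0))
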
