Least-Squares Independent Component Analysis

Accurately evaluating statistical independence among random variables is a key element of independent component analysis (ICA). In this letter, we employ a squared-loss variant of mutual information as an independence measure and present a method for estimating it. Our key idea is to estimate the ratio of probability densities directly, thereby avoiding the difficult task of density estimation itself. In this density-ratio approach, a natural cross-validation procedure is available for hyperparameter selection, so all tuning parameters such as the kernel width and the regularization parameter can be objectively optimized. This is an advantage over recently developed kernel-based independence measures and is a highly useful property in unsupervised learning problems such as ICA. Based on this novel independence measure, we develop an ICA algorithm, named least-squares independent component analysis.
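
The following is a minimal sketch of the kind of least-squares mutual information (LSMI) estimator the abstract describes: the density ratio p(x, y) / (p(x) p(y)) is modeled with product Gaussian kernels, its coefficients are fit by regularized least squares, and the kernel width and regularization parameter are selected by cross-validation of the squared-loss criterion. The kernel choice, number of basis centers, candidate grids, and the plug-in form of the SMI estimate are illustrative assumptions, not values prescribed by the letter.

```python
import numpy as np


def gaussian_kernel(a, centers, sigma):
    """Gaussian kernel matrix between samples `a` (n x d) and `centers` (b x d)."""
    sq_dist = ((a[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def lsmi(x, y, n_centers=100, sigmas=(0.3, 0.6, 1.0, 2.0),
         lambdas=(1e-3, 1e-2, 1e-1, 1.0), n_folds=5, seed=0):
    """Estimate squared-loss mutual information between paired samples x and y.

    The density ratio p(x, y) / (p(x) p(y)) is modeled as a linear combination
    of product Gaussian kernels centered at a random subset of the samples;
    the coefficients are obtained by regularized least squares, and the
    hyperparameters by K-fold cross-validation of the squared-loss criterion.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = x.shape[0]
    rng = np.random.default_rng(seed)
    b = min(n_centers, n)
    idx = rng.choice(n, size=b, replace=False)
    ux, uy = x[idx], y[idx]                      # kernel centers

    folds = rng.integers(0, n_folds, size=n)
    best_score, best_params = np.inf, None
    for sigma in sigmas:
        Kx = gaussian_kernel(x, ux, sigma)       # n x b
        Ky = gaussian_kernel(y, uy, sigma)       # n x b
        for lam in lambdas:
            cv_score = 0.0
            for k in range(n_folds):
                tr, te = folds != k, folds == k
                # H factorizes into x- and y-parts because the expectation is
                # over the product of the marginals.
                H = (Kx[tr].T @ Kx[tr]) * (Ky[tr].T @ Ky[tr]) / tr.sum() ** 2
                h = (Kx[tr] * Ky[tr]).mean(axis=0)
                alpha = np.linalg.solve(H + lam * np.eye(b), h)
                H_te = (Kx[te].T @ Kx[te]) * (Ky[te].T @ Ky[te]) / te.sum() ** 2
                h_te = (Kx[te] * Ky[te]).mean(axis=0)
                cv_score += 0.5 * alpha @ H_te @ alpha - h_te @ alpha
            cv_score /= n_folds
            if cv_score < best_score:
                best_score, best_params = cv_score, (sigma, lam)

    # Refit on all samples with the cross-validated hyperparameters.
    sigma, lam = best_params
    Kx = gaussian_kernel(x, ux, sigma)
    Ky = gaussian_kernel(y, uy, sigma)
    H = (Kx.T @ Kx) * (Ky.T @ Ky) / n ** 2
    h = (Kx * Ky).mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(b), h)
    # One common plug-in form of the SMI estimate; it is zero when x and y
    # are independent (density ratio identically one).
    return 0.5 * h @ alpha - 0.5


# Example: clearly positive SMI for dependent signals, near zero for independent ones.
rng = np.random.default_rng(1)
s = rng.standard_normal(500)
print(lsmi(s, s + 0.1 * rng.standard_normal(500)))   # noticeably positive
print(lsmi(s, rng.standard_normal(500)))             # close to zero
```

An ICA procedure in the spirit of the abstract would then optimize a demixing matrix so that this estimated SMI among the recovered components is minimized; that outer optimization loop is omitted here.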
