On Kernel Parameter Selection in Hilbert-Schmidt Independence Criterion

The Hilbert-Schmidt independence criterion (HSIC) is a kernel-based statistical independence measure that can be computed very efficiently. However, its kernel parameters must be chosen heuristically because no objective model selection method is available for it. Least-squares mutual information (LSMI) is another statistical independence measure, based on direct density-ratio estimation. Although LSMI is computationally more expensive than HSIC, it is equipped with cross-validation, so its kernel parameters can be determined objectively. In this paper, we show that HSIC can actually be regarded as an approximation to LSMI, which allows us to use the cross-validation procedure of LSMI to determine the kernel parameters of HSIC. Consequently, we obtain both the computational efficiency of HSIC and objective model selection via cross-validation.
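To make the workflow concrete, below is a minimal sketch (not the authors' implementation) of the idea described above: select a Gaussian kernel width by cross-validating a simplified LSMI-style density-ratio fit, then evaluate HSIC with the selected width. The biased HSIC estimator trace(KHLH)/n^2 follows Gretton et al. (2005), and the hold-out squared-error score follows the standard least-squares density-ratio formulation; the shared width for x and y, the basis size, the regularization constant, and all function names are illustrative assumptions.

```python
# Sketch: pick the Gaussian kernel width by LSMI-style cross-validation,
# then reuse it for HSIC. Names, defaults, and the shared-width assumption
# are illustrative, not the authors' exact formulation.
import numpy as np

def gaussian_gram(x, centers, sigma):
    """Gaussian kernel matrix between samples x (n, d) and centers (b, d)."""
    sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2 * sigma ** 2))

def hsic(x, y, sigma_x, sigma_y):
    """Biased empirical HSIC estimate trace(K H L H) / n^2 (Gretton et al., 2005)."""
    n = x.shape[0]
    K = gaussian_gram(x, x, sigma_x)
    L = gaussian_gram(y, y, sigma_y)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

def lsmi_cv_score(x, y, sigma, lam=1e-3, folds=5, n_basis=100, seed=0):
    """Hold-out squared error of a simplified LSMI-style density-ratio fit."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    idx = rng.permutation(n)
    centers = idx[:min(n_basis, n)]        # kernel centers for the ratio model (fixed for simplicity)
    cx, cy = x[centers], y[centers]
    scores = []
    for f in range(folds):
        te = idx[f::folds]
        tr = np.setdiff1d(idx, te)
        def parts(sel):
            Phi_x = gaussian_gram(x[sel], cx, sigma)
            Phi_y = gaussian_gram(y[sel], cy, sigma)
            Hm = (Phi_x.T @ Phi_x / len(sel)) * (Phi_y.T @ Phi_y / len(sel))
            hv = (Phi_x * Phi_y).mean(axis=0)
            return Hm, hv
        H_tr, h_tr = parts(tr)
        alpha = np.linalg.solve(H_tr + lam * np.eye(len(centers)), h_tr)
        H_te, h_te = parts(te)
        scores.append(0.5 * alpha @ H_te @ alpha - h_te @ alpha)  # held-out squared error
    return float(np.mean(scores))

# Usage: choose sigma by the LSMI-style CV score, then evaluate HSIC with that sigma.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(200, 1))
    y = x + 0.5 * rng.normal(size=(200, 1))    # dependent toy data
    sigmas = [0.3, 0.5, 1.0, 2.0]
    best = min(sigmas, key=lambda s: lsmi_cv_score(x, y, s))
    print("selected sigma:", best, "HSIC:", hsic(x, y, best, best))
```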
