On speeding up computation in information theoretic learning

With the recent progress in kernel-based learning methods, computation with Gram matrices has received immense attention. However, computing the full Gram matrix has complexity quadratic in the number of samples, so a considerable amount of work has focused on extracting the relevant information from the Gram matrix without accessing all of its elements. Most of these methods exploit the positive definiteness and rapidly decaying eigenspectrum of the Gram matrix. Although information-theoretic learning (ITL) is conceptually different from kernel-based learning, several ITL estimators can be written in terms of Gram matrices. The difference, however, is that some ITL estimators also involve a special type of matrix that is neither positive definite nor symmetric. In this paper we discuss how the techniques developed for kernel-based learning can be adapted to reduce the computational complexity of ITL estimators involving both Gram matrices and these other matrices.
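The abstract does not pin down a specific algorithm, so the following is only a minimal NumPy sketch of the kind of speedup it alludes to: the ITL information potential V = (1/N^2) 1' K 1 (whose negative logarithm is the quadratic Rényi entropy estimate) computed once naively in O(N^2), and once through a pivoted incomplete Cholesky factorization K ≈ G G', a standard low-rank technique that exploits exactly the positive definiteness and decaying eigenspectrum mentioned above. The Gaussian kernel, the bandwidth sigma, the tolerance, and all function names are illustrative assumptions, not the paper's method.

```python
import numpy as np

def gaussian_kernel_column(X, i, sigma):
    # One column K[:, i] of the Gram matrix for a Gaussian kernel.
    d = X - X[i]
    return np.exp(-np.sum(d * d, axis=1) / (2.0 * sigma ** 2))

def incomplete_cholesky(X, sigma, tol=1e-6, max_rank=None):
    # Pivoted incomplete Cholesky, K ~= G @ G.T, using only
    # O(N * rank) kernel evaluations (illustrative sketch).
    N = X.shape[0]
    if max_rank is None:
        max_rank = N
    diag = np.ones(N)              # Gaussian kernel: k(x, x) = 1
    G = np.zeros((N, max_rank))
    for j in range(max_rank):
        i = int(np.argmax(diag))   # pivot on largest residual diagonal
        if diag[i] <= tol:         # residual is negligible: stop early
            return G[:, :j]
        k_col = gaussian_kernel_column(X, i, sigma)
        G[:, j] = (k_col - G[:, :j] @ G[i, :j]) / np.sqrt(diag[i])
        diag = np.maximum(diag - G[:, j] ** 2, 0.0)  # guard round-off
    return G

def information_potential(X, sigma):
    # Naive O(N^2) estimate: V = (1/N^2) * sum_ij k(x_i, x_j).
    N = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-D / (2.0 * sigma ** 2)).sum() / N ** 2

def information_potential_lowrank(X, sigma, tol=1e-6):
    # Same quantity via K ~= G G': V ~= ||G' 1||^2 / N^2,
    # which costs O(N * rank) after the factorization.
    G = incomplete_cholesky(X, sigma, tol)
    s = G.T @ np.ones(G.shape[0])
    return float(s @ s) / G.shape[0] ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
print(information_potential(X, sigma=1.0))
print(information_potential_lowrank(X, sigma=1.0))  # agrees to ~tol
```

Because the factorization never forms the full N-by-N matrix, the memory and time costs drop from quadratic to roughly linear in N whenever the effective rank is small; the non-symmetric matrices the abstract mentions would require a different decomposition, which is beyond this sketch.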
