AN EFFICIENT RANK-DEFICIENT COMPUTATION OF THE PRINCIPLE OF RELEVANT INFORMATION

One of the main difficulties in computing information-theoretic learning (ITL) estimators is a computational complexity that grows quadratically with the data. A considerable amount of work has been devoted to computing low-rank approximations of Gram matrices without accessing all of their elements. In this paper, we discuss how these techniques can be applied to reduce the computational complexity of the Principle of Relevant Information (PRI). This particular objective function involves estimators of Rényi's second-order entropy and cross-entropy, together with their gradients, and therefore poses a technical challenge for implementation in realistic scenarios. Moreover, we introduce a simple modification of the Nyström method motivated by the observation that our estimator must perform accurately only for certain vectors, not for all possible cases. We show how these rank-deficient decompositions allow the PRI to be applied to moderately large datasets.

A major issue, which we address in this paper, is that the amount of computation associated with the PRI grows quadratically with the size of the available sample. This limits the scale of the applications if one were to apply the formulas directly. The problem of polynomial growth in complexity has also received attention from the machine learning community working on kernel methods. Consequently, approaches for computing approximations to positive semidefinite matrices based on kernels have been proposed [6, 7]. The goal of these methods is to accurately estimate large Gram matrices without computing all of their n² elements directly. It has been observed that in practice the eigenvalues of a Gram matrix drop rapidly, and therefore replacing the original matrix by a low-rank approximation is reasonable [7, 8]. In our work, we derive an algorithm for the Principle of Relevant Information based on rank-deficient approximations of a Gram matrix.
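To make the complexity argument concrete: if the Gaussian Gram matrix K admits a factorization K ≈ GGᵀ with G of size n × r (obtained, for instance, with the Nyström method), then the plug-in estimator of Rényi's second-order entropy, H₂ = −log(1ᵀK1 / n²), can be evaluated in O(nr) instead of O(n²). The following Python sketch illustrates the idea; the function and variable names are ours, not the paper's, and the landmark selection is a plain subset rather than the modified scheme the paper proposes.

```python
import numpy as np

def gauss_gram(A, B, sigma):
    """Gaussian kernel evaluations between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def nystrom_factor(X, sigma, idx):
    """Nystrom factor G (n x r) with K ~ G @ G.T, built from the
    columns of the Gram matrix indexed by `idx` (the landmark points)."""
    C = gauss_gram(X, X[idx], sigma)   # n x r slice of the Gram matrix
    W = C[idx]                         # r x r block on the landmarks
    vals, vecs = np.linalg.eigh(W)     # W is symmetric PSD
    keep = vals > 1e-10                # drop numerically zero eigenvalues
    # G = C W^{-1/2}, so that G G' = C W^+ C' (the Nystrom approximation)
    return C @ (vecs[:, keep] / np.sqrt(vals[keep]))

def renyi_h2_lowrank(G):
    """Renyi second-order entropy estimate -log(1'K1 / n^2) using K ~ G G',
    evaluated in O(n r) time without forming the n x n matrix."""
    n = G.shape[0]
    s = G.sum(axis=0)                  # G' 1, an r-vector
    return -np.log(s @ s / n**2)
```

When `idx` covers all n points the factorization is exact; with r ≪ n landmarks the entropy estimate becomes an approximation whose accuracy depends on the spectral decay noted above.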
We also propose a simple modified version of the Nyström method particularly suited for estimation in ITL.

The paper starts with a brief introduction to Rényi's entropy and the associated information quantities, with their corresponding rank-deficient approximations. Then, the objective function of the Principle of Relevant Information (PRI) is presented. Next, we propose an implementation of the optimization problem based on rank-deficient approximations. The algorithm is tested on simulated data for various accuracy regimes (different ranks), followed by some results on realistic scenarios. Finally, we provide conclusions along with directions for future work.
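As background for the objective function discussed next: a PRI-style cost combines the Rényi second-order entropy of the compressed variable Y with a cross-entropy term tying Y to the original data X (see [1, 9]). The sketch below shows the direct quadratic-cost evaluation that the rank-deficient machinery is meant to accelerate; the exact weighting scheme and all names are our assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def gauss_gram(A, B, sigma):
    """Gaussian kernel evaluations between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def pri_objective(Y, X, sigma, lam):
    """Direct O(n^2) evaluation of a PRI-style objective
    J(Y) = H2(Y) + lam * H2(Y; X), where H2 is the Parzen plug-in
    estimate of Renyi's quadratic entropy and H2(Y; X) the cross-entropy."""
    h2 = -np.log(gauss_gram(Y, Y, sigma).mean())   # redundancy of Y
    hx = -np.log(gauss_gram(Y, X, sigma).mean())   # fidelity of Y to the data X
    return h2 + lam * hx
```

Both terms reduce to sums over all pairwise kernel evaluations, which is exactly the structure that the low-rank factorization above replaces with O(nr) inner products.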

[1] Sudhir Madhav Rao et al., Unsupervised learning: An information theoretic framework, 2008.

[2] Ralph Linsker et al., Self-organization in a perceptual network, Computer, 1988.

[3] N. Aronszajn, Theory of Reproducing Kernels, 1950.

[4] Mark Girolami et al., Orthogonal Series Density Estimation and the Kernel Eigenvalue Problem, Neural Computation, 2002.

[5] Terrence J. Sejnowski et al., An Information-Maximization Approach to Blind Separation and Blind Deconvolution, Neural Computation, 1995.

[6] Robert Jenssen et al., Kernel Entropy Component Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.

[7] A. Rényi, On Measures of Entropy and Information, 1961.

[8] Michael I. Jordan et al., Kernel independent component analysis, 2003.

[9] José C. Príncipe et al., Information Theoretic Learning: Rényi's Entropy and Kernel Perspectives, 2010.

[10] José Carlos Príncipe et al., On speeding up computation in information theoretic learning, 2009 International Joint Conference on Neural Networks, 2009.

[11] Katya Scheinberg et al., Efficient SVM Training Using Low-Rank Kernel Representations, J. Mach. Learn. Res., 2002.

[12] Petros Drineas et al., On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning, J. Mach. Learn. Res., 2005.

[13] E. Parzen, On Estimation of a Probability Density Function and Mode, 1962.

[14] Erwin Lutwak et al., Cramér-Rao and moment-entropy inequalities for Rényi entropy and generalized Fisher information, IEEE Transactions on Information Theory, 2005.