On Power-Law Kernels, Corresponding Reproducing Kernel Hilbert Space and Applications

Kernels play a central role in machine learning. Motivated by the importance of power-law distributions in statistical modeling, we propose the notion of power-law kernels to investigate power laws in learning problems. We propose two power-law kernels by generalizing the Gaussian and Laplacian kernels. This generalization is based on distributions that arise from maximizing a generalized information measure known as nonextensive entropy, which is well studied in statistical mechanics. We prove that the proposed kernels are positive definite and provide some insights into the corresponding Reproducing Kernel Hilbert Space (RKHS). We also study the practical significance of both kernels in classification and regression, and present some simulation results.
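As a rough illustration of what such a generalization can look like, the sketch below replaces the exponential in the Gaussian and Laplacian kernels with the q-exponential from nonextensive (Tsallis) statistics. The abstract does not state the kernel formulas, so the function names, the parameter range q > 1, and the normalizations (3 - q)σ² and (2 - q)β are assumptions made for this example and may differ from the definitions used in the paper.

```python
import numpy as np

def q_exp(x, q):
    """q-exponential [1 + (1 - q) x]_+^(1 / (1 - q)); equals exp(x) at q = 1.

    Intended here for x <= 0 and q >= 1, where it decays as a power law.
    """
    if np.isclose(q, 1.0):
        return np.exp(x)
    base = np.maximum(1.0 + (1.0 - q) * x, 0.0)
    return base ** (1.0 / (1.0 - q))

def q_gaussian_kernel(X, Y, q=1.5, sigma=1.0):
    """Assumed form: exp_q(-||x - y||_2^2 / ((3 - q) sigma^2)), for 1 <= q < 3."""
    # Pairwise squared Euclidean distances via broadcasting.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return q_exp(-d2 / ((3.0 - q) * sigma ** 2), q)

def q_laplacian_kernel(X, Y, q=1.5, beta=1.0):
    """Assumed form: exp_q(-||x - y||_1 / ((2 - q) beta)), for 1 <= q < 2."""
    # Pairwise L1 distances via broadcasting.
    d1 = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=-1)
    return q_exp(-d1 / ((2.0 - q) * beta), q)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 3))
    K = q_gaussian_kernel(X, X, q=1.5, sigma=1.0)
    # A positive definite kernel yields a Gram matrix with no negative
    # eigenvalues (up to floating-point error).
    print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())
```

With these assumed normalizations, setting q = 1 recovers the usual Gaussian kernel exp(-||x - y||² / (2σ²)) and Laplacian kernel exp(-||x - y||₁ / β), while q > 1 gives tails that decay polynomially rather than exponentially. A Gram matrix computed this way could be passed to any learner that accepts precomputed kernels.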
