A Family of Chisini Mean Based Jensen-Shannon Divergence Kernels

The Jensen-Shannon divergence is an effective method for measuring the distance between two probability distributions. When the two distributions differ only subtly, however, it does not provide enough separation to distinguish them. We extend the Jensen-Shannon divergence by reformulating it with alternative mean operators that offer different robustness properties. We also prove several important properties of this extension: the lower limit of its range and its relationship to Shannon entropy and Kullback-Leibler divergence. Finally, we propose a family of new kernels based on the Chisini-mean Jensen-Shannon divergence and demonstrate that they yield better SVM classification accuracy than RBF kernels on amino acid spectra. Because spectral methods capture phenomena at the subatomic level, differences between complex compounds are often subtle. While the impetus for this work was spectral data, the methods apply generally to any domain where subtle differences are important.
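To make the idea concrete, here is a minimal sketch of the construction the abstract describes: the standard Jensen-Shannon divergence averages the two distributions with the arithmetic mean before taking Kullback-Leibler divergences, and the extension swaps that mean for another member of the Chisini family (the geometric and harmonic means are shown as illustrative choices; the paper's exact family and kernel form may differ). The divergence is then turned into an RBF-style kernel by exponentiation. All function names and the `gamma` parameter are assumptions for illustration, not the authors' API.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) in nats, smoothed by eps."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def chisini_jsd(p, q, mean="arithmetic"):
    """Jensen-Shannon-style divergence whose midpoint distribution is
    built from a chosen Chisini mean of p and q.

    mean="arithmetic" recovers the standard JSD; "geometric" and
    "harmonic" are illustrative alternatives. Note that the non-arithmetic
    means do not yield a normalized midpoint, which is one reason the
    resulting quantities have different range and robustness properties.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if mean == "arithmetic":
        m = (p + q) / 2.0
    elif mean == "geometric":
        m = np.sqrt(p * q)
    elif mean == "harmonic":
        m = 2.0 * p * q / (p + q + 1e-12)
    else:
        raise ValueError(f"unknown mean: {mean}")
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def jsd_kernel(p, q, gamma=1.0, mean="arithmetic"):
    """Exponential kernel on the divergence, analogous to an RBF kernel."""
    return float(np.exp(-gamma * chisini_jsd(p, q, mean=mean)))
```

With the arithmetic mean this reduces to the familiar JSD, which is 0 for identical distributions and at most ln 2 for disjoint ones, so `jsd_kernel(p, p)` is 1 and the kernel decays as the distributions separate.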
