A New Discriminative Kernel from Probabilistic Models

Recently, Jaakkola and Haussler (1999) proposed a method for constructing kernel functions from probabilistic models. Their so-called Fisher kernel has been combined with discriminative classifiers such as support vector machines and applied successfully in, for example, DNA and protein analysis. Whereas the Fisher kernel is calculated from the marginal log-likelihood, we propose the TOP (Tangent vector Of Posterior log-odds) kernel, derived from tangent vectors of the posterior log-odds. Furthermore, we develop a theoretical framework for feature extractors derived from probabilistic models and use it to analyze the TOP kernel. In experiments, our new discriminative TOP kernel compares favorably with the Fisher kernel.
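To make the contrast between the two feature maps concrete, here is a minimal sketch on a hypothetical toy model, an assumption for illustration and not the authors' implementation or experimental setup. It uses two classes with Gaussian class-conditionals N(mu_pos, 1) and N(mu_neg, 1), a fixed prior p_pos = P(y=+1), and parameters theta = (mu_pos, mu_neg). The Fisher score is the gradient of the marginal log-likelihood log p(x | theta). The TOP features are the posterior log-odds v(x) = log P(y=+1 | x, theta) - log P(y=-1 | x, theta) followed by its gradient with respect to theta. The Fisher kernel normally weights the scores by the inverse Fisher information matrix; the sketch replaces it with the identity, a common simplification, and takes the TOP kernel as a plain inner product of the TOP features. All function names are hypothetical.

```python
import math

# Hypothetical toy model (assumption): Gaussian class-conditionals N(mu_pos, 1)
# and N(mu_neg, 1), fixed prior p_pos = P(y=+1), parameters theta = (mu_pos, mu_neg).

def gaussian_pdf(x, mu):
    """Density of N(mu, 1) at x."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def posterior_pos(x, mu_pos, mu_neg, p_pos):
    """P(y=+1 | x, theta) via Bayes' rule."""
    a = p_pos * gaussian_pdf(x, mu_pos)
    b = (1.0 - p_pos) * gaussian_pdf(x, mu_neg)
    return a / (a + b)

def fisher_score(x, mu_pos, mu_neg, p_pos):
    """Gradient of the marginal log-likelihood log p(x | theta) w.r.t. (mu_pos, mu_neg)."""
    r = posterior_pos(x, mu_pos, mu_neg, p_pos)
    return [r * (x - mu_pos), (1.0 - r) * (x - mu_neg)]

def top_features(x, mu_pos, mu_neg, p_pos):
    """Posterior log-odds v(x) followed by its gradient w.r.t. (mu_pos, mu_neg)."""
    # With equal variances the Gaussian normalising constants cancel in v(x).
    v = (math.log(p_pos / (1.0 - p_pos))
         - 0.5 * (x - mu_pos) ** 2 + 0.5 * (x - mu_neg) ** 2)
    return [v, x - mu_pos, -(x - mu_neg)]

def dot(u, w):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, w))

def fisher_kernel(x, z, theta, p_pos):
    # Simplification (assumption): identity in place of the inverse Fisher information.
    return dot(fisher_score(x, *theta, p_pos), fisher_score(z, *theta, p_pos))

def top_kernel(x, z, theta, p_pos):
    # Plain inner product of the TOP feature vectors.
    return dot(top_features(x, *theta, p_pos), top_features(z, *theta, p_pos))
```

For example, with theta = (1.0, -1.0) and p_pos = 0.5, `top_features(2.0, 1.0, -1.0, 0.5)` returns `[4.0, 1.0, -3.0]`: the first entry is the posterior log-odds of class +1 at x = 2, and the other two are its derivatives with respect to the class means. Note the design difference the abstract points to: the Fisher score is built from log p(x | theta), which does not involve the labels directly, whereas every TOP feature is tied to the posterior log-odds that a classifier actually needs.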

[1] D. Cox, et al. Asymptotic techniques for use in statistics, 1989.

[2] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[3] Keinosuke Fukunaga, et al. Introduction to statistical pattern recognition (2nd ed.), 1990.

[4] Trevor F. Cox, et al. Discriminant analysis using non-metric multidimensional scaling, 1993, Pattern Recognit.

[5] Shun-ichi Amari, et al. Network information criterion-determining the number of hidden units for an artificial neural network model, 1994, IEEE Trans. Neural Networks.

[6] Christopher M. Bishop, et al. Neural networks for pattern recognition, 1995.

[7] László Györfi, et al. A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.

[8] Joachim M. Buhmann, et al. Pairwise Data Clustering by Deterministic Annealing, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[9] Sean R. Eddy, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, 1998.

[10] Dan Geiger, et al. Graphical Models and Exponential Families, 1998, UAI.

[11] Klaus Obermayer, et al. Classification on Pairwise Proximity, 2007.

[12] David Haussler, et al. Exploiting Generative Models in Discriminative Classifiers, 1998, NIPS.

[13] Tommi S. Jaakkola, et al. Maximum Entropy Discrimination, 1999, NIPS.

[14] David Haussler, et al. A Discriminative Framework for Detecting Remote Protein Homologies, 2000, J. Comput. Biol.

[15] Mark J. F. Gales, et al. Speech Recognition using SVMs, 2001, NIPS.

[16] Gunnar Rätsch, et al. Soft Margins for AdaBoost, 2001, Machine Learning.