A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

Over the last years significant efforts have been made to develop kernels that can be applied to sequence data such as DNA, text, speech, video and images. The Fisher Kernel and similar variants have been suggested as good ways to combine an underlying generative model in the feature space and discriminant classifiers such as SVM's. In this paper we suggest an alternative procedure to the Fisher kernel for systematically finding kernel functions that naturally handle variable length sequence data in multimedia domains. In particular for domains such as speech and images we explore the use of kernel functions that take full advantage of well known probabilistic models such as Gaussian Mixtures and single full covariance Gaussian models. We derive a kernel distance based on the Kullback-Leibler (KL) divergence between generative models. In effect our approach combines the best of both generative and discriminative methods and replaces the standard SVM kernels. We perform experiments on speaker identification/verification and image classification tasks and show that these new kernels have the best performance in speaker verification and mostly outperform the Fisher kernel based SVM's and the generative classifiers in speaker identification and image classification.

[1]  F. Bimbot,et al.  Second-order statistical measures for text-independent speaker identification , 1995, Speech Commun..

[2]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[3]  David Haussler,et al.  Using the Fisher Kernel Method to Detect Remote Protein Homologies , 1999, ISMB.

[4]  Patrick Haffner,et al.  Support vector machines for histogram-based image classification , 1999, IEEE Trans. Neural Networks.

[5]  Nuno Vasconcelos,et al.  A unifying view of image similarity , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[6]  Mahesan Niranjan,et al.  Data-dependent kernels in svm classification of speech patterns , 2000, INTERSPEECH.

[7]  William M. Campbell,et al.  Support vector machines for speaker verification and identification , 2000, Neural Networks for Signal Processing X. Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501).

[8]  Pedro J. Moreno,et al.  Using the Fisher kernel method for Web audio classification , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[9]  Ke Chen,et al.  Towards better making a decision in speaker verification , 2003, Pattern Recognit..