Discriminative Keyword Selection Using Support Vector Machines

Many tasks in speech processing involve classification of long term characteristics of a speech segment such as language, speaker, dialect, or topic. A natural technique for determining these characteristics is to first convert the input speech into a sequence of tokens such as words, phones, etc. From these tokens, we can then look for distinctive sequences, keywords, that characterize the speech. In many applications, a set of distinctive keywords may not be known a priori. In this case, an automatic method of building up keywords from short context units such as phones is desirable. We propose a method for the construction of keywords based upon Support Vector Machines. We cast the problem of keyword selection as a feature selection problem for n-grams of phones. We propose an alternating filter-wrapper method that builds successively longer keywords. Application of this method to language recognition and topic recognition tasks shows that the technique produces interesting and significant qualitative and quantitative results.

[1]  Bin Ma,et al.  A phonotactic-semantic paradigm for automatic spoken document classification , 2005, SIGIR '05.

[2]  Mehryar Mohri,et al.  Rational Kernels , 2002, NIPS.

[3]  Pavel Matejka,et al.  Hierarchical Structures of Neural Networks for Phoneme Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[4]  Jean-Luc Gauvain,et al.  Discriminative Classifiers for Language Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[5]  Herbert Gish,et al.  Discriminatively trained Language Models using Support Vector Machines for Language Identification , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[6]  William M. Campbell,et al.  Phonetic Speaker Recognition with Support Vector Machines , 2003, NIPS.

[7]  William M. Campbell,et al.  Advanced Language Recognition using Cepstra and Phonotactics: MITLL System Performance on the NIST 2005 Language Recognition Evaluation , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[8]  Samy Bengio,et al.  SVMTorch: Support Vector Machines for Large-Scale Regression Problems , 2001, J. Mach. Learn. Res..

[9]  James R. Glass,et al.  Open-Vocabulary Spoken Utterance Retrieval using Confusion Networks , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[10]  Andreas Stolcke,et al.  MLLR transforms as features in speaker recognition , 2005, INTERSPEECH.

[11]  Steve Young,et al.  The HTK book , 1995 .

[12]  Thorsten Joachims,et al.  Learning to classify text using support vector machines - methods, theory and algorithms , 2002, The Kluwer international series in engineering and computer science.

[13]  William M. Campbell,et al.  Language Recognition with Word Lattices and Support Vector Machines , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[14]  Pat Langley,et al.  Selection of Relevant Features and Examples in Machine Learning , 1997, Artif. Intell..

[15]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[16]  Konrad Rieck,et al.  Language models for detection of unknown attacks in network traffic , 2006, Journal in Computer Virology.