Multimedia Keyword Spotting (MKWS) Using Training And Template Based Techniques

The amount of multimedia information available is increasing with the development of multimedia applications. As a consequence, content-based search and retrieval tools are needed for easy access to multimedia files. This paper presents a keyword spotter that searches for a spoken word in a multimedia file using two techniques: Hidden Markov Models (HMM) and Dynamic Time Warping (DTW). To be useful, such a keyword spotting system has to be speaker-independent. Moreover, it has to be able to detect a word from a large vocabulary. This requires a phonemic representation of the word, which is obtained with the HMM approach. In some scenarios, however, the HMM approach is too demanding in time and resources, and DTW is used instead. MKWS can be applied to media files to support their automatic analysis and traversal.

Keywords: DTW, HMM, keyword spotting, MKWS
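
To make the template-based (DTW) side of the abstract concrete, the following is a minimal sketch of keyword spotting by sliding a spoken-word template over a longer feature sequence. It is not the paper's implementation: the function names `dtw_distance` and `spot_keyword`, the fixed window length, the length normalization, and the threshold are all assumptions made for illustration. In practice the frames would be acoustic feature vectors (for example MFCCs); here they are plain lists of floats.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors.

    a, b: lists of equal-dimension float vectors (hypothetical frame features).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def spot_keyword(template, stream, threshold):
    """Slide a template-length window over the stream.

    Returns (start_index, normalized_score) of the best window if its
    score is at or below the threshold, otherwise None.
    The window length and threshold are illustrative assumptions.
    """
    w = len(template)
    best = None
    for start in range(len(stream) - w + 1):
        score = dtw_distance(template, stream[start:start + w]) / w
        if best is None or score < best[1]:
            best = (start, score)
    if best is not None and best[1] <= threshold:
        return best
    return None


if __name__ == "__main__":
    template = [[0.0], [1.0], [2.0]]
    stream = [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]]
    print(spot_keyword(template, stream, threshold=0.5))
```

A sketch like this needs no training data, only one or more recorded templates, which is the usual reason to fall back on DTW when building HMM models is too costly. The price is that every candidate window is compared against the template, so the cost grows with the length of the media file.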
