Morphological Segmentation for Keyword Spotting

We explore the impact of morphological segmentation on keyword spotting (KWS). Despite the potential benefits, state-of-the-art KWS systems do not use morphological information. In this paper, we augment a state-of-the-art KWS system with sub-word units derived from supervised and unsupervised morphological segmentations, and compare them with phonetic and syllabic segmentations. Our experiments show that morphemes improve the overall performance of KWS systems. Syllabic units, however, rival the performance of morphological units when used in KWS. By combining morphological, phonetic, and syllabic segmentations, we demonstrate substantial performance gains.
