System combination and score normalization for spoken term detection

Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from Information Retrieval for Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of alternative methods improving keyword search performance. We apply these methods to Cantonese, a language that presents some new issues in terms of reduced resources and shorter query lengths. First, we show score normalization methodology that improves in average by 20% keyword search performance. Second, we show that properly combining the outputs of diverse ASR systems performs 14% better than the best normalized ASR system.

[1]  Bhuvana Ramabhadran,et al.  Effect of pronounciations on OOV queries in spoken term detection , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2]  Brian Kingsbury,et al.  Exploiting diversity for spoken term detection , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[3]  Fernando Pereira,et al.  Weighted Automata in Text and Speech Processing , 2005, ArXiv.

[4]  Jong-Hak Lee,et al.  Analyses of multiple evidence combination , 1997, SIGIR '97.

[5]  Tara N. Sainath,et al.  Auto-encoder bottleneck features using deep belief networks , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6]  Herbert Gish,et al.  Rapid and accurate spoken term detection , 2007, INTERSPEECH.

[7]  Mark J. F. Gales,et al.  The efficient incorporation of MLP features into automatic speech recognition systems , 2011, Comput. Speech Lang..

[8]  Georg Heigold,et al.  Posterior-Scaled MPE: Novel Discriminative Training Criteria , 2012, INTERSPEECH.

[9]  F ChenStanley,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[10]  Hermann Ney,et al.  On the Estimation of Discount Parameters for Language Model Smoothing , 2011, INTERSPEECH.

[11]  Brian Kingsbury,et al.  Fast decoding for open vocabulary spoken term detection , 2009, HLT-NAACL.

[12]  Bhuvana Ramabhadran,et al.  Vocabulary independent spoken term detection , 2007, SIGIR.

[13]  Xiaodong Cui,et al.  A high-performance Cantonese keyword search system , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[14]  Andreas Stolcke,et al.  The SRI/OGI 2006 spoken term detection system , 2007, INTERSPEECH.

[15]  Javed A. Aslam,et al.  Relevance score normalization for metasearch , 2001, CIKM '01.

[16]  Lukás Burget,et al.  Sub-word modeling of out of vocabulary words in spoken term detection , 2008, 2008 IEEE Spoken Language Technology Workshop.

[17]  Jonathan Mamou,et al.  Combination of Multiple Speech Transcription Methods for Vocabulary Independent Search , 2008 .

[18]  Brian Kingsbury,et al.  The IBM Attila speech recognition toolkit , 2010, 2010 IEEE Spoken Language Technology Workshop.

[19]  Shengli Wu,et al.  Evaluating Score Normalization Methods in Data Fusion , 2006, AIRS.

[20]  Richard Sproat,et al.  Lattice-Based Search for Spoken Utterance Retrieval , 2004, NAACL.

[21]  Yong Wang,et al.  Pace Regression , 1999 .

[22]  Johan Schalkwyk,et al.  OpenFst: A General and Efficient Weighted Finite-State Transducer Library , 2007, CIAA.

[23]  Jonathan G. Fiscus,et al.  Results of the 2006 Spoken Term Detection Evaluation , 2006 .

[24]  Hermann Ney,et al.  Joining advantages of word-conditioned and token-passing decoding , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[25]  Cyril Allauzen,et al.  General Indexation of Weighted Automata - Application to Spoken Utterance Retrieval , 2004, HLT-NAACL 2004.

[26]  S. Cessie,et al.  Ridge Estimators in Logistic Regression , 1992 .

[27]  Shengli Wu,et al.  Data Fusion in Information Retrieval , 2012, Adaptation, Learning, and Optimization.

[28]  John R. Hershey,et al.  Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[29]  Tara N. Sainath,et al.  Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization , 2012, INTERSPEECH.

[30]  Richard M. Schwartz,et al.  White Listing and Score Normalization for Keyword Spotting of Noisy Speech , 2012, INTERSPEECH.