Out-of-Vocabulary Spoken Term Detection

Spoken term detection (STD) is a fundamental task for multimedia information retrieval. A major challenge for an STD system is the serious drop in performance when detecting out-of-vocabulary (OOV) terms. The difficulties arise not only from the absence of pronunciations for such terms in the system dictionaries, but also from intrinsic uncertainty in pronunciations, significant diversity in term properties, and weak acoustic and language modelling for such terms. To tackle the OOV issue, we first apply the joint-multigram model to predict pronunciations for OOV terms in a stochastic way. Based on this, we propose a stochastic pronunciation model that considers all possible pronunciations of an OOV term, so that the high pronunciation uncertainty is compensated for. Furthermore, to deal with the diversity in term properties, we propose a term-dependent discriminative decision strategy, which employs discriminative models to integrate multiple informative factors and confidence measures into a classification probability, which in turn gives rise to a minimum-cost decision. In addition, to address the weakness in acoustic and language modelling, we propose a direct posterior confidence measure, which replaces the generative models with a discriminative model, such as a multi-layer perceptron (MLP), to obtain a robust confidence for OOV term detection. With these novel techniques, STD performance on OOV terms improved substantially and significantly in our experiments on meeting speech data.
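The step from a classification probability to a minimum-cost decision can be illustrated with the standard Bayes decision rule. The sketch below is not the paper's implementation: the function names `decision_threshold` and `accept`, and the cost parameters `cost_miss` and `cost_fa`, are hypothetical, and the idea that the costs are supplied per term is an assumption made for illustration.

```python
def decision_threshold(cost_miss: float, cost_fa: float) -> float:
    """Probability above which accepting a detection has lower expected cost.

    Accepting a detection with hit probability p costs (1 - p) * cost_fa on
    average; rejecting it costs p * cost_miss. The two are equal at
    p = cost_fa / (cost_miss + cost_fa).
    """
    if cost_miss <= 0 or cost_fa <= 0:
        raise ValueError("costs must be positive")
    return cost_fa / (cost_miss + cost_fa)


def accept(hit_prob: float, cost_miss: float = 1.0, cost_fa: float = 1.0) -> bool:
    """Accept the detection if rejecting it would have the higher expected cost.

    hit_prob is assumed to be the classifier's probability that the detection
    is a true occurrence; the costs are hypothetical and may differ per term.
    """
    return hit_prob > decision_threshold(cost_miss, cost_fa)


if __name__ == "__main__":
    # With equal costs the rule reduces to thresholding at 0.5.
    print(accept(0.6))
    # A term whose misses are expensive is accepted at a much lower probability.
    print(accept(0.2, cost_miss=10.0, cost_fa=1.0))
```

Under this rule the acceptance threshold follows from the costs alone, so a term-dependent cost gives a term-dependent threshold without changing the classifier.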


[118]  David Grangier,et al.  Machine Learning for Information Retrieval , 2008 .

[119]  J. C. Speech Hybrid word-subword decoding for spoken term detection , 2008 .

[120]  Juha Häkkinen,et al.  Assessing text-to-phoneme mapping strategies in speaker independent isolated word recognition , 2003, Speech Commun..

[121]  Salim Roukos,et al.  Experimental Results in Audio Indexing , 1997 .

[122]  Richard Sproat,et al.  Lattice-Based Search for Spoken Utterance Retrieval , 2004, NAACL.

[123]  Steve Young,et al.  The HTK book , 1995 .

[124]  Robert I. Damper,et al.  Novel-word pronunciation within a text-to-speech system , 1990, SSW.

[125]  Terrence J. Sejnowski,et al.  Parallel Networks that Learn to Pronounce English Text , 1987, Complex Syst..

[126]  Lukás Burget,et al.  The 2005 AMI System for the Transcription of Speech in Meetings , 2005, MLMI.

[127]  Timothy J. Hazen,et al.  Word and phone level acoustic confidence scoring , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[128]  Jonathan Mamou,et al.  Combination of Multiple Speech Transcription Methods for Vocabulary Independent Search , 2008 .

[129]  Richard Rose,et al.  Discriminant wordspotting techniques for rejecting non-vocabulary utterances in unconstrained speech , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[130]  Hermann Ney,et al.  Joint-sequence models for grapheme-to-phoneme conversion , 2008, Speech Commun..

[131]  Dietrich Klakow,et al.  OOV-detection in large vocabulary system using automatically defined word-fragments as fillers , 1999, EUROSPEECH.

[132]  Simon King,et al.  A posterior approach for microphone array based speech recognition , 2008, INTERSPEECH.

[133]  Aaron E. Rosenberg,et al.  An investigation of the use of dynamic time warping for word spotting and connected speech recognition , 1980, ICASSP.

[134]  Beth Logan,et al.  Speechbot: an experimental speech-based search engine for multimedia content on the web , 2002, IEEE Trans. Multim..

[135]  Karen Spärck Jones,et al.  Retrieving spoken documents by combining multiple index sources , 1996, SIGIR '96.

[136]  Walter Daelemans,et al.  Language-Independent Data-Oriented Grapheme-to-Phoneme Conversion , 1996 .

[137]  Noam Chomsky,et al.  The Sound Pattern of English , 1968 .

[138]  Timothy J. Hazen,et al.  A comparison of query-by-example methods for spoken term detection , 2009, INTERSPEECH.

[139]  Herbert Gish,et al.  Improved estimation, evaluation and applications of confidence measures for speech recognition , 1997, EUROSPEECH.

[140]  William J. Byrne,et al.  Stochastic pronunciation modelling from hand-labelled phonetic corpora , 1999, Speech Commun..

[141]  Lukás Burget,et al.  Comparison of keyword spotting approaches for informal continuous speech , 2005, INTERSPEECH.

[142]  Richard Rose,et al.  A hidden Markov model based keyword recognition system , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[143]  Louis ten Bosch,et al.  Acoustic Scores and Symbolic Mismatch Penalties in Phone Lattices , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[144]  Jerome R. Bellegarda Unsupervised, language-independent grapheme-to-phoneme conversion by latent analogy , 2005, Speech Commun..

[145]  Lukás Burget,et al.  Indexing and Search Methods for Spoken Documents , 2006, TSD.

[146]  Raymond J. Mooney,et al.  Symbolic and Neural Learning Algorithms: An Experimental Comparison , 1991, Machine Learning.

[147]  Karen Spärck Jones,et al.  Robust talker-independent audio document retrieval , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[148]  Hervé Bourlard,et al.  A new approach towards keyword spotting , 1993, EUROSPEECH.

[149]  James R. Glass,et al.  Modeling out-of-vocabulary words for robust speech recognition , 2000, INTERSPEECH.

[150]  Paul Dalsgaard,et al.  A self-learning approach to transcription of danish proper names , 1994, ICSLP.

[151]  Thomas Hain,et al.  IMPLICIT PRONUNCIATION MODELLING IN ASR , 2002 .

[152]  Stanley F. Chen,et al.  Conditional and joint models for grapheme-to-phoneme conversion , 2003, INTERSPEECH.

[153]  Robert I. Damper,et al.  A comparison of letter-to-sound conversion techniques for English text-to-speech synthesis , 1998 .

[154]  Aren Jansen,et al.  Point Process Models for Spotting Keywords in Continuous Speech , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[155]  Thomas G. Dietterich,et al.  A Comparison of ID3 and Backpropagation for English Text-To-Speech Mapping , 2004, Machine Learning.

[156]  Josef V. Psutka,et al.  Comparison of keyword spotting methods for searching in speech , 2006, INTERSPEECH.

[157]  Chin-Hui Lee,et al.  A study on word detector design and knowledge-based pruning and rescoring , 2007, INTERSPEECH.

[158]  Simon King,et al.  Growing bottleneck features for tandem ASR , 2008, INTERSPEECH.

[159]  Juha Häkkinen,et al.  Decision tree based text-to-phoneme mapping for speech recognition , 2000, INTERSPEECH.

[160]  Dong Wang,et al.  A comparison of phone and grapheme-based spoken term detection , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[161]  James R. Glass,et al.  Heterogeneous lexical units for automatic speech recognition: preliminary investigations , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[162]  Victor Zue,et al.  A segment-based wordspotter using phonetic filler models , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[163]  Lukás Burget,et al.  Spoken Term Detection System Based on Combination of LVCSR and Phonetic Search , 2007, MLMI.

[164]  Hui Jiang,et al.  Confidence measures for speech recognition: A survey , 2005, Speech Commun..

[165]  W. Russell,et al.  Continuous hidden Markov modeling for speaker-independent word spotting , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[166]  Hermann Ney,et al.  Confidence measures for large vocabulary continuous speech recognition , 2001, IEEE Trans. Speech Audio Process..

[167]  Sridha Sridharan,et al.  Spoken term detection using fast phonetic decoding , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[168]  Frank Seide,et al.  Word-lattice based spoken-document indexing with standard text indexers , 2008, 2008 IEEE Spoken Language Technology Workshop.

[169]  Lucian Galescu Recognition of out-of-vocabulary words with sub-lexical language models , 2003, INTERSPEECH.

[170]  Lukás Burget,et al.  Phoneme Based Acoustics Keyword Spotting in Informal Continuous Speech , 2005, TSD.

[171]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[172]  Walter Daelemans,et al.  Rapid Development of NLP Modules with Memory-based Learning , 1998 .

[173]  Beth Logan,et al.  An experimental study of an audio indexing system for the web , 2000, INTERSPEECH.

[174]  Lukás Burget,et al.  Sub-word modeling of out of vocabulary words in spoken term detection , 2008, 2008 IEEE Spoken Language Technology Workshop.

[175]  Alexander H. Waibel,et al.  Dictionary learning for spontaneous speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[176]  Mark A. Clements,et al.  Phonetic Searching vs. LVCSR: How to Find What You Really Want in Audio Archives , 2002, Int. J. Speech Technol..

[177]  Hermann Ney,et al.  Multigram-based grapheme-to-phoneme conversion for LVCSR , 2003, INTERSPEECH.

[178]  Bhuvana Ramabhadran,et al.  Effect of pronounciations on OOV queries in spoken term detection , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[179]  Antal van den Bosch,et al.  Automatic phonetic transcription of words based on sparse data , 1997 .

[180]  Thomas Schaaf,et al.  Confidence measures for spontaneous speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[181]  Sridha Sridharan,et al.  Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[182]  Beth Logan,et al.  Word and sub-word indexing approaches for reducing the effects of OOV queries on spoken audio , 2002 .

[183]  Cyril Allauzen,et al.  General Indexation of Weighted Automata - Application to Spoken Utterance Retrieval , 2004, HLT-NAACL 2004.

[184]  Thomas Schaaf,et al.  Estimating confidence using word lattices , 1997, EUROSPEECH.

[185]  Brian Kingsbury,et al.  Discriminative graph training for ultra-fast low-footprint speech indexing , 2008, INTERSPEECH.

[186]  Walter Daelemans,et al.  Data-Oriented Methods for Grapheme-to-Phoneme Conversion , 1993, EACL.

[187]  Chin-Hui Lee,et al.  Utterance verification of keyword strings using word-based minimum verification error (WB-MVE) training , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[188]  Larry Gillick,et al.  A probabilistic approach to confidence estimation and evaluation , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[189]  Salim Roukos,et al.  A multistage algorithm for spotting new words in speech , 2002, IEEE Trans. Speech Audio Process..

[190]  Herbert Gish,et al.  Rapid and accurate spoken term detection , 2007, INTERSPEECH.

[191]  Alex Waibel,et al.  Readings in speech recognition , 1990 .

[192]  Jia Liu,et al.  A study of lattice-based spoken term detection for Chinese spontaneous speech , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[193]  Alon Efrat,et al.  Advances in phonetic word spotting , 2001, CIKM '01.

[194]  Hui Lin,et al.  Improving multi-lattice alignment based spoken keyword spotting , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[195]  Karen Spärck Jones,et al.  Effects of out of vocabulary words in spoken document retrieval (poster session) , 2000, SIGIR '00.

[196]  David W. Shipman,et al.  Letter‐to‐phoneme rules: A semi‐automatic discovery procedure , 1982 .

[197]  Anthony J. Vitale,et al.  Algorithms for Grapheme-Phoneme Translation for English and French: Applications for Database Searches and Speech Synthesis , 1997, CL.

[198]  Ashish Verma,et al.  Keyword Search using Modified Minimum Edit Distance Measure , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[199]  Susanne Burger,et al.  The ISL meeting corpus: the impact of meeting type on speech style , 2002, INTERSPEECH.

[200]  R. Wohlford,et al.  Keyword recognition using template concatenation , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[201]  Florian Metze,et al.  The TUB 2006 Spoken Term Detection System , 2006 .

[202]  Bin Ma,et al.  A phonotactic-semantic paradigm for automatic spoken document classification , 2005, SIGIR '05.

[203]  S. Roukos,et al.  New word detection in audio-indexing , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[204]  Helen M. Meng,et al.  A hierarchical lexical representation for bi-directional spelling-to-pronunciation/pronunciation-to-spelling generation , 2001, Speech Commun..

[205]  James Glass,et al.  Modelling out-of-vocabulary words for robust speech recognition , 2002 .

[206]  Andreas Stolcke,et al.  Open-vocabulary spoken term detection using graphone-based hybrid recognition systems , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[207]  Thomas Hain,et al.  Hidden Model Sequence Models for Automatic Speech Recognition , 2007 .

[208]  Jonathan G. Fiscus,et al.  Results of the 2006 Spoken Term Detection Evaluation , 2006 .

[209]  William I. Hallahan DECtalk Software: Text-to-Speech Technology and Implementation , 1995, Digit. Tech. J..

[210]  Herbert Gish,et al.  Secondary processing using speech segments for an HMM word spotting system , 1992, ICSLP.

[211]  Herbert Gish,et al.  Phonetic-based word spotter: various configurations and application to event spotting , 1993, EUROSPEECH.

[212]  Lou Boves,et al.  Different aspects of expert pronunciation quality ratings and their relation to scores produced by speech recognition algorithms , 2000, Speech Commun..

[213]  Hong Kook Kim,et al.  Acoustic Model Adaptation Based on Pronunciation Variability Analysis for Non-Native Speech Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[214]  Jr. G. Forney,et al.  The viterbi algorithm , 1973 .

[215]  David W. Aha,et al.  Instance-Based Learning Algorithms , 1991, Machine Learning.

[216]  Delphine Charlet,et al.  Using textual information from LVCSR transcripts for phonetic-based spoken term detection , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[217]  Simon King,et al.  Multisyn: Open-domain unit selection for the Festival speech synthesis system , 2007, Speech Commun..

[218]  Peng Yu,et al.  Vocabulary-independent search in spontaneous speech , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[219]  Gary Geunbae Lee,et al.  Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS , 1998, COLING-ACL.

[220]  Ralf Schlüter,et al.  Using word probabilities as confidence measures , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[221]  Alex Acero,et al.  Spoken Language Processing: A Guide to Theory, Algorithm and System Development , 2001 .

[222]  Michael Cohen,et al.  A phone-dependent confidence measure for utterance rejection , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[223]  Alan W. Black,et al.  Letter to sound rules for accented lexicon compression , 1998, ICSLP.

[224]  Alex Acero,et al.  Position Specific Posterior Lattices for Indexing Speech , 2005, ACL.

[225]  Olivier Siohan,et al.  Fast vocabulary-independent audio search using path-based graph indexing , 2005, INTERSPEECH.

[226]  José B. Mariño,et al.  Out-of-vocabulary word modelling and rejection for keyword spotting , 1993, EUROSPEECH.

[227]  S. R. Mahadeva Prasanna,et al.  Fast Approximate Spoken Term Detection from Sequence of Phonemes , 2008, SIGIR 2008.

[228]  James R. Glass,et al.  Learning units for domain-independent out-of-vocabulary word modelling , 2001, INTERSPEECH.

[229]  L. Baum,et al.  Statistical Inference for Probabilistic Functions of Finite State Markov Chains , 1966 .

[230]  David A. James,et al.  A system for unrestricted topic retrieval from radio news broadcasts , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[231]  Paul C. Bagshaw.  Phonemic transcription by analogy in text-to-speech synthesis: Novel word pronunciation and lexicon compression , 1998, Comput. Speech Lang..

[232]  Samy Bengio,et al.  Discriminative keyword spotting , 2009, Speech Commun..

[233]  Zeév Rivlin.  A confidence measure for acoustic likelihood scores , 1995, EUROSPEECH.

[234]  Chin-Hui Lee,et al.  Application of hidden Markov models for recognition of a limited set of words in unconstrained speech , 1989, International Conference on Acoustics, Speech, and Signal Processing.

[235]  Steve J. Young,et al.  A fast lattice-based approach to vocabulary independent wordspotting , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[236]  Andreas Stolcke,et al.  The ICSI Meeting Corpus , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[237]  Biing-Hwang Juang,et al.  A training procedure for verifying string hypotheses in continuous speech recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[238]  Alvin F. Martin,et al.  The DET curve in assessment of detection task performance , 1997, EUROSPEECH.

[239]  Lukás Burget,et al.  The AMI Meeting Transcription System: Progress and Performance , 2006, MLMI.

[240]  Rafid A. Sukkar,et al.  Correcting recognition errors via discriminative utterance verification , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[241]  R. Christiansen,et al.  Detecting and locating key words in continuous speech using linear predictive coding , 1977 .

[242]  Bernhard Rüber,et al.  Obtaining confidence measures from sentence probabilities , 1997, EUROSPEECH.

[243]  Kenji Iwata,et al.  Robust spoken term detection using combination of phone-based and word-based recognition , 2008, INTERSPEECH.

[244]  Sridha Sridharan,et al.  A phonetic search approach to the 2006 NIST spoken term detection evaluation , 2007, INTERSPEECH.

[245]  Rafid A. Sukkar,et al.  Subword-based minimum verification error (SB-MVE) training for task independent utterance verification , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[246]  Maria Wolters,et al.  A Diphone-based Text-to-speech System for Scottish Gaelic , 1997 .

[247]  Nazife Baykal,et al.  An experimental comparison of symbolic and neural learning algorithms , 1998, 1998 Second International Conference. Knowledge-Based Intelligent Electronic Systems. Proceedings KES'98 (Cat. No.98EX111).

[248]  Mark A. Randolph,et al.  An approach to automatic phonetic baseform generation based on Bayesian networks , 2001, INTERSPEECH.

[249]  John H. L. Hansen,et al.  Spoken Proper Name Retrieval in Audio Streams for Limited-Resource Languages Via Lattice Based Search Using Hybrid Representations , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[250]  Dong Wang,et al.  Posterior-based confidence measures for spoken term detection , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[251]  Victor Zue,et al.  Reversible letter-to-sound/sound-to-letter generation based on parsing word morphology , 1993, Speech Commun..

[252]  R. Glushko The Organization and Activation of Orthographic Knowledge in Reading Aloud. , 1979 .

[253]  Steve Renals,et al.  Confidence measures from local posterior probability estimates , 1999, Comput. Speech Lang..

[254]  Walter Daelemans,et al.  Tabtalk: reusability in data-oriented grapheme-to-phoneme conversion , 1993, EUROSPEECH.

[255]  Lukás Burget,et al.  The AMI System for the Transcription of Speech in Meetings , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[256]  Aruna Bayya.  Rejection in speech recognition systems with limited training , 1998, ICSLP.

[257]  Karen Spärck Jones,et al.  Talker-independent keyword spotting for information retrieval , 1995, EUROSPEECH.

[258]  Mitch Weintraub,et al.  Neural-network based measures of confidence for word recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[259]  John H. L. Hansen,et al.  A robust fusion method for multilingual spoken document retrieval systems employing tiered resources , 2006, INTERSPEECH.

[260]  Kenney Ng,et al.  Subword-based approaches for spoken document retrieval , 2000, Speech Commun..

[261]  Alan W. Black,et al.  Issues in building general letter to sound rules , 1998, SSW.

[262]  Samy Bengio,et al.  Posterior based keyword spotting with a priori thresholds , 2006, INTERSPEECH.

[263]  Reinhard Kneser,et al.  Designing very compact decision trees for grapheme-to-phoneme transcription , 2001, INTERSPEECH.