Paper information - Linguistically-motivated sub-word modeling with applications to speech recognition

Linguistically-motivated sub-word modeling with applications to speech recognition

Despite the proliferation of speech-enabled applications and devices, speech-driven human-machine interaction still faces several challenges. One of these issues is the new-word, or out-of-vocabulary (OOV), problem, which occurs when the underlying automatic speech recognizer (ASR) encounters a word it does not "know". With ASR being deployed in constantly evolving domains such as restaurant ratings or music querying, as well as on handheld devices, the new-word problem continues to arise. This thesis is concerned with the OOV problem, and in particular with the process of modeling and learning the lexical properties of an OOV word through a linguistically-motivated sub-syllabic model. The linguistic model is designed using a context-free grammar which describes the sub-syllabic structure of English words, and encapsulates phonotactic and phonological constraints. The context-free grammar is supported by a probability model, which captures the statistics of the parses generated by the grammar and encodes spatio-temporal context. The two main outcomes of the grammar design are: (1) sub-word units, which encode pronunciation information, and can be viewed as clusters of phonemes; and (2) a high-quality alignment between graphemic and sub-word units, which results in hybrid entities denoted as spellnemes. The spellneme units are used in the design of a statistical bi-directional letter-to-sound (L2S) model, which plays a significant role in automatically learning the spelling and pronunciation of a new word. The sub-word units and the L2S model are assessed on the task of automatic lexicon generation. In a first set of experiments, knowledge of the spelling of the lexicon is assumed. It is shown that the phonemic pronunciations associated with the lexicon can be successfully learned using the L2S model as well as a sub-word recognizer.
In a second set of experiments, the assumption of perfect spelling knowledge is relaxed, and an iterative and unsupervised algorithm, denoted as Turbo-style, makes use of spoken instances of both spellings and words to learn the lexical entries in a dictionary. Sub-word speech recognition is also embedded in a parallel fashion as a back-off mechanism for a word recognizer. The resulting hybrid model is evaluated in a lexical access application, whereby a word recognizer first attempts to recognize an isolated word. Upon failure of the word recognizer, the sub-word recognizer is manually triggered. Preliminary results show that such a hybrid set-up outperforms a large-vocabulary recognizer. Finally, the sub-word units are embedded in a flat hybrid OOV model for continuous ASR. The hybrid ASR is deployed as a front-end to a song retrieval application, which is queried via spoken lyrics. Vocabulary compression and open-ended query recognition are achieved by designing a hybrid ASR. The performance of the front-end recognition system is reported in terms of sentence, word, and sub-word error rates. The hybrid ASR is shown to outperform a word-only system over a range of out-of-vocabulary rates (1%-50%). The retrieval performance is thoroughly assessed as a function of ASR N-best size, language model order, and the index size. Moreover, it is shown that the sub-words outperform alternative linguistically-motivated sub-lexical units such as phonemes. Finally, it is observed that a dramatic vocabulary compression - by more than a factor of 10 - is accompanied by a minor loss in song retrieval performance. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
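The out-of-vocabulary rates quoted in the abstract (1%-50%) can be illustrated with a minimal sketch. It assumes the common token-level definition (the share of running words in a query that are missing from the recognizer vocabulary); the abstract does not say which definition the thesis uses. The function name `oov_rate`, the example query, and the vocabulary are hypothetical and not taken from the thesis.

```python
def oov_rate(tokens, vocabulary):
    """Return the fraction of running tokens absent from the vocabulary.

    Assumption: token-level OOV rate; a type-level variant would count
    distinct words instead of running words.
    """
    if not tokens:
        return 0.0
    vocab = set(vocabulary)
    missing = sum(1 for tok in tokens if tok not in vocab)
    return missing / len(tokens)


# Hypothetical spoken query and a small, compressed recognizer vocabulary.
query = "sing me a quiet lullaby".split()
vocab = {"sing", "me", "a", "quiet"}

print(oov_rate(query, vocab))  # one of five tokens is OOV -> 0.2
```

Shrinking the vocabulary raises this rate, which is the regime where the abstract reports the hybrid word/sub-word recognizer outperforming a word-only system.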

Ghinwa F. Choueiter
