Dynamic language modeling for European Portuguese

This paper reports on daily vocabulary and language model adaptation for a European Portuguese broadcast news transcription system. The proposed adaptation framework takes into account characteristics of European Portuguese, such as its high level of inflection and its complex verbal system. We propose a multi-pass speech recognition framework that uses contemporary written texts made available daily on the Web. The framework uses morpho-syntactic knowledge (part-of-speech information) about an in-domain training corpus to select an optimal vocabulary each day. Using an information retrieval engine, with the automatic speech recognition (ASR) hypotheses as query material, relevant documents are extracted from a large, dynamic dataset to generate a story-based language model. When applied to a daily closed-captioning system for live TV broadcasts, the framework proved effective: relative to a baseline system with the same vocabulary size, it reduced the out-of-vocabulary (OOV) word rate by 69% and the word error rate (WER) by 12.0%.
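
As a reading aid for the figures above, the following minimal sketch shows how an OOV word rate and a relative reduction are typically computed. The function names and the toy tokens are hypothetical illustrations, not the system's actual code or data, and the sketch assumes the OOV rate is measured over running words.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of running words in `tokens` that are not in `vocabulary`."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def relative_reduction(baseline, adapted):
    """Relative reduction of a rate with respect to a non-zero baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - adapted) / baseline


# Toy data (hypothetical): four running words and two vocabularies of equal size.
tokens = ["a", "b", "c", "d"]
baseline_vocab = {"a", "b", "x"}   # covers 2 of the 4 words
adapted_vocab = {"a", "b", "c"}    # covers 3 of the 4 words

baseline_oov = oov_rate(tokens, baseline_vocab)
adapted_oov = oov_rate(tokens, adapted_vocab)
reduction = relative_reduction(baseline_oov, adapted_oov)
```

The same `relative_reduction` helper applies to WER, since both results in the abstract are stated relative to the baseline system rather than as absolute differences.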
