Paper information - Some recent research work at LIUM based on the use of CMU Sphinx

This paper presents an overview of recent research work carried out at LIUM using the CMU Sphinx tools. First, it describes the LIUM ASR system, which achieved very competitive results in French evaluation campaigns. Then, several research efforts that use the LIUM ASR system are described: detection and characterization of spontaneous speech in large audio databases, language modeling to detect and correct errors in automatic transcripts, and system combination in the framework of statistical machine translation. Finally, we discuss the benefit of CMU Sphinx being available under a permissive open source license and, as we would like to share some parts of our work with the CMU Sphinx community, the difficulties we encountered in participating in the development of CMU Sphinx.
