Speech Processing for Audio Indexing

This paper addresses some of the recent trends in speech processing, with a focus on speech-to-text transcription as a means to facilitate access to multimedia information in a multilingual context. A brief overview of automatic speech recognition is given, along with indicative performance measures for a range of tasks. Enriched transcriptions, that is, automatic word transcripts enhanced with metadata derived from the audio data, are then discussed, followed by some highlights of recent progress and remaining challenges in speech recognition.
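The abstract mentions indicative performance measures without naming them. Speech-to-text accuracy is conventionally reported as word error rate (WER). Assuming that is the measure meant here, the following is a minimal sketch of how it is computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the system output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical illustrations, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein (edit-distance) alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum number of edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion of a reference word
                d[i][j - 1] + 1,         # insertion of a hypothesis word
                d[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER = {word_error_rate(reference, hypothesis):.2%}")  # prints "WER = 33.33%"
```

Because insertions are counted against the reference length, WER can exceed 100% when a system outputs many spurious words.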

Jean-Luc Gauvain | Lori Lamel