Development of a Chinese song name recognition system

Recent advances in automatic speech recognition (ASR) have made it possible for some intelligent query systems to offer a voice interface. Automatic song selection is a practical and interesting application of ASR. In this paper we describe our efforts to build and improve a Chinese song name recognition system. It is a large-vocabulary, speaker-independent system currently in commercial use, and a typical example of a large list recognition task. We use a new paradigm for large list recognition. In this framework, the spoken query is first recognized by an LVCSR module, and the recognized result is then used to search for the final song name. Unlike transcription tasks such as the Switchboard task, our LVCSR module is an in-domain application. We present some innovative optimizations that improve song name recognition accuracy. The experimental results show that these techniques yield relative error rate reductions of 7.36%, 26.87% and 32.71% for the top-1, top-5 and top-10 results, respectively, over the conventional grammar-constrained recognizer.
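The second stage of the framework, in which the recognized text is used to search the song list, can be illustrated with a minimal sketch. This is not the system described in the paper: the character-bigram Dice similarity, the function names (`char_ngrams`, `dice_score`, `search_song_names`), and the toy catalog are all assumptions chosen for illustration, and the LVCSR output is stood in for by a plain string.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return a multiset of character n-grams (assumed matching unit)."""
    text = text.replace(" ", "")
    if len(text) < n:
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dice_score(a, b):
    """Dice coefficient between two n-gram multisets (0.0 to 1.0)."""
    overlap = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2.0 * overlap / total if total else 0.0


def search_song_names(hypothesis, catalog, top_k=5):
    """Rank catalog entries by similarity to the recognized text.

    `hypothesis` stands in for the 1-best LVCSR output; a real system
    could use richer recognizer output instead.
    """
    hyp = char_ngrams(hypothesis)
    scored = [(dice_score(hyp, char_ngrams(name)), name) for name in catalog]
    # Highest score first; ties broken by name for deterministic output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical catalog and a hypothesis with one misrecognized character.
    catalog = ["月亮代表我的心", "我的中国心", "甜蜜蜜", "小城故事"]
    print(search_song_names("月亮代表我的新", catalog, top_k=2))
```

Because the search tolerates partial matches, a hypothesis with a wrong final character can still rank the intended title first, which is the kind of robustness a strictly grammar-constrained recognizer does not provide on its own.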
