Romanian Language Voice Browsing for Web Applications Using Grapheme Level Acoustic Modeling

This article presents a demonstration Web application with a multimodal interface based on Romanian continuous speech recognition. It also presents and tests the capabilities of a context-dependent, grapheme-based acoustic model for the Romanian language. The article describes the system architecture, the development of the Web application, and the speech database used for acoustic feature vector construction and acoustic model training. The task grammar is then presented. Finally, recognition results are reported for both offline and online operating modes. The speech corpora and their transcriptions are freely available for academic use on the NaviRo project website: http://users.utcluj.ro/~jdomokos/naviro/.
