Turkish Speech Recognition

Automatic speech recognition (ASR) is one of the most important applications of speech and language processing because it bridges spoken and written language processing. This chapter presents an overview of the foundations of ASR, then summarizes Turkish language resources for ASR and reviews various Turkish ASR systems. Language resources include acoustic and text corpora as well as linguistic tools such as morphological parsers, morphological disambiguators, and dependency parsers, which are discussed in more detail in other chapters. Turkish ASR systems vary in the type and amount of data used to build their models. Most research on Turkish ASR focuses on the language modeling component, covered in Chap. 4.
