A syllable-based Turkish speech recognition system by using time delay neural networks (TDNNs)

In this paper, we present a model for Turkish speech recognition. The model is syllable-based: recognition is performed using syllables as the recognition units. The main goal of the model is to recognize as much as possible of a given continuous speech signal by identifying only a small set of the language's syllables. For that purpose, only the most frequent syllable types are selected for recognition. Using longer recognition units improves recognition accuracy, since the endpoints of syllables are easier to detect than those of phonemes. On the other hand, word-based recognition requires a very large dataset covering all the words and word forms of the language, which poses a challenge of its own. Here, we take advantage of Turkish being an orthographically transparent language with well-defined syllabification. Our model employs time delay neural networks (TDNNs) to learn syllables. We achieve an accuracy of 65.6% on our large-vocabulary continuous speech corpus. In addition, we define an algorithm for the automatic detection of syllable boundaries, which achieves an accuracy of 44%. The automatic syllable boundary detection module is used for the recognition of isolated syllables rather than continuous speech.
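
The approach relies on Turkish spelling being transparent enough that syllables can be read off the text, and on keeping only the most frequent syllable types. A minimal sketch of both steps follows, assuming lowercase input, the common textbook rule that the consonant immediately before a vowel starts a new syllable (word-initial clusters in loanwords such as "tren" simply stay together), and a hypothetical `coverage` threshold, since the abstract does not say how many syllable types were kept. The function names are illustrative and are not the authors' code.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Turkish vowels; the circumflex vowels (loanwords) are included as an assumption.
VOWELS = set("aeıioöuüâîû")


def syllabify(word: str) -> List[str]:
    """Split a lowercase Turkish word into syllables.

    Rule (assumed): each syllable has exactly one vowel, and the last
    consonant before a vowel begins that vowel's syllable; any earlier
    consonants close the previous syllable.
    """
    vowel_idx = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not vowel_idx:
        return [word] if word else []
    starts = [0]
    for prev, cur in zip(vowel_idx, vowel_idx[1:]):
        if cur - prev > 1:
            starts.append(cur - 1)  # consonant(s) between vowels: last one moves right
        else:
            starts.append(cur)      # adjacent vowels (hiatus), e.g. "saat" -> sa-at
    starts.append(len(word))
    return [word[a:b] for a, b in zip(starts, starts[1:])]


def select_frequent_syllables(
    words: Iterable[str], coverage: float = 0.9
) -> Tuple[List[str], float]:
    """Return the most frequent syllable types needed to cover `coverage`
    of all syllable tokens, plus the coverage actually reached.
    `coverage` is a hypothetical parameter, not taken from the paper."""
    counts = Counter(s for w in words for s in syllabify(w))
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    selected, covered = [], 0
    for syl, n in counts.most_common():
        if covered / total >= coverage:
            break
        selected.append(syl)
        covered += n
    return selected, covered / total
```

For example, `syllabify("korkmak")` gives `["kork", "mak"]`. Note that Python's `str.lower()` maps `I` to `i` rather than to Turkish `ı`, so text should be lowercased with Turkish-aware rules before it reaches `syllabify`.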

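The abstract states that TDNNs learn the syllables but does not describe the network. The sketch below illustrates the core idea of a time-delay layer as in the classic TDNN of Waibel et al.: each unit looks at a window of consecutive frames (the current frame plus `delays` later ones) and uses the same weights at every time step, so a pattern shifted in time produces the same response shifted in time. The sigmoid activation, the layer sizes, and the summing of output activations over time to pick a syllable class are assumptions made for illustration, not the authors' configuration.

```python
import math
from typing import List

Matrix = List[List[float]]


def tdnn_layer(frames: Matrix, weights: List[Matrix], bias: List[float], delays: int) -> Matrix:
    """One time-delay layer (illustrative sketch).

    frames:  T x F input (T frames, F features per frame, e.g. cepstral coefficients)
    weights: one (delays + 1) x F matrix per hidden unit, shared across all time steps
    bias:    one value per hidden unit
    Returns a (T - delays) x H matrix of sigmoid activations.
    """
    out = []
    for t in range(len(frames) - delays):
        row = []
        for w_unit, b in zip(weights, bias):
            s = b
            # Weighted sum over the current frame and the next `delays` frames.
            for d in range(delays + 1):
                s += sum(wi * xi for wi, xi in zip(w_unit[d], frames[t + d]))
            row.append(1.0 / (1.0 + math.exp(-s)))
        out.append(row)
    return out


def integrate_over_time(outputs: Matrix) -> int:
    """Sum each output unit over time and return the index of the
    highest-scoring class (assumed to correspond to a syllable type)."""
    totals = [sum(col) for col in zip(*outputs)]
    return max(range(len(totals)), key=totals.__getitem__)
```

Because the weights are shared across time, the layer behaves like a one-dimensional convolution over frames; stacking several such layers widens the temporal context each output unit sees.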