Isolated Word Recognition using Morph-Knowledge for Telugu Language

Building a speech recognition system for Indian languages remains an open problem. This paper presents a new speech recognition model that uses the syllable as the basic unit. The model has five phases: the first three cover training and the construction of a Trie structure that reduces time and space requirements, and the last two cover testing. In the first phase, syllables are extracted from text and speech and the data sets are annotated. The second phase builds a three-state model for each syllable unit, and the third builds the Trie structure using morph knowledge of the Telugu language. In the fourth phase, rough syllable boundaries are marked using the intensity of the signal, and in the fifth phase the resulting sequence of syllables is recognized. The experiment was conducted on the CIIL Telugu corpus. The system was trained on 300 words and tested on 100 newly recorded words that were not used for training; 80% of the test words were recognized.

General Terms: Computer Science, Speech Processing
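
The Trie built in the third phase can be illustrated with a minimal sketch. The abstract does not say how the Trie is organized, so the layout here is an assumption: each edge is one syllable, and words that share a stem share a path. The class names, method names, and the romanized example syllables are hypothetical and are not taken from the paper or the CIIL corpus.

```python
from typing import Dict, List


class SyllableTrieNode:
    """One node of the trie; each outgoing edge is labeled with a syllable."""

    def __init__(self) -> None:
        self.children: Dict[str, "SyllableTrieNode"] = {}
        self.is_word_end = False


class SyllableTrie:
    """Stores words as syllable sequences so shared stems share one path."""

    def __init__(self) -> None:
        self.root = SyllableTrieNode()

    def insert(self, syllables: List[str]) -> None:
        node = self.root
        for syl in syllables:
            # Reuse the existing child if this prefix is already stored.
            node = node.children.setdefault(syl, SyllableTrieNode())
        node.is_word_end = True

    def _walk(self, syllables: List[str]):
        """Follow the path for `syllables`; return None if it leaves the trie."""
        node = self.root
        for syl in syllables:
            node = node.children.get(syl)
            if node is None:
                return None
        return node

    def contains(self, syllables: List[str]) -> bool:
        node = self._walk(syllables)
        return node is not None and node.is_word_end

    def next_syllables(self, prefix: List[str]) -> List[str]:
        """Syllables that may follow `prefix`, in sorted order."""
        node = self._walk(prefix)
        return sorted(node.children) if node is not None else []


# Illustrative romanized syllable sequences (assumed, not corpus data).
trie = SyllableTrie()
trie.insert(["pus", "ta", "kam"])
trie.insert(["pus", "ta", "ka", "lu"])
print(trie.contains(["pus", "ta", "kam"]))
print(trie.next_syllables(["pus", "ta"]))
```

Under this layout, a recognizer that has already hypothesized a syllable prefix only needs to score the syllables returned by `next_syllables`, which is one plausible way the structure could reduce search time and storage.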
