Automatic Phonetic Transcription for read, extempore and conversation speech for an Indian language: Bengali

This work analyzes an Automatic Phonetic Transcription (APT) approach for the read, extempore and conversation modes of speech in the Bengali language. In our earlier work, APT was carried out on read speech. This paper focuses on deriving APT for the extempore and conversation modes of Bengali speech and on analyzing the results. The framework for deriving APT can be extended to any Indian language. Automatic Phonetic Transcription Systems (APTS) were developed separately for each of the three modes, using 35, 33 and 30 phones for read, extempore and conversation speech respectively. The systems were built with Hidden Markov Models (HMMs) and Feedforward Neural Networks (FFNNs), with Mel-frequency cepstral coefficients (MFCCs) as features. The best accuracies obtained with HMMs for the read, extempore and conversation modes are 41.65%, 29.20% and 23.48% respectively. With FFNNs, the recognition accuracies for the same modes are 53.87%, 46.19% and 33.63% respectively.
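To make the comparison between the two model families easier to read, the reported accuracies can be tabulated and the absolute gain of FFNNs over HMMs computed per speech mode. The short Python sketch below does only this arithmetic. The dictionary layout and the function name `ffnn_gain` are illustrative choices, not part of the original work; the numbers are those quoted in the abstract.

```python
# Accuracies (%) as reported in the abstract; the data layout is illustrative.
ACCURACY = {
    "read": {"HMM": 41.65, "FFNN": 53.87},
    "extempore": {"HMM": 29.20, "FFNN": 46.19},
    "conversation": {"HMM": 23.48, "FFNN": 33.63},
}


def ffnn_gain(mode):
    """Absolute accuracy gain of FFNN over HMM, in percentage points."""
    scores = ACCURACY[mode]
    return round(scores["FFNN"] - scores["HMM"], 2)


if __name__ == "__main__":
    for mode in ACCURACY:
        print(f"{mode}: +{ffnn_gain(mode):.2f} percentage points")
```

Under these figures the gain is 12.22 points for read speech, 16.99 for extempore speech and 10.15 for conversation speech, so the largest absolute improvement appears in the extempore mode.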
