An Efficient Continuous Speech Recognition System for Dravidian Languages Using Support Vector Machine

This paper focuses on developing a novel continuous speech recognition (CSR) system for Dravidian languages such as Tamil, Malayalam, Telugu, and Kannada. The work aims to provide an efficient way for humans to interact with computers, particularly for people with disabilities who face a variety of obstacles when using them, and it would be very helpful to native speakers in various applications. The proposed CSR system comprises three steps: preprocessing, feature extraction, and classification. In the preprocessing step, the input signal is passed through a pre-emphasis filter, framing, windowing, and band-stop filtering in order to remove background noise and enrich the signal. The filtered and enriched signal from the preprocessing step is taken as the input for the subsequent stages of the CSR system. Speech features are the most essential component of a speech recognition system: the widely used short-term energy (STE) and zero-crossing rate (ZCR) are employed for continuous speech segmentation, while Mel-frequency cepstral coefficients (MFCC) and shifted delta cepstrum (SDC) are used for the recognition task. The resulting feature vectors are given as input to a support vector machine (SVM) classifier for classifying and recognizing Dravidian-language speech. Experiments are carried out on real-time Dravidian speech signals, and the results reveal that the proposed method competes with existing methods reported in the literature.
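
The pipeline described above (pre-emphasis, framing and windowing, band-stop filtering, STE/ZCR-based segmentation, MFCC extraction, and SVM classification) lends itself to a compact sketch. The following is a minimal illustration only, assuming NumPy, SciPy, librosa, and scikit-learn are available; all parameter values (pre-emphasis coefficient, band-stop range, frame length, segmentation thresholds, SVM kernel) are illustrative placeholders rather than the paper's settings, and the SDC features are omitted for brevity.

```python
# Minimal sketch of a CSR pipeline of the kind described in the abstract.
# All parameters below are illustrative, not the paper's actual settings.
import numpy as np
from scipy.signal import butter, filtfilt
import librosa
from sklearn.svm import SVC

def preprocess(signal, sr, alpha=0.97, band=(50.0, 60.0)):
    """Pre-emphasis followed by a band-stop filter (e.g. to suppress mains hum)."""
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    low, high = band[0] / (sr / 2), band[1] / (sr / 2)
    b, a = butter(4, [low, high], btype='bandstop')
    return filtfilt(b, a, emphasized)

def frame_and_window(signal, frame_len=400, hop_len=160):
    """Split the signal into overlapping frames and apply a Hamming window."""
    frames = librosa.util.frame(signal, frame_length=frame_len, hop_length=hop_len)
    return frames * np.hamming(frame_len)[:, None]

def ste_zcr(frames):
    """Short-term energy and zero-crossing rate per frame, used for segmentation."""
    ste = np.sum(frames ** 2, axis=0)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=0)) > 0, axis=0)
    return ste, zcr

def segment_speech(frames, ste_thresh=1e-3, zcr_thresh=0.25):
    """Keep frames that look like speech: high energy or low zero-crossing rate."""
    ste, zcr = ste_zcr(frames)
    return frames[:, (ste > ste_thresh) | (zcr < zcr_thresh)]

def extract_features(signal, sr, n_mfcc=13):
    """MFCCs averaged over time as a fixed-length descriptor of one utterance."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Training and recognition: X holds one feature vector per utterance,
# y the corresponding word/class labels (both hypothetical placeholders).
# clf = SVC(kernel='rbf', C=10.0)
# clf.fit(X, y)
# predicted = clf.predict(extract_features(test_signal, sr).reshape(1, -1))
```

In this sketch each utterance is reduced to a single averaged MFCC vector so that a standard SVM can be applied directly; a full CSR system would instead classify frame- or segment-level feature sequences, but the stage ordering mirrors the preprocessing, segmentation, feature extraction, and classification steps described above.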
