Paper Information - Design and Development of Marathi Speech Interface System

Speech is the most prominent and natural form of communication between humans, and it has the potential to become an important mode of interaction with computers. The man-machine interface has long been a challenging area in natural language processing and speech recognition research, and there is growing interest in developing machines that accept speech as input. People generally communicate with a computer through a mouse or keyboard, which requires training, effort, and some knowledge of computers; this is a limitation for some users. Marathi is the official language of the Government of Maharashtra, and there is a need for systems that enable human-machine interaction in Indian regional languages. The objective of this research is to design and develop the Marathi Speech Activated Talking Calculator (MSAC) as an interface system. The MSAC is a speaker-dependent speech recognition system that performs basic mathematical operations. It recognizes isolated spoken digits from 0 to 50 and basic commands such as addition, subtraction, multiplication, start, stop, equal, and exit. A database is an essential requirement for designing a speech recognition system; to meet the stated objectives, a database with a vocabulary size of 22,320 was developed. The MSAC system was trained and tested using Mel Frequency Cepstral Coefficients (MFCC), Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Linear Predictive Coding (LPC), and RASTA-PLP individually. It was also trained and tested with each of the following fusion feature extraction techniques: Mel Frequency Linear Discriminant Analysis (MFLDA), Mel Frequency Principal Component Analysis (MFPCA), Mel Frequency Discrete Wavelet Transformation (MFDWT), and Mel Frequency Linear Discrete Wavelet Transformation (MFLDWT).
This work also proposes and tests the Wavelet Decomposed Cepstral Coefficient (WDCC) approach with 18, 36, and 54 coefficients. The performance of the MSAC system is evaluated on the basis of accuracy and real-time factor (RTF). The experimental results show that MFCC with 39 coefficients achieved higher accuracy than the 13- and 26-coefficient variants. MFLDWT achieved higher accuracy than MFLDA, MFPCA, MFDWT, and Mel Frequency Principal Discrete Wavelet Transformation (MFPDWT). Based on this research, we recommend WDCC as a more robust and dynamic technique than MFCC, LDA, PCA, and LPC. The MSAC interface application is directly beneficial to people in their day-to-day activities.
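The abstract names accuracy and real-time factor (RTF) as the evaluation metrics but does not define them. The sketch below assumes the commonly used definitions: accuracy as the fraction of correctly recognized utterances, and RTF as processing time divided by the duration of the audio processed. The function names and the sample labels are hypothetical and are not taken from the paper.

```python
def accuracy(predicted, reference):
    """Fraction of utterances whose recognized label equals the reference label.

    Assumed definition; the abstract does not spell out the formula.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one utterance is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent recognizing / duration of the input audio.

    A value below 1.0 means recognition runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical recognizer output for four isolated words.
predicted = ["one", "two", "plus", "equal"]
reference = ["one", "two", "minus", "equal"]
print(accuracy(predicted, reference))   # 3 of 4 labels match
print(real_time_factor(0.5, 2.0))       # 0.5 s of processing for 2 s of audio
```

Under these definitions, a feature set that raises accuracy but also raises RTF trades recognition quality against responsiveness, which is presumably why the two metrics are reported together.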

Santosh Gaikwad | Bharti Gawali | Suresh Mehrotra
