Speech is the most prominent and natural form of communication between humans, and it has the potential to become an important mode of interaction with computers. The man–machine interface has long been a challenging area in natural language processing and speech recognition research, and there is growing interest in developing machines that accept speech as input. People generally communicate with a computer through a mouse or keyboard, which requires training, effort, and knowledge of computers; this is a limitation for some users. Marathi is the official language of the government of Maharashtra, and there is a need for systems that enable human–machine interaction in Indian regional languages. The objective of this research is to design and develop the Marathi Speech Activated Talking Calculator (MSAC) as an interface system. The MSAC is a speaker-dependent speech recognition system that performs basic mathematical operations. It recognizes isolated spoken numbers from 0 to 50 and commands such as addition, subtraction, multiplication, start, stop, equal, and exit. A database is an essential requirement for designing a speech recognition system; to meet the objectives, a database with a vocabulary size of 22,320 was developed. The MSAC system was trained and tested individually with Mel Frequency Cepstral Coefficients (MFCC), Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Linear Predictive Coding (LPC), and RASTA-PLP. It was also trained and tested individually with four fusion feature extraction techniques: Mel Frequency Linear Discriminant Analysis (MFLDA), Mel Frequency Principal Component Analysis (MFPCA), Mel Frequency Discrete Wavelet Transformation (MFDWT), and Mel Frequency Linear Discrete Wavelet Transformation (MFLDWT).
This work also proposes and tests the Wavelet Decomposed Cepstral Coefficient (WDCC) approach with 18, 36, and 54 coefficients. The performance of the MSAC system is evaluated in terms of accuracy and real-time factor (RTF). The experimental results show that MFCC with 39 coefficients achieves higher accuracy than the 13- and 26-coefficient variants. MFLDWT achieves higher accuracy than MFLDA, MFPCA, MFDWT, and Mel Frequency Principal Discrete Wavelet Transformation (MFPDWT). Based on these results, we recommend WDCC as a more robust and dynamic technique than MFCC, LDA, PCA, and LPC. The MSAC interface application can directly benefit people in their day-to-day activities.
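The two evaluation measures named above, accuracy and real-time factor, can be illustrated with a minimal sketch. The abstract does not give the exact formulas used for MSAC, so this assumes the common definitions: accuracy is the percentage of test utterances whose recognized label matches the reference label, and RTF is processing time divided by the duration of the input audio. The function names and sample labels are hypothetical and are not taken from the MSAC implementation.

```python
from typing import Sequence


def recognition_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Percentage of utterances whose predicted label equals the reference label.

    Assumed definition; the paper's exact formula is not stated in the abstract.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one utterance is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by input audio duration (assumed definition).

    A value below 1.0 means recognition runs faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical labels for four isolated-word test utterances.
reference_labels = ["one", "two", "addition", "five"]
predicted_labels = ["one", "two", "addition", "four"]

accuracy = recognition_accuracy(predicted_labels, reference_labels)
rtf = real_time_factor(processing_seconds=0.5, audio_seconds=2.0)
print(f"accuracy = {accuracy:.1f}%, RTF = {rtf:.2f}")
```

Reporting both measures together shows the trade-off between feature sets: a larger feature vector may raise accuracy while also raising processing time, and therefore RTF.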