Investigation of MFCC feature representation for classification of spoken letters using Multi-Layer Perceptrons (MLP)

In this paper, Mel-Frequency Cepstral Coefficients (MFCC) are shown to be an effective feature representation for spoken-letter recognition. A Multi-Layer Perceptron (MLP) was used as the classifier to discriminate between two spoken letters, ‘A’ and ‘S’. The dataset consists of 72 samples (35 of the spoken letter ‘A’ and 37 of ‘S’), each represented by its MFCC features. Several experiments were conducted to determine the network parameters that yield the best classification results. The results indicate that the best network structure had 2 hidden units, which gave a classification accuracy of 100% on the training set and 93% on the test set.
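
The classification setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors are synthetic Gaussian clusters standing in for real MFCCs, and the feature dimension (12), the 80/20 train/test split, the tanh and sigmoid activations, the learning rate, and the epoch count are all assumptions chosen for illustration. Only the class sizes (35 and 37) and the 2 hidden units come from the abstract.

```python
import math
import random

def make_synthetic_features(rng, n_a=35, n_s=37, dim=12):
    """Stand-in for MFCC vectors: one Gaussian cluster per letter (assumption)."""
    data = []
    for label, count, mean in ((0, n_a, -1.0), (1, n_s, 1.0)):
        for _ in range(count):
            data.append(([rng.gauss(mean, 0.5) for _ in range(dim)], label))
    rng.shuffle(data)
    return data

def init_mlp(rng, dim, hidden=2):
    """One hidden layer with `hidden` units and a single sigmoid output."""
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "b2": 0.0,
    }

def forward(net, x):
    """Return hidden activations and the predicted probability of class 'S'."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    z = sum(w * hi for w, hi in zip(net["W2"], h)) + net["b2"]
    return h, 1.0 / (1.0 + math.exp(-z))

def train(net, data, epochs=200, lr=0.1):
    """Plain stochastic gradient descent on the cross-entropy loss."""
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(net, x)
            delta_out = y - t  # gradient of the loss w.r.t. the output pre-activation
            for j, hj in enumerate(h):
                # Back-propagate through tanh using the pre-update output weight.
                delta_h = delta_out * net["W2"][j] * (1.0 - hj * hj)
                net["W2"][j] -= lr * delta_out * hj
                net["b1"][j] -= lr * delta_h
                net["W1"][j] = [w - lr * delta_h * xi
                                for w, xi in zip(net["W1"][j], x)]
            net["b2"] -= lr * delta_out
    return net

def accuracy(net, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(int(forward(net, x)[1] >= 0.5) == t for x, t in data)
    return correct / len(data)

rng = random.Random(0)
samples = make_synthetic_features(rng)          # 72 samples in total
split = int(0.8 * len(samples))                 # 57 train / 15 test (assumed ratio)
train_set, test_set = samples[:split], samples[split:]
net = train(init_mlp(rng, dim=12), train_set)
print("train accuracy:", accuracy(net, train_set))
print("test accuracy:", accuracy(net, test_set))
```

With real recordings, the synthetic generator would be replaced by an MFCC extraction step, and the accuracies would depend on the data rather than on the well-separated clusters used here.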
