Feature space reduction in ethnically diverse Malaysian English accents classification

In this paper we propose a reduced-dimensional space of statistical descriptors of mel-bands spectral energy (MBSE) vectors for classifying the accents of Malaysian English (MalE) speakers, which vary with the speakers' diverse ethnic backgrounds. Principal component analysis (PCA) with an eigenvector decomposition approach was used to project this high-dimensional dataset into an uncorrelated space by exploiting the covariance structure of the variables. Once the significant coordinate system is revealed, this limits the size of the feature vector needed for good classification. The objectives of this paper are threefold: first, to generate a reduced-size feature vector in order to decrease the memory requirement and the computational time; second, to improve the classification accuracy; and third, to replace the state-of-the-art mel-frequency cepstral coefficients (MFCC) method, which is more susceptible to noisy environments. The system was designed using the K-nearest neighbors algorithm and evaluated on an independent 20% test dataset. The proposed PCA-transformed mel-bands spectral energy (PCA-MBSE) features, evaluated on the MalE database, proved more space-efficient and more robust than the MFCC and MBSE baselines. PCA-MBSE achieved the same accuracy as the original MBSE with a feature vector reduced by 66.67%, and it was more robust under various noisy conditions, with only a 10.48% drop in performance compared with 16.81% and 48.01% for MBSE and MFCC, respectively.
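
As an illustration of the pipeline summarized above, the following minimal sketch projects feature vectors onto the leading eigenvectors of their covariance matrix and classifies the projections with a K-nearest neighbors vote on an 80/20 split. It is not the authors' implementation: the data are synthetic stand-ins for MBSE descriptors, and the helper names (`pca_fit`, `pca_transform`, `knn_predict`), the 30-dimensional input, the 10 retained components (a 66.67% reduction chosen only to mirror the figure quoted above), the three classes, and `k=3` are all assumptions made for the example.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA by eigendecomposition of the sample covariance matrix."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]


def pca_transform(X, mean, components):
    """Project centered data onto the retained eigenvectors."""
    return (X - mean) @ components


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])


# Synthetic stand-in data (hypothetical): 3 classes, 30-dimensional features.
rng = np.random.default_rng(0)
n_per_class, n_dims, n_classes = 60, 30, 3
centers = rng.normal(scale=3.0, size=(n_classes, n_dims))
X = np.vstack([c + rng.normal(size=(n_per_class, n_dims)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Hold out 20% of the samples as an independent test set.
perm = rng.permutation(len(X))
n_test = len(X) // 5
test_idx, train_idx = perm[:n_test], perm[n_test:]

# Fit PCA on the training data only, keeping 10 of 30 dimensions.
mean, components, eigvals = pca_fit(X[train_idx], n_components=10)
Z_train = pca_transform(X[train_idx], mean, components)
Z_test = pca_transform(X[test_idx], mean, components)

y_pred = knn_predict(Z_train, y[train_idx], Z_test, k=3)
accuracy = (y_pred == y[test_idx]).mean()
print(f"reduced dims: {Z_train.shape[1]}/{n_dims}, test accuracy: {accuracy:.3f}")
```

Fitting the projection on the training portion alone keeps the held-out set independent, and the covariance of the projected training data is diagonal, which is the sense in which the transformed space is uncorrelated. Any accuracy printed here reflects only the synthetic data, not the MalE results.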
