A linear visual assessment tendency based clustering with power normalized cepstral coefficients for audio signal recognition system

Feature extraction from speech signals under the influence of background excitation is a challenging task. In this research, we propose a phoneme subspace integrated with the linear visual assessment tendency (LVAT) algorithm to retrieve audio features based on spectral depth analysis. The LVAT algorithm clusters different spectral features to define the intensity of the signal weight. The fast Fourier transform (FFT) projects a selection of weight-estimated samples from the signal for the phoneme subspace. The FFT-phoneme subspace combination enhances the features by analyzing the low-, middle- and high-frequency signals based on the phoneme subspace weight update. Traditional feature extraction techniques such as mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC) and power-normalized cepstral coefficients (PNCC) are analyzed under different noise conditions and compared with the results of the proposed clustering with PNCC. The experimental results demonstrate improved performance on objective measures such as sensitivity, specificity, accuracy and recognition rate.
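
The objective measures named above have standard definitions in terms of confusion-matrix counts. The following is a minimal sketch of those definitions, not the evaluation code used in this work; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives and false negatives. A ratio with an empty denominator
    is reported as 0.0 (an assumption made here to avoid division by zero).
    """
    positives = tp + fn          # all samples that are truly positive
    negatives = tn + fp          # all samples that are truly negative
    total = positives + negatives

    sensitivity = tp / positives if positives else 0.0   # true positive rate
    specificity = tn / negatives if negatives else 0.0   # true negative rate
    accuracy = (tp + tn) / total if total else 0.0       # fraction correct
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only (not results from this work).
sens, spec, acc = binary_metrics(tp=40, fp=5, tn=45, fn=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

For a multi-class recognizer, these quantities would typically be computed per class in a one-versus-rest fashion and then averaged; how they were aggregated here is not stated in the text above.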
