Robust Feature Vector Set Using Higher Order Autocorrelation Coefficients

This paper proposes a feature extraction method for automatic speech recognition that is robust to additive background noise. Because background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower orders, leaving the higher-order coefficients less affected, the method discards the lower-order autocorrelation coefficients and uses only the higher-order ones for spectral estimation. The magnitude spectrum of the windowed higher-order autocorrelation sequence serves as an estimate of the power spectrum of the speech signal. This power spectral estimate is then passed through a Mel filter bank, a log operation, and the discrete cosine transform to obtain cepstral coefficients. These cepstral coefficients are referred to as the Differentiated Relative Higher Order Autocorrelation Coefficient Sequence Spectrum (DRHOASS). The authors evaluate the speech recognition performance of the DRHOASS features and show that they perform as well as MFCC features on clean speech and better than MFCC features on noisy speech.
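The processing chain described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `higher_lag_cepstrum`, the 2 ms lag cutoff, the Hamming window, the 8 kHz sample rate, and the filter and coefficient counts are all hypothetical choices that the abstract does not specify. Any differencing or relative-spectrum step implied by the DRHOASS name is omitted because the abstract does not describe it.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters over the n_fft // 2 + 1 non-negative frequency bins."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_ii(x, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(x)
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n) @ x

def higher_lag_cepstrum(frame, sample_rate=8000, min_lag_ms=2.0,
                        n_fft=512, n_filters=23, n_ceps=13):
    """Cepstral features from the higher-lag part of one frame's autocorrelation.

    All default values are illustrative assumptions, not values from the paper.
    """
    n = len(frame)
    # One-sided biased autocorrelation r[0], r[1], ..., r[n-1].
    r = np.correlate(frame, frame, mode="full")[n - 1:] / n
    # Discard the lower-order lags, which additive noise corrupts most.
    min_lag = int(round(min_lag_ms * 1e-3 * sample_rate))
    higher = r[min_lag:]
    if len(higher) > n_fft:
        raise ValueError("n_fft must be at least the number of retained lags")
    # Window the retained lags, then take the magnitude spectrum as the
    # power spectral estimate.
    windowed = higher * np.hamming(len(higher))
    spectrum = np.abs(np.fft.rfft(windowed, n_fft))
    # Mel filter bank, log, and DCT, as in a conventional cepstral front end.
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ spectrum
    return dct_ii(np.log(energies + 1e-10), n_ceps)
```

Calling `higher_lag_cepstrum` on a 256-sample frame at the assumed 8 kHz rate drops the first 16 lags and returns 13 coefficients. In practice the lag cutoff would be tuned to the noise conditions rather than fixed.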
