Auditory-based robust speech recognition system for ambient assisted living in smart home

An auditory-based feature extraction algorithm is proposed for enhancing the robustness of automatic speech recognition. In the proposed approach, the speech signal is characterized using a new feature referred to as the Basilar-membrane Frequency-band Cepstral Coefficient (BFCC). In contrast to the conventional Mel-Frequency Cepstral Coefficient (MFCC) method based on a Fourier spectrogram, the proposed BFCC method uses an auditory spectrogram based on a gammachirp wavelet transform in order to more accurately mimic the auditory response of the human ear and improve the noise immunity. In addition, a Hidden Markov Model (HMM) is used for both training and testing purposes. The evaluation results obtained using the AURORA 2 noisy speech database show that compared to the MFCC method, the proposed scheme improves the speech recognition rate by 15% on average given speech samples with Signal-to-Noise Ratios (SNRs) ranging from 0 to 20 dB. Thus, the proposed method has significant potential for the development of robust speech recognition systems for ambient assisted living.
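The pipeline described above, namely replacing the Fourier spectrogram with a gammachirp filterbank spectrogram and then taking cepstral coefficients, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the center-frequency spacing, filter parameters (order n, bandwidth factor b, chirp factor c), frame sizes, and the helper names `erb`, `gammachirp`, and `bfcc` are all assumptions chosen for clarity. Only the overall structure (auditory filterbank → frame-wise log band energies → DCT across bands) follows the abstract.

```python
import numpy as np

def erb(fc):
    # Equivalent Rectangular Bandwidth approximation (Glasberg & Moore)
    return 24.7 + 0.108 * fc

def gammachirp(fc, fs, n=4, b=1.019, c=1.0, duration=0.025):
    # Impulse response of a gammachirp filter centered at fc:
    # t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t + c*ln t)
    t = np.arange(1, int(duration * fs)) / fs      # start at 1 sample to avoid log(0)
    env = t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
    g = env * np.cos(2 * np.pi * fc * t + c * np.log(t))
    return g / np.sqrt(np.sum(g ** 2))             # unit-energy normalization

def bfcc(signal, fs, n_bands=26, n_ceps=13, frame_len=0.025, hop=0.010):
    # Center frequencies on a log scale (assumed spacing, standing in
    # for an ERB-rate scale along the basilar membrane)
    fcs = np.geomspace(100.0, 0.45 * fs, n_bands)
    # Band-pass filter the signal with each gammachirp channel
    outputs = np.stack([np.convolve(signal, gammachirp(fc, fs), mode="same")
                        for fc in fcs])
    # Frame-wise log band energies form the auditory spectrogram
    flen, fhop = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + (outputs.shape[1] - flen) // fhop
    spec = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        seg = outputs[:, i * fhop: i * fhop + flen]
        spec[i] = np.log(np.sum(seg ** 2, axis=1) + 1e-10)
    # DCT-II across bands decorrelates the log energies -> cepstral coefficients
    k = np.arange(n_bands)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * k + 1) / (2 * n_bands))
    return spec @ basis.T                          # shape: (n_frames, n_ceps)
```

In a recognition system, the resulting per-frame coefficient vectors (optionally with delta features appended) would be fed to an HMM toolkit for training and decoding, just as MFCC vectors are.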
