Real-time frequency-based noise-robust Automatic Speech Recognition using Multi-Nets Artificial Neural Networks: A multi-views multi-learners approach

Automatic Speech Recognition (ASR) is a technology for identifying uttered words represented as an acoustic signal. An important property of a noise-robust ASR system is its ability to recognise speech accurately under noisy conditions. This paper studies the application of Multi-Nets Artificial Neural Networks (M-N ANNs), a realisation of the multiple-views multiple-learners approach, as Multi-Networks Speech Recognisers (M-NSRs) to provide a real-time, frequency-based noise-robust ASR model. An M-NSR treats the speech features associated with each word as a distinct view and assigns a standalone ANN as the learner that approximates that view; by contrast, multiple-views single-learner (MVSL) ANN-based speech recognisers employ a single ANN to memorise the features of the entire vocabulary. In this research, an M-NSR was implemented and evaluated on unforeseen test data corrupted by white, brown, and pink noise; specifically, 27 experiments were conducted on noisy speech to measure the accuracy and recognition rate of the proposed model, and the results were compared in detail with those of an MVSL ANN-based ASR system. In our experiments, the M-NSR improved the average recognition rate by up to 20.14% on noise-corrupted test data. Its higher recognition rate under noisy conditions indicates that the M-NSR, with its higher degree of generalisability, handles frequency-based noise better than the previous model.
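The core structural idea, one standalone learner per word-view chosen by maximum response, can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual architecture: the synthetic 2-D "speech features", the tiny logistic unit standing in for each ANN, and all names here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_net(pos, neg, epochs=200, lr=0.5):
    """Train one tiny logistic unit to separate one word's view from the rest.
    (A stand-in for the standalone ANN assigned to each view.)"""
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
        g = p - y                               # log-loss gradient
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# Synthetic feature "views" for a 3-word vocabulary: one well-separated
# Gaussian cluster per word stands in for that word's speech features.
centres = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
data = {w: centres[w] + 0.3 * rng.standard_normal((40, 2)) for w in range(3)}

# Multi-nets recogniser: a dedicated net per word, trained one-vs-rest,
# unlike an MVSL recogniser, which would fit one shared model to all words.
nets = {}
for w, pos in data.items():
    neg = np.vstack([v for u, v in data.items() if u != w])
    nets[w] = train_net(pos, neg)

def recognise(x):
    """Return the word whose dedicated net responds most strongly."""
    scores = {w: x @ wgt + b for w, (wgt, b) in nets.items()}
    return max(scores, key=scores.get)

word = recognise(np.array([3.9, 0.1]))  # nearest the word-1 cluster
```

The recogniser's decision is simply the argmax over the per-word nets' outputs; adding a word to the vocabulary means training one new net rather than retraining a single shared model.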
