Deep Neural Network Based Speech Recognition Systems Under Noise Perturbations

Automatic speech recognition, which plays an important role in human-computer interaction, is the cornerstone of communication between humans and smart devices. In the past few years, deep neural networks (DNNs) have been deployed in automatic speech recognition with great success. However, recent research has found that DNNs are not robust against small perturbations. In this work, we investigate the noise immunity of various neural network models on the speech recognition task. When noise is introduced into the original speech audio, our experimental results show that the phoneme error rate (PER) increases, that is, recognition accuracy degrades, as the signal-to-noise ratio (SNR) decreases across all evaluated neural network models. On the other hand, when noise is introduced into the Mel-frequency cepstral coefficient (MFCC) features, the multilayer perceptron (MLP) model outperforms all of the recurrent neural network (RNN) models.
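As an illustration of what introducing noise at a given SNR means, the sketch below scales a noise signal so that the ratio of speech power to noise power matches a target value in decibels, then adds it to the speech samples. This is a common convention and an assumption here; the abstract does not state the exact mixing procedure used in the experiments. The function names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, and the sine tone stands in for real speech audio.

```python
import math
import random


def mean_power(samples):
    """Average power of a signal: the mean of its squared samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the target SNR in dB.

    Assumes SNR = 10 * log10(P_speech / P_noise), with power measured over
    the whole signal. Other conventions (e.g. per-frame SNR) also exist.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Gain that brings the noise power to P_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture when the clean signal is known."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16000  # assumed sampling rate, in Hz
    # One second of a 440 Hz tone as a stand-in for speech.
    speech = [math.sin(2 * math.pi * 440 * t / sample_rate)
              for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    for target in (20, 10, 0):
        noisy = mix_at_snr(speech, noise, target)
        print(target, round(measured_snr_db(speech, noisy), 3))
```

Lower target SNR values produce a larger noise gain, which is the condition under which the abstract reports higher PER.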
