On the relevance of auditory-based Gabor features for deep learning in robust speech recognition

[1]  H Hermansky,et al.  Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[2]  Roy D. Patterson,et al.  In Auditory Physiology and Perception , 1992 .

[3]  R. Patterson,et al.  Complex Sounds and Auditory Images , 1992 .

[4]  John R. Gilbert,et al.  Sparse Matrices in MATLAB: Design and Implementation , 1992, SIAM J. Matrix Anal. Appl..

[5]  R. Plomp,et al.  Effect of reducing slow temporal modulations on speech reception. , 1994, The Journal of the Acoustical Society of America.

[6]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[7]  R. Plomp,et al.  Effect of temporal envelope smearing on speech reception. , 1994, The Journal of the Acoustical Society of America.

[8]  Richard Lippmann,et al.  Speech recognition by machines and humans , 1997, Speech Commun..

[9]  Hynek Hermansky,et al.  On properties of modulation spectrum for robust automatic speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.

[10]  Misha Pavel,et al.  On the relative importance of various components of the modulation spectrum for automatic speech recognition , 1999, Speech Commun..

[11]  J Tchorz,et al.  A model of auditory perception as front end for automatic speech recognition. , 1999, The Journal of the Acoustical Society of America.

[12]  David Pearce,et al.  The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions , 2000, INTERSPEECH.

[13]  David Gelbart,et al.  Improving word accuracy with Gabor feature extraction , 2002, INTERSPEECH.

[14]  C. Schreiner,et al.  Gabor analysis of auditory midbrain receptive fields: spectro-temporal and binaural composition. , 2003, Journal of neurophysiology.

[15]  Naveen Parihar,et al.  Performance analysis of the Aurora large vocabulary baseline system , 2004, 2004 12th European Signal Processing Conference.

[16]  Hynek Hermansky,et al.  Multi-resolution RASTA filtering for TANDEM-based ASR , 2005, INTERSPEECH.

[17]  Martin Cooke,et al.  A glimpsing model of speech perception in noise. , 2006, The Journal of the Acoustical Society of America.

[18]  Jon Barker,et al.  An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.

[19]  D. Poeppel,et al.  Multi-Time Resolution Analysis of Speech , 2007 .

[20]  Odette Scharenborg,et al.  Reaching over the gap: A review of efforts to link human and automatic speech recognition research , 2007, Speech Commun..

[21]  Jonathan Le Roux,et al.  Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[22]  Tony Ezzat,et al.  Spectro-temporal analysis of speech using 2-D Gabor filters , 2007, INTERSPEECH.

[23]  Stephen V. David,et al.  Representation of Phonemes in Primary Auditory Cortex: How the Brain Analyzes Speech , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[24]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[25]  Nelson Morgan,et al.  Multi-stream spectro-temporal features for robust speech recognition , 2008, INTERSPEECH.

[26]  Wu Chou,et al.  Discriminative learning in sequential pattern recognition , 2008, IEEE Signal Processing Magazine.

[27]  Richard M. Stern,et al.  Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction , 2009, INTERSPEECH.

[28]  Jon Barker,et al.  Robust automatic transcription of English speech corpora , 2010, 2010 8th International Conference on Communications.

[29]  Daniel Povey,et al.  The Kaldi Speech Recognition Toolkit , 2011 .

[30]  Dong Yu,et al.  Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[31]  Birger Kollmeier,et al.  Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition , 2011, Speech Commun..

[32]  Tara N. Sainath,et al.  Deep Belief Networks using discriminative features for phone recognition , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[33]  Tara N. Sainath,et al.  Making Deep Belief Networks effective for large vocabulary continuous speech recognition , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[34]  Marc René Schädler,et al.  Comparing Different Flavors of Spectro-Temporal Features for ASR , 2011, INTERSPEECH.

[35]  Geoffrey E. Hinton,et al.  Understanding how Deep Belief Networks perform acoustic modelling , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[36]  Yu Hu,et al.  Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMs in acoustic modeling , 2012, 2012 8th International Symposium on Chinese Spoken Language Processing.

[37]  B. Kollmeier,et al.  Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition. , 2012, The Journal of the Acoustical Society of America.

[38]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[39]  Birger Kollmeier,et al.  Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition , 2012, INTERSPEECH.

[40]  Richard M. Stern,et al.  Features Based on Auditory Physiology and Perception , 2012, Techniques for Noise Robustness in Automatic Speech Recognition.

[41]  Bernd T. Meyer,et al.  Spectro-temporal Gabor features for speaker recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[42]  Geoffrey E. Hinton.  A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[43]  Jon Barker,et al.  The second ‘CHiME’ speech separation and recognition challenge: Datasets, tasks and baselines , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[44]  Tim Jürgens,et al.  Noise robust distant automatic speech recognition utilizing NMF based source separation and auditory feature extraction , 2013 .

[45]  Lukás Burget,et al.  Sequence-discriminative training of deep neural networks , 2013, INTERSPEECH.

[46]  Nelson Morgan,et al.  Robust CNN-based speech recognition with Gabor filter kernels , 2014, INTERSPEECH.

[47]  Yun Lei,et al.  Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions , 2014, INTERSPEECH.

[48]  Sriram Ganapathy,et al.  Auditory motivated front-end for noisy speech using spectro-temporal modulation filtering. , 2014, The Journal of the Acoustical Society of America.

[49]  Yifan Gong,et al.  An Overview of Noise-Robust Automatic Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[50]  Niko Moritz,et al.  Should deep neural nets have ears? The role of auditory features in deep learning approaches , 2014, INTERSPEECH.

[51]  Björn W. Schuller,et al.  Investigating NMF speech enhancement for neural network based acoustic models , 2014, INTERSPEECH.

[52]  Vaibhava Goel,et al.  Annealed dropout training of deep networks , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).

[53]  Chengzhu Yu,et al.  The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[54]  Chng Eng Siong,et al.  Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[55]  Hugo Van hamme,et al.  Investigating modulation spectrogram features for deep neural network-based automatic speech recognition , 2015, INTERSPEECH.

[56]  Hyung Soon Kim,et al.  Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition , 2015 .

[57]  Steven Greenberg,et al.  Multi-time resolution analysis of speech: evidence from psychophysics , 2015, Front. Neurosci..

[58]  A. Al‐Jallad Phonology and Phonetics , 2015 .

[59]  Jon Barker,et al.  The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[60]  N. C. England,et al.  Phonology and Phonetics , 2017 .