论文信息 - On the relevance of auditory-based Gabor features for deep learning in robust speech recognition - 字舞流文

On the relevance of auditory-based Gabor features for deep learning in robust speech recognition

Sri Harish Reddy Mallidi | Bernd T. Meyer | Angel Mario Castro Martinez | B. Meyer

[1] H Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[2] Roy D. Patterson,et al. In Auditory Physiology and Perception , 1992 .

[3] R. Patterson,et al. Complex Sounds and Auditory Images , 1992 .

[4] John R. Gilbert,et al. Sparse Matrices in MATLAB: Design and Implementation , 1992, SIAM J. Matrix Anal. Appl..

[5] R. Plomp,et al. Effect of reducing slow temporal modulations on speech reception. , 1994, The Journal of the Acoustical Society of America.

[6] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[7] R. Plomp,et al. Effect of temporal envelope smearing on speech reception. , 1994, The Journal of the Acoustical Society of America.

[8] Richard Lippmann,et al. Speech recognition by machines and humans , 1997, Speech Commun..

[9] Hynek Hermansky,et al. On properties of modulation spectrum for robust automatic speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[10] Misha Pavel,et al. On the relative importance of various components of the modulation spectrum for automatic speech recognition , 1999, Speech Commun..

[11] J Tchorz,et al. A model of auditory perception as front end for automatic speech recognition. , 1999, The Journal of the Acoustical Society of America.

[12] David Pearce,et al. The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions , 2000, INTERSPEECH.

[13] David Gelbart,et al. Improving word accuracy with Gabor feature extraction , 2002, INTERSPEECH.

[14] C. Schreiner,et al. Gabor analysis of auditory midbrain receptive fields: spectro-temporal and binaural composition. , 2003, Journal of neurophysiology.

[15] Naveen Parihar,et al. Performance analysis of the Aurora large vocabulary baseline system , 2004, 2004 12th European Signal Processing Conference.

[16] Hynek Hermansky,et al. Multi-resolution RASTA filtering for TANDEM-based ASR , 2005, INTERSPEECH.

[17] Martin Cooke,et al. A glimpsing model of speech perception in noise. , 2006, The Journal of the Acoustical Society of America.

[18] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.

[19] D. Poeppel,et al. Multi-Time Resolution Analysis of Speech , 2007 .

[20] Odette Scharenborg,et al. Reaching over the gap: A review of efforts to link human and automatic speech recognition research , 2007, Speech Commun..

[21] Jonathan Le Roux,et al. Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[22] Tony Ezzat,et al. Spectro-temporal analysis of speech using 2-d Gabor filters , 2007, INTERSPEECH.

[23] Stephen V. David,et al. Representation of Phonemes in Primary Auditory Cortex: How the Brain Analyzes Speech , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[24] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .

[25] Nelson Morgan,et al. Multi-stream spectro-temporal features for robust speech recognition , 2008, INTERSPEECH.

[26] Wu Chou,et al. Discriminative learning in sequential pattern recognition , 2008, IEEE Signal Processing Magazine.

[27] Richard M. Stern,et al. Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction , 2009, INTERSPEECH.

[28] Jon Barker,et al. Robust automatic transcription of English speech corpora , 2010, 2010 8th International Conference on Communications.

[29] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .

[30] Dong Yu,et al. Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[31] Birger Kollmeier,et al. Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition , 2011, Speech Commun..

[32] Tara N. Sainath,et al. Deep Belief Networks using discriminative features for phone recognition , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[33] Tara N. Sainath,et al. Making Deep Belief Networks effective for large vocabulary continuous speech recognition , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[34] Marc René Schädler,et al. Comparing Different Flavors of Spectro-Temporal Features for ASR , 2011, INTERSPEECH.

[35] Geoffrey E. Hinton,et al. Understanding how Deep Belief Networks perform acoustic modelling , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[36] Yu Hu,et al. Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMS in acoustic modeling , 2012, 2012 8th International Symposium on Chinese Spoken Language Processing.

[37] B. Kollmeier,et al. Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition. , 2012, The Journal of the Acoustical Society of America.

[38] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[39] Birger Kollmeier,et al. Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition , 2012, INTERSPEECH.

[40] Richard M. Stern,et al. Features Based on Auditory Physiology and Perception , 2012, Techniques for Noise Robustness in Automatic Speech Recognition.

[41] Bernd T. Meyer,et al. Spectro-temporal Gabor features for speaker recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[42] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[43] Jon Barker,et al. The second ‘chime’ speech separation and recognition challenge: Datasets, tasks and baselines , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[44] Tim Jürgens,et al. NOISE ROBUST DISTANT AUTOMATIC SPEECH RECOGNITION UTILIZING NMF BASED SOURCE SEPARATION AND AUDITORY FEATURE EXTRACTION , 2013 .

[45] Lukás Burget,et al. Sequence-discriminative training of deep neural networks , 2013, INTERSPEECH.

[46] Nelson Morgan,et al. Robust CNN-based speech recognition with Gabor filter kernels , 2014, INTERSPEECH.

[47] Yun Lei,et al. Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions , 2014, INTERSPEECH.

[48] Sriram Ganapathy,et al. Auditory motivated front-end for noisy speech using spectro-temporal modulation filtering. , 2014, The Journal of the Acoustical Society of America.

[49] Yifan Gong,et al. An Overview of Noise-Robust Automatic Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[50] Niko Moritz,et al. Should deep neural nets have ears? the role of auditory features in deep learning approaches , 2014, INTERSPEECH.

[51] Björn W. Schuller,et al. Investigating NMF speech enhancement for neural network based acoustic models , 2014, INTERSPEECH.

[52] Vaibhava Goel,et al. Annealed dropout training of deep networks , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).

[53] Chengzhu Yu,et al. The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[54] Chng Eng Siong,et al. Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[55] Hugo Van hamme,et al. Investigating modulation spectrogram features for deep neural network-based automatic speech recognition , 2015, INTERSPEECH.

[56] Hyung Soon Kim,et al. Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition , 2015 .

[57] Steven Greenberg,et al. Multi-time resolution analysis of speech: evidence from psychophysics , 2015, Front. Neurosci..

[58] A. Al‐Jallad. Phonology and Phonetics , 2015 .

[59] Jon Barker,et al. The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[60] N. C. England,et al. Phonology and Phonetics , 2017 .