On the relevance of auditory-based Gabor features for deep learning in robust speech recognition
[1] H. Hermansky, et al. Perceptual linear predictive (PLP) analysis of speech, 1990, The Journal of the Acoustical Society of America.
[2] Roy D. Patterson, et al. In Auditory Physiology and Perception, 1992.
[3] R. Patterson, et al. Complex Sounds and Auditory Images, 1992.
[4] John R. Gilbert, et al. Sparse Matrices in MATLAB: Design and Implementation, 1992, SIAM J. Matrix Anal. Appl.
[5] R. Plomp, et al. Effect of reducing slow temporal modulations on speech reception, 1994, The Journal of the Acoustical Society of America.
[6] Hynek Hermansky, et al. RASTA processing of speech, 1994, IEEE Trans. Speech Audio Process.
[7] R. Plomp, et al. Effect of temporal envelope smearing on speech reception, 1994, The Journal of the Acoustical Society of America.
[8] Richard Lippmann, et al. Speech recognition by machines and humans, 1997, Speech Commun.
[9] Hynek Hermansky, et al. On properties of modulation spectrum for robust automatic speech recognition, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[10] Misha Pavel, et al. On the relative importance of various components of the modulation spectrum for automatic speech recognition, 1999, Speech Commun.
[11] J. Tchorz, et al. A model of auditory perception as front end for automatic speech recognition, 1999, The Journal of the Acoustical Society of America.
[12] David Pearce, et al. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions, 2000, INTERSPEECH.
[13] David Gelbart, et al. Improving word accuracy with Gabor feature extraction, 2002, INTERSPEECH.
[14] C. Schreiner, et al. Gabor analysis of auditory midbrain receptive fields: spectro-temporal and binaural composition, 2003, Journal of Neurophysiology.
[15] Naveen Parihar, et al. Performance analysis of the Aurora large vocabulary baseline system, 2004, 2004 12th European Signal Processing Conference.
[16] Hynek Hermansky, et al. Multi-resolution RASTA filtering for TANDEM-based ASR, 2005, INTERSPEECH.
[17] Martin Cooke, et al. A glimpsing model of speech perception in noise, 2006, The Journal of the Acoustical Society of America.
[18] Jon Barker, et al. An audio-visual corpus for speech perception and automatic speech recognition, 2006, The Journal of the Acoustical Society of America.
[19] D. Poeppel, et al. Multi-Time Resolution Analysis of Speech, 2007.
[20] Odette Scharenborg, et al. Reaching over the gap: A review of efforts to link human and automatic speech recognition research, 2007, Speech Commun.
[21] Jonathan Le Roux, et al. Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[22] Tony Ezzat, et al. Spectro-temporal analysis of speech using 2-d Gabor filters, 2007, INTERSPEECH.
[23] Stephen V. David, et al. Representation of Phonemes in Primary Auditory Cortex: How the Brain Analyzes Speech, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[24] Geoffrey E. Hinton. Visualizing Data using t-SNE, 2008.
[25] Nelson Morgan, et al. Multi-stream spectro-temporal features for robust speech recognition, 2008, INTERSPEECH.
[26] Wu Chou, et al. Discriminative learning in sequential pattern recognition, 2008, IEEE Signal Processing Magazine.
[27] Richard M. Stern, et al. Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction, 2009, INTERSPEECH.
[28] Jon Barker, et al. Robust automatic transcription of English speech corpora, 2010, 2010 8th International Conference on Communications.
[29] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[30] Dong Yu, et al. Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription, 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[31] Birger Kollmeier, et al. Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition, 2011, Speech Commun.
[32] Tara N. Sainath, et al. Deep Belief Networks using discriminative features for phone recognition, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Tara N. Sainath, et al. Making Deep Belief Networks effective for large vocabulary continuous speech recognition, 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[34] Marc René Schädler, et al. Comparing Different Flavors of Spectro-Temporal Features for ASR, 2011, INTERSPEECH.
[35] Geoffrey E. Hinton, et al. Understanding how Deep Belief Networks perform acoustic modelling, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Yu Hu, et al. Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMs in acoustic modeling, 2012, 2012 8th International Symposium on Chinese Spoken Language Processing.
[37] B. Kollmeier, et al. Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition, 2012, The Journal of the Acoustical Society of America.
[38] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[39] Birger Kollmeier, et al. Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition, 2012, INTERSPEECH.
[40] Richard M. Stern, et al. Features Based on Auditory Physiology and Perception, 2012, Techniques for Noise Robustness in Automatic Speech Recognition.
[41] Bernd T. Meyer, et al. Spectro-temporal Gabor features for speaker recognition, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines, 2012, Neural Networks: Tricks of the Trade.
[43] Jon Barker, et al. The second 'CHiME' speech separation and recognition challenge: Datasets, tasks and baselines, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[44] Tim Jürgens, et al. Noise robust distant automatic speech recognition utilizing NMF based source separation and auditory feature extraction, 2013.
[45] Lukás Burget, et al. Sequence-discriminative training of deep neural networks, 2013, INTERSPEECH.
[46] Nelson Morgan, et al. Robust CNN-based speech recognition with Gabor filter kernels, 2014, INTERSPEECH.
[47] Yun Lei, et al. Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions, 2014, INTERSPEECH.
[48] Sriram Ganapathy, et al. Auditory motivated front-end for noisy speech using spectro-temporal modulation filtering, 2014, The Journal of the Acoustical Society of America.
[49] Yifan Gong, et al. An Overview of Noise-Robust Automatic Speech Recognition, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[50] Niko Moritz, et al. Should deep neural nets have ears? The role of auditory features in deep learning approaches, 2014, INTERSPEECH.
[51] Björn W. Schuller, et al. Investigating NMF speech enhancement for neural network based acoustic models, 2014, INTERSPEECH.
[52] Vaibhava Goel, et al. Annealed dropout training of deep networks, 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[53] Chengzhu Yu, et al. The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[54] Chng Eng Siong, et al. Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[55] Hugo Van hamme, et al. Investigating modulation spectrogram features for deep neural network-based automatic speech recognition, 2015, INTERSPEECH.
[56] Hyung Soon Kim, et al. Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition, 2015.
[57] Steven Greenberg, et al. Multi-time resolution analysis of speech: evidence from psychophysics, 2015, Front. Neurosci.
[58] A. Al-Jallad. Phonology and Phonetics, 2015.
[59] Jon Barker, et al. The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines, 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[60] N. C. England, et al. Phonology and Phonetics, 2017.