Integration of Optimized Modulation Filter Sets Into Deep Neural Networks for Automatic Speech Recognition
[1] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[2] Keith Vertanen. Baseline WSJ Acoustic Models for HTK and Sphinx: Training Recipes and Recognition Experiments , 2007 .
[3] B. Kollmeier,et al. Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. , 1997, The Journal of the Acoustical Society of America.
[4] Hervé Bourlard,et al. Mel-cepstrum modulation spectrum (MCMS) features for robust ASR , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).
[5] Janet M. Baker,et al. The Design for the Wall Street Journal-based CSR Corpus , 1992, HLT.
[6] Benoît Favre,et al. Speaker adaptation of DNN-based ASR with i-vectors: does it actually adapt models to speakers? , 2014, INTERSPEECH.
[7] Sarel van Vuuren,et al. Data based filter design for RASTA-like channel normalization in ASR , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[8] Petros Maragos,et al. Energy separation in signal modulations with application to speech analysis , 1993, IEEE Trans. Signal Process..
[9] S. S. Stevens,et al. Critical Band Width in Loudness Summation , 1957 .
[10] T. Houtgast. Frequency selectivity in amplitude-modulation detection. , 1989, The Journal of the Acoustical Society of America.
[11] Hermann Ney,et al. Deep hierarchical bottleneck MRASTA features for LVCSR , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[12] Lukás Burget,et al. Sequence-discriminative training of deep neural networks , 2013, INTERSPEECH.
[13] A. Aertsen,et al. The Spectro-Temporal Receptive Field , 1981, Biological Cybernetics.
[14] Steve Renals,et al. WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[15] Geoffrey E. Hinton,et al. Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[16] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[17] Richard M. Stern,et al. Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[18] Misha Pavel,et al. On the importance of various modulation frequencies for speech recognition , 1997, EUROSPEECH.
[19] Xiaohui Zhang,et al. Improving deep neural network acoustic models using generalized maxout networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] W S McCulloch,et al. A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.
[22] Jon Barker,et al. The second ‘CHiME’ speech separation and recognition challenge: Datasets, tasks and baselines , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[23] Steven Greenberg,et al. Insights into Spoken Language Gleaned from Phonetic Transcription of the Switchboard Corpus , 1996 .
[24] D. Grantham,et al. Modulation masking: effects of modulation frequency, depth, and phase. , 1989, The Journal of the Acoustical Society of America.
[25] Yifan Gong,et al. An Overview of Noise-Robust Automatic Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[26] B. Kollmeier,et al. Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction. , 1994, The Journal of the Acoustical Society of America.
[27] C. Schreiner,et al. Gabor analysis of auditory midbrain receptive fields: spectro-temporal and binaural composition. , 2003, Journal of neurophysiology.
[28] C. Schreiner,et al. Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms. , 1988, Journal of neurophysiology.
[29] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..
[30] Thomas Hofmann,et al. Greedy Layer-Wise Training of Deep Networks , 2007 .
[31] Hynek Hermansky,et al. Multi-resolution RASTA filtering for TANDEM-based ASR , 2005, INTERSPEECH.
[32] Michael Kleinschmidt,et al. Localized spectro-temporal features for automatic speech recognition , 2003, INTERSPEECH.
[33] Arindam Mandal,et al. Normalized amplitude modulation features for large vocabulary noise-robust speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Daniel P. W. Ellis,et al. Frequency-domain linear prediction for temporal features , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).
[35] J. Foote,et al. WSJCAM0: A British English Speech Corpus for Large Vocabulary Continuous Speech Recognition , 1995 .
[36] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[37] Hermann Ney,et al. Bootstrap estimates for confidence intervals in ASR performance evaluation , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[38] B. Kollmeier,et al. Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition. , 2012, The Journal of the Acoustical Society of America.
[39] I. McCowan,et al. The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): specification and initial experiments , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..
[40] Hynek Hermansky,et al. Temporal patterns (TRAPs) in ASR of noisy speech , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).
[41] Biing-Hwang Juang,et al. An application of discriminative feature extraction to filter-bank-based speech recognition , 2001, IEEE Trans. Speech Audio Process..
[42] Tara N. Sainath,et al. Learning filter banks within a deep neural network framework , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[43] N. Viemeister. Temporal modulation transfer functions based upon modulation thresholds. , 1979, The Journal of the Acoustical Society of America.
[44] Hynek Hermansky,et al. Temporal envelope compensation for robust phoneme recognition using modulation spectrum. , 2010, The Journal of the Acoustical Society of America.
[45] Ron J. Weiss,et al. Speech acoustic modeling from raw multichannel waveforms , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[46] Tara N. Sainath,et al. Learning the speech front-end with raw waveform CLDNNs , 2015, INTERSPEECH.
[47] Sarel van Vuuren,et al. Data-driven design of RASTA-like filters , 1997, EUROSPEECH.
[48] Frédéric E. Theunissen,et al. The Modulation Transfer Function for Speech Intelligibility , 2009, PLoS Comput. Biol..
[49] Birger Kollmeier,et al. An Auditory Inspired Amplitude Modulation Filter Bank for Robust Feature Extraction in Automatic Speech Recognition , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[50] Hermann Ney,et al. Acoustic modeling with deep neural networks using raw time signal for LVCSR , 2014, INTERSPEECH.
[51] Birger Kollmeier,et al. Amplitude modulation spectrogram based features for robust speech recognition in noisy and reverberant environments , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[52] Steven Greenberg,et al. Robust speech recognition using the modulation spectrogram , 1998, Speech Commun..