Interpretable Filter Learning Using Soft Self-attention For Raw Waveform Speech Recognition
[1] Shuang Xu,et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Richard M. Stern,et al. Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[3] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..
[4] John H. L. Hansen,et al. Mean Hilbert Envelope Coefficients (MHEC) for Robust Speaker Recognition , 2012, INTERSPEECH.
[5] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[6] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[7] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[8] Tara N. Sainath,et al. Learning the speech front-end with raw waveform CLDNNs , 2015, INTERSPEECH.
[9] Yoshua Bengio,et al. Interpretable Convolutional Filters with SincNet , 2018, ArXiv.
[10] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[12] Sriram Ganapathy,et al. End-to-end Language Recognition Using Attention Based Hierarchical Gated Recurrent Unit Models , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[14] Ke Li,et al. A Time-Restricted Self-Attention Layer for ASR , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Tara N. Sainath,et al. Learning filter banks within a deep neural network framework , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[16] Bowen Zhou,et al. A Structured Self-attentive Sentence Embedding , 2017, ICLR.
[17] Ron J. Weiss,et al. Speech acoustic modeling from raw multichannel waveforms , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Tomas Mikolov,et al. RNNLM - Recurrent Neural Network Language Modeling Toolkit , 2011 .
[19] Jon Barker,et al. The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[20] Andrea Vedaldi,et al. Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.
[21] R. Patterson,et al. An Efficient Auditory Filterbank Based on the Gammatone Function , Annex B of the SVOS Final Report (Part A: The Auditory Filterbank) , 2010 .
[22] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[23] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[24] Sriram Ganapathy,et al. Unsupervised Raw Waveform Representation Learning for ASR , 2019, INTERSPEECH.
[25] Hermann Ney,et al. Acoustic modeling with deep neural networks using raw time signal for LVCSR , 2014, INTERSPEECH.
[26] Quoc V. Le,et al. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension , 2018, ICLR.
[27] Dimitri Palaz,et al. Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks , 2013, INTERSPEECH.
[28] Stan Davis,et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se , 1980 .