Temporal Modeling Using Dilated Convolution and Gating for Voice-Activity-Detection
Tara N. Sainath | Oriol Vinyals | Aäron van den Oord | Bo Li | Gabor Simko | Shuo-Yiin Chang | Anshuman Tripathi
[1] Jürgen Schmidhuber, et al. LSTM: A Search Space Odyssey, 2015, IEEE Transactions on Neural Networks and Learning Systems.
[2] Matt Shannon, et al. Improved End-of-Query Detection for Streaming Speech Recognition, 2017, INTERSPEECH.
[3] Navdeep Jaitly, et al. Speech recognition for medical conversations, 2017, INTERSPEECH.
[4] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Tara N. Sainath, et al. Highway-LSTM and Recurrent Highway Networks for Speech Recognition, 2017, INTERSPEECH.
[6] P. Dutilleux. An Implementation of the “algorithme à trous” to Compute the Wavelet Transform, 1989.
[7] Björn W. Schuller, et al. Real-life voice activity detection with LSTM Recurrent Neural Networks and an application to Hollywood movies, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[8] Iasonas Kokkinos, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[9] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[10] Geoffrey E. Hinton, et al. Phoneme recognition using time-delay neural networks, 1989, IEEE Trans. Acoust. Speech Signal Process.
[11] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[12] Tara N. Sainath, et al. Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection, 2016, INTERSPEECH.
[13] Brian Kingsbury, et al. Improvements to the IBM speech activity detection system for the DARPA RATS program, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Tara N. Sainath, et al. Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition, 2017, INTERSPEECH.
[15] Tara N. Sainath, et al. Improvements to Deep Convolutional Neural Networks for LVCSR, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[16] Fei Xie, et al. A comparative study of speech detection methods, 1997, EUROSPEECH.
[17] Yun Lei, et al. All for one: feature combination for highly channel-degraded speech activity detection, 2013, INTERSPEECH.
[18] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[19] Sanjeev Khudanpur, et al. A time delay neural network architecture for efficient modeling of long temporal contexts, 2015, INTERSPEECH.
[20] Vladlen Koltun, et al. Multi-Scale Context Aggregation by Dilated Convolutions, 2015, ICLR.
[21] Tara N. Sainath, et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home, 2017, INTERSPEECH.
[22] Ph. Tchamitchian, et al. Wavelets: Time-Frequency Methods and Phase Space, 1992.
[23] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[24] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[25] Geoffrey E. Hinton, et al. A time-delay neural network architecture for isolated word recognition, 1990, Neural Networks.