Robust Noisy Speech Parameterization Using Convolutional Neural Networks

[1]  Ji Wu, et al. Denoising deep neural networks based voice activity detection, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2]  Ryan Prenger, et al. WaveGlow: A Flow-based Generative Network for Speech Synthesis, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3]  Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.

[4]  Jing Pang. Spectrum energy based voice activity detection, 2017, 2017 IEEE 7th Annual Computing and Communication Workshop and Conference (CCWC).

[5]  Gabriel Synnaeve, et al. Wav2Letter: an End-to-End ConvNet-based Speech Recognition System, 2016, ArXiv.

[6]  Chong Wang, et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, 2015, ICML.

[7]  Thad Hughes, et al. Recurrent neural networks for voice activity detection, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[8]  Tara N. Sainath, et al. Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection, 2016, INTERSPEECH.

[9]  Mark Liberman, et al. Speech activity detection on YouTube using deep neural networks, 2013, INTERSPEECH.

[10]  Ted Painter, et al. Audio Signal Processing and Coding, 2007.

[11]  Cassia Valentini-Botinhao, et al. Noisy speech database for training speech enhancement algorithms and TTS models, 2017.

[12]  Jun Du, et al. A universal VAD based on jointly trained deep neural networks, 2015, INTERSPEECH.

[13]  Björn W. Schuller, et al. Real-life voice activity detection with LSTM Recurrent Neural Networks and an application to Hollywood movies, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[14]  Hyeontaek Lim, et al. Formant-Based Robust Voice Activity Detection, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.