Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
K R Prajwal | Rudrabha Mukhopadhyay | Vinay P. Namboodiri | C. V. Jawahar
[1] P. S. Heckerling, et al. Communication with Deaf Patients: Knowledge, Beliefs, and Practices of Physicians, 1995, JAMA.
[2] Sercan Ömer Arik, et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech, 2017, NIPS.
[3] Joon Son Chung, et al. LRS3-TED: A Large-Scale Dataset for Visual Speech Recognition, 2018, arXiv.
[4] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[5] Ming Yang, et al. 3D Convolutional Neural Networks for Human Action Recognition, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[6] Helen Margellos-Anast, et al. Developing a Standardized Comprehensive Health Survey for Use With Deaf Adults, 2005, American Annals of the Deaf.
[7] Shifeng Zhang, et al. S^3FD: Single Shot Scale-Invariant Face Detector, 2017, ICCV.
[8] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[9] Xiaogang Wang, et al. Vision-Infused Deep Audio Inpainting, 2019, ICCV.
[10] Patrick Nguyen, et al. Transfer Learning from Speaker Verification to Multispeaker Text-to-Speech Synthesis, 2018, NeurIPS.
[11] Jürgen Schmidhuber, et al. Lipreading with Long Short-Term Memory, 2016, ICASSP.
[12] Ben P. Milner, et al. Generating Intelligible Audio Speech from Visual Speech, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[13] Bonnie O'Day, et al. Communicating about Health Care: Observations from Persons Who Are Deaf or Hard of Hearing, 2004, Annals of Internal Medicine.
[14] Joon Son Chung, et al. The Conversation: Deep Audio-Visual Speech Enhancement, 2018, INTERSPEECH.
[15] Naomi Harte, et al. TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech, 2015, IEEE Transactions on Multimedia.
[16] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[17] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[18] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[19] Georgia Robins Sadler, et al. Communication Strategies for Nurses Interacting with Patients Who Are Deaf, 2007, Dermatology Nursing.
[20] Lorenzo Torresani, et al. Learning Spatiotemporal Features with 3D Convolutional Networks, 2015, ICCV.
[21] Andries P. Hekstra, et al. Perceptual Evaluation of Speech Quality (PESQ): A New Method for Speech Quality Assessment of Telephone Networks and Codecs, 2001, ICASSP.
[22] Christopher T. Kello, et al. A Neural Network Model of the Articulatory-Acoustic Forward Mapping Trained on Recordings of Articulatory Parameters, 2004, The Journal of the Acoustical Society of America.
[23] Sercan Ömer Arik, et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning, 2017, ICLR.
[24] Shmuel Peleg, et al. Improved Speech Reconstruction from Silent Video, 2017, ICCV Workshops.
[25] Jae S. Lim, et al. Signal Estimation from Modified Short-Time Fourier Transform, 1983, ICASSP.
[26] Joon Son Chung, et al. Lip Reading Sentences in the Wild, 2017, CVPR.
[27] Liangliang Cao, et al. Lip2Audspec: Speech Reconstruction from Silent Lip Movements Video, 2018, ICASSP.
[28] Shmuel Peleg, et al. Vid2speech: Speech Reconstruction from Silent Video, 2017, ICASSP.
[29] Jon Barker, et al. An Audio-Visual Corpus for Speech Perception and Automatic Speech Recognition, 2006, The Journal of the Acoustical Society of America.
[30] Kai Xu, et al. LCANet: End-to-End Lipreading with Cascaded Attention-CTC, 2018, IEEE FG.
[31] Stefan Wermter, et al. LipSound: Neural Mel-Spectrogram Reconstruction for Lip Reading, 2019, INTERSPEECH.
[32] Jesper Jensen, et al. An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[33] Joon Son Chung, et al. Deep Lip Reading: A Comparison of Models and an Online Application, 2018, INTERSPEECH.
[34] D. Lewkowicz, et al. Infants Deploy Selective Attention to the Mouth of a Talking Face When Learning Speech, 2012, Proceedings of the National Academy of Sciences.
[35] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[36] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2018, ICASSP.
[37] Maja Pantic, et al. Video-Driven Speech Reconstruction Using Generative Adversarial Networks, 2019, INTERSPEECH.
[38] Jesper Jensen, et al. A Short-Time Objective Intelligibility Measure for Time-Frequency Weighted Noisy Speech, 2010, ICASSP.
[39] Rohit Jain, et al. Lipper: Synthesizing Thy Speech Using Multi-View Lipreading, 2019, AAAI.
[40] Heiga Zen, et al. LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech, 2019, INTERSPEECH.
[41] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.