Ye Jia | Tal Remez | Brendan Shillingford | Michelle Tadmor Ramanovich | Michael Hassid | Miaosen Wang
[1] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[2] R. Kubichek,et al. Mel-cepstral distance measure for objective speech quality assessment , 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[3] Hideki Kawahara,et al. YIN, a fundamental frequency estimator for speech and music , 2002, The Journal of the Acoustical Society of America.
[4] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition , 2006, The Journal of the Acoustical Society of America.
[5] Tomohiro Nakatani,et al. A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments , 2008, Speech Commun..
[6] Abeer Alwan,et al. Reducing F0 Frame Error of F0 tracking algorithms under noisy conditions with an unvoiced/voiced classification frontend , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[7] V. Tiwari. MFCC and its applications in speaker recognition , 2010.
[8] Alex Graves,et al. Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.
[9] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[12] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[13] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[14] Shmuel Peleg,et al. Vid2speech: Speech reconstruction from silent video , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Joon Son Chung,et al. Lip Reading in Profile , 2017, BMVC.
[16] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[17] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.
[20] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[21] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[22] Patrick Nguyen,et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis , 2018, NeurIPS.
[23] Joon Son Chung,et al. LRS3-TED: a large-scale dataset for visual speech recognition , 2018, ArXiv.
[24] Yoshua Bengio,et al. ObamaNet: Photo-realistic lip-sync from text , 2017, ArXiv.
[25] Quan Wang,et al. Generalized End-to-End Loss for Speaker Verification , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Lei He,et al. Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS , 2019, INTERSPEECH.
[27] Rohit Jain,et al. Lipper: Synthesizing Thy Speech using Multi-View Lipreading , 2019, AAAI.
[28] Ryan Prenger,et al. Waveglow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] C. V. Jawahar,et al. Towards Automatic Face-to-Face Translation , 2019, ACM Multimedia.
[30] Heiga Zen,et al. LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech , 2019, INTERSPEECH.
[31] Yoshua Bengio,et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis , 2019, NeurIPS.
[32] Xu Tan,et al. FastSpeech: Fast, Robust and Controllable Text to Speech , 2019, NeurIPS.
[33] Jonathan Le Roux,et al. Universal Sound Separation , 2019, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[34] Adam Finkelstein,et al. Text-based editing of talking-head video , 2019, ACM Trans. Graph..
[35] C. V. Jawahar,et al. Cross-language Speech Dependent Lip-synchronization , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Hans-Peter Seidel,et al. Neural style-preserving visual dubbing , 2019, ACM Trans. Graph..
[37] Lior Wolf,et al. Attention-based Wavenet Autoencoder for Universal Voice Conversion , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Thomas Paine,et al. Large-Scale Visual Speech Recognition , 2018, INTERSPEECH.
[39] Zhen-Hua Ling,et al. Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Heiga Zen,et al. Hierarchical Generative Modeling for Controllable Speech Synthesis , 2018, ICLR.
[41] Quoc V. Le,et al. Improved Noisy Student Training for Automatic Speech Recognition , 2020, INTERSPEECH.
[42] C. V. Jawahar,et al. A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild , 2020, ACM Multimedia.
[43] C. V. Jawahar,et al. Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Misha Denil,et al. Large-scale multilingual audio visual dubbing , 2020, ArXiv.
[45] Heiga Zen,et al. Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling , 2020, ArXiv.
[46] Tao Qin,et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech , 2021, ICLR.
[47] Shaojin Ding,et al. Textual Echo Cancellation , 2020, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[48] Haizhou Li,et al. Visualtts: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Heiga Zen,et al. Parallel Tacotron: Non-Autoregressive and Controllable TTS , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[50] Bryan Catanzaro,et al. DiffWave: A Versatile Diffusion Model for Audio Synthesis , 2020, ICLR.
[51] Simon King,et al. An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[52] Vivek Kwatra,et al. LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Heiga Zen,et al. WaveGrad: Estimating Gradients for Waveform Generation , 2020, ICLR.
[54] Ye Jia,et al. Translatotron 2: Robust direct speech-to-speech translation , 2021, ArXiv.
[55] Yonghui Wu,et al. PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS , 2021, Interspeech.
[56] Yuxuan Wang,et al. Neural Dubber: Dubbing for Silent Videos According to Scripts , 2021, ArXiv.
[57] Heiga Zen,et al. Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling , 2021, Interspeech.
[58] Marco Tagliasacchi,et al. SoundStream: An End-to-End Neural Audio Codec , 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[59] Chen Change Loy,et al. Everybody’s Talkin’: Let Me Talk as You Want , 2020, IEEE Transactions on Information Forensics and Security.
[60] Maja Pantic,et al. End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks , 2021, IEEE transactions on cybernetics.