VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
[1] Yong Man Ro,et al. Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[2] Ashish Sardana,et al. Speech Prediction in Silent Videos Using Variational Autoencoders , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Hung-yi Lee,et al. Again-VC: A One-Shot Voice Conversion Using Activation Guidance and Adaptive Instance Normalization , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Bin Yang,et al. Feature Selection Using Batch-Wise Attenuation and Feature Mask Normalization , 2020, 2021 International Joint Conference on Neural Networks (IJCNN).
[5] Yong Man Ro,et al. Speech Reconstruction With Reminiscent Sound Via Visual Voice Memory , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[6] C. V. Jawahar,et al. Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Jesper Jensen,et al. Vocoder-Based Speech Synthesis from Silent Videos , 2020, INTERSPEECH.
[8] Maja Pantic,et al. Video-Driven Speech Reconstruction using Generative Adversarial Networks , 2019, INTERSPEECH.
[9] Ning Gui,et al. AFS: An Attention-based mechanism for Supervised Feature Selection , 2019, AAAI.
[10] Timo Aila,et al. A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Stefanos Zafeiriou,et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Maja Pantic,et al. End-to-End Speech-Driven Facial Animation with Temporal GANs , 2018, BMVC.
[13] Jan Kautz,et al. Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.
[14] Liangliang Cao,et al. Lip2Audspec: Speech Reconstruction from Silent Lip Movements Video , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Shmuel Peleg,et al. Improved Speech Reconstruction from Silent Video , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).
[16] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[17] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[18] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[19] Serge J. Belongie,et al. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[20] Themos Stafylakis,et al. Combining Residual Networks with LSTMs for Lipreading , 2017, INTERSPEECH.
[21] Shmuel Peleg,et al. Vid2speech: Speech reconstruction from silent video , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Abhishek Das,et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[23] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[24] Shimon Whiteson,et al. LipNet: End-to-End Sentence-level Lipreading , 2016, arXiv:1611.01599.
[25] Jesper Jensen,et al. An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[26] Yuxiao Hu,et al. MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.
[27] Leon A. Gatys,et al. Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Debaditya Roy,et al. Feature selection using Deep Neural Networks , 2015, 2015 International Joint Conference on Neural Networks (IJCNN).
[29] Wyeth W. Wasserman,et al. Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters , 2015, RECOMB.
[30] Naomi Harte,et al. TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech , 2015, IEEE Transactions on Multimedia.
[31] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[32] Victor S. Lempitsky,et al. Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.
[33] Ben P. Milner,et al. Reconstructing intelligible audio speech from visual speech features , 2015, INTERSPEECH.
[34] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[35] Simon Osindero,et al. Conditional Generative Adversarial Nets , 2014, ArXiv.
[36] Douglas Burnham,et al. Hearing Eye II: The Psychology Of Speechreading And Auditory-Visual Speech , 2013.
[37] Jesper Jensen,et al. A short-time objective intelligibility measure for time-frequency weighted noisy speech , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[38] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008.
[39] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[40] Isabelle Guyon,et al. An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res.
[41] Andries P. Hekstra,et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[42] Tsuhan Chen,et al. Audiovisual speech processing , 2001, IEEE Signal Process. Mag.
[43] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.