Audio Deepfake Detection System with Neural Stitching for ADD 2022
Xiangang Li | Wei Zou | Tingwei Guo | Shuran Zhou | Rui Yan | Cheng Wen
[1] Haizhou Li, et al. ADD 2022: the first Audio Deep Synthesis Detection Challenge, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Vandana P. Janeja, et al. How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey, 2021, ArXiv.
[3] Zhen-Hua Ling, et al. Adversarial Voice Conversion Against Neural Spoofing Detectors, 2021, Interspeech.
[4] Bin Ma, et al. Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Tao Qin, et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2021, ICLR.
[6] Ganesh Sivaraman, et al. Generalization of Audio Deepfake Detection, 2020, Odyssey.
[7] Jaehyeon Kim, et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis, 2020, NeurIPS.
[8] Kaisheng Yao, et al. Multi-Resolution Multi-Head Attention in Deep Speaker Embedding, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Yi Yang, et al. Random Erasing Data Augmentation, 2017, AAAI.
[10] Ross B. Girshick, et al. Focal Loss for Dense Object Detection, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Diganta Misra, et al. Mish: A Self Regularized Non-Monotonic Neural Activation Function, 2019, ArXiv.
[12] Pooyan Safari, et al. Self Multi-Head Attention for Speaker Recognition, 2019, INTERSPEECH.
[13] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[14] Tomi Kinnunen, et al. ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection, 2019, INTERSPEECH.
[15] Sanjeev Khudanpur, et al. Spoken Language Recognition using X-vectors, 2018, Odyssey.
[16] Ming Li, et al. A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Koichi Shinoda, et al. Attentive Statistics Pooling for Deep Speaker Embedding, 2018, INTERSPEECH.
[18] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.
[19] Daniel Povey, et al. MUSAN: A Music, Speech, and Noise Corpus, 2015, ArXiv.
[20] Andrea Vedaldi, et al. Understanding Image Representations by Measuring Their Equivariance and Equivalence, 2014, International Journal of Computer Vision.
[21] Tomi Kinnunen, et al. A comparison of features for synthetic speech detection, 2015, INTERSPEECH.
[22] Rob Fergus, et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.
[23] Barbara G. Shinn-Cunningham, et al. Localizing nearby sound sources in a classroom: binaural room impulse responses, 2005, The Journal of the Acoustical Society of America.