暂无分享,去创建一个
[1] Christopher Krügel,et al. VENOMAVE: Clean-Label Poisoning Against Speech Recognition , 2020, ArXiv.
[2] Lior Wolf,et al. VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop , 2017, ICLR.
[3] Max Welling,et al. Improved Variational Inference with Inverse Autoregressive Flow , 2016, NIPS 2016.
[4] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[6] Francis M. Tyers,et al. Common Voice: A Massively-Multilingual Speech Corpus , 2020, LREC.
[7] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[8] Margret Keuper,et al. Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Siwei Lyu,et al. Exposing DeepFake Videos By Detecting Face Warping Artifacts , 2018, CVPR Workshops.
[10] Tomi Kinnunen,et al. ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech , 2021, IEEE Transactions on Biometrics, Behavior, and Identity Science.
[11] Hemlata Tak,et al. End-to-end anti-spoofing with RawNet2 , 2020 .
[12] Mukund Sundararajan,et al. Attribution in Scale and Space , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Kainan Peng,et al. WaveFlow: A Compact Flow-based Model for Raw Audio , 2020, ICML.
[14] Claude E. Shannon,et al. Communication theory of secrecy systems , 1949, Bell Syst. Tech. J..
[15] Bill McCarty. The Honeynet Arms Race , 2003, IEEE Secur. Priv..
[16] Tomoki Toda,et al. Espnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Xu Zhang,et al. Detecting and Simulating Artifacts in GAN Fake Images , 2019, 2019 IEEE International Workshop on Information Forensics and Security (WIFS).
[18] Mani B. Srivastava,et al. Did you hear that? Adversarial Examples Against Automatic Speech Recognition , 2018, ArXiv.
[19] Wei Chen,et al. Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech , 2020, ArXiv.
[20] Tao Qin,et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech , 2021, ICLR.
[21] Yoshua Bengio,et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model , 2016, ICLR.
[22] Thomas Quatieri,et al. Discrete-Time Speech Signal Processing: Principles and Practice , 2001 .
[23] Gunnar Rätsch,et al. Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs , 2017, ArXiv.
[24] Hayit Greenspan,et al. GAN-based Synthetic Medical Image Augmentation for increased CNN Performance in Liver Lesion Classification , 2018, Neurocomputing.
[25] Brendan J. Frey,et al. Generating and designing DNA with deep generative models , 2017, ArXiv.
[26] Cristian Canton Ferrer,et al. The DeepFake Detection Challenge (DFDC) Dataset. , 2020 .
[27] Luisa Verdoliva,et al. Do GANs Leave Artificial Fingerprints? , 2018, 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR).
[28] Asja Fischer,et al. Leveraging Frequency Analysis for Deep Fake Image Recognition , 2020, ICML.
[29] Lu Sheng,et al. Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues , 2020, ECCV.
[30] Jie Yang,et al. VoiceLive: A Phoneme Localization based Liveness Detection for Voice Authentication on Smartphones , 2016, CCS.
[31] Erich Elsen,et al. End-to-End Adversarial Text-to-Speech , 2020, ArXiv.
[32] Zhen-Hua Ling,et al. A Neural Vocoder With Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[33] Yoshua Bengio,et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis , 2019, NeurIPS.
[34] Brian A. Carter,et al. Advanced Encryption Standard , 2007 .
[35] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[36] D. W. Robinson,et al. Psychoacoustics—facts and models , 1991 .
[37] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[38] Adam Coates,et al. Deep Voice: Real-time Neural Text-to-Speech , 2017, ICML.
[39] Jun Ho Huh,et al. Void: A fast and light voice liveness detection system , 2020, USENIX Security Symposium.
[40] Madhu R. Kamble,et al. Effectiveness of Speech Demodulation-Based Features for Replay Detection , 2018, INTERSPEECH.
[41] Prasenjit Dey,et al. End-To-End Audio Replay Attack Detection Using Deep Convolutional Networks with Attention , 2018, INTERSPEECH.
[42] Hany Farid,et al. Evading Deepfake-Image Detectors with White- and Black-Box Attacks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[43] C. K. Yuen,et al. Theory and Application of Digital Signal Processing , 1978, IEEE Transactions on Systems, Man, and Cybernetics.
[44] Madhu R. Kamble,et al. Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection , 2017, INTERSPEECH.
[45] Frank K. Soong,et al. TTS synthesis with bidirectional LSTM based recurrent neural networks , 2014, INTERSPEECH.
[46] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[47] Tomi Kinnunen,et al. A comparison of features for synthetic speech detection , 2015, INTERSPEECH.
[48] Wei Ping,et al. DiffWave: A Versatile Diffusion Model for Audio Synthesis , 2020, ICLR.
[49] D. Scheuermann,et al. Usability of Biometrics in Relation to Electronic Signatures , 2000 .
[50] Shinnosuke Takamichi,et al. JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis , 2017, ArXiv.
[51] Keiichi Tokuda,et al. Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[52] Clifford Odets Papers. Guide to the , 2003 .
[53] B. S. Manjunath,et al. Detecting GAN generated Fake Images using Co-occurrence Matrices , 2019, Media Watermarking, Security, and Forensics.
[54] Wei Ping,et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech , 2018, ICLR.
[55] Sungwon Kim,et al. FloWaveNet : A Generative Flow for Raw Audio , 2018, ICML.
[56] Elaine B. Barker. Guideline for using cryptographic standards in the federal government: , 2016 .
[57] Tomi Kinnunen,et al. ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection , 2019, INTERSPEECH.
[58] Dorothea Kolossa,et al. Adversarial Attacks Against Automatic Speech Recognition Systems via Psychoacoustic Hiding , 2018, NDSS.
[59] Davide Cozzolino,et al. Detection of GAN-Generated Fake Images over Social Networks , 2018, 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR).
[60] K.M.M. Prabhu,et al. Window Functions and Their Applications in Signal Processing , 2013 .
[61] Dorothea Kolossa,et al. Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[62] Simon S. Woo,et al. GAN is a friend or foe?: a framework to detect various fake face images , 2019, SAC.
[63] Ryan Prenger,et al. Waveglow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[64] Simon King,et al. Attentive Filtering Networks for Audio Replay Attack Detection , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[65] Oriol Vinyals,et al. Neural Discrete Representation Learning , 2017, NIPS.
[66] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[67] Chris Donahue,et al. Adversarial Audio Synthesis , 2018, ICLR.
[68] Steffen Zeiler,et al. Imperio: Robust Over-the-Air Adversarial Examples for Automatic Speech Recognition Systems , 2019, ACSAC.
[69] Kevin Duh,et al. ESPnet-ST: All-in-One Speech Translation Toolkit , 2020, ACL.
[70] Mario Fritz,et al. Attributing Fake Images to GANs: Learning and Analyzing GAN Fingerprints , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[71] Michael Hamburg,et al. Meltdown: Reading Kernel Memory from User Space , 2018, USENIX Security Symposium.
[72] Kong-Aik Lee,et al. RedDots replayed: A new replay spoofing attack corpus for text-dependent speaker verification research , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[73] Aleksandr Sizov,et al. ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge , 2017, IEEE Journal of Selected Topics in Signal Processing.
[74] Yue Zhao,et al. CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition , 2018, USENIX Security Symposium.
[75] Jaehyeon Kim,et al. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis , 2020, NeurIPS.
[76] Sercan Ömer Arik,et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech , 2017, NIPS.
[77] Galina Lavrentyeva,et al. Audio Replay Attack Detection with Deep Learning Frameworks , 2017, INTERSPEECH.
[78] Vern Paxson,et al. Outside the Closed World: On Using Machine Learning for Network Intrusion Detection , 2010, 2010 IEEE Symposium on Security and Privacy.
[79] Xin Wang,et al. Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[80] Heiga Zen,et al. WaveGrad: Estimating Gradients for Waveform Generation , 2021, ICLR.
[81] Yoshua Bengio,et al. Char2Wav: End-to-End Speech Synthesis , 2017, ICLR.
[82] Thomas S. Huang,et al. A fast two-dimensional median filtering algorithm , 1979 .
[83] Andrew Owens,et al. CNN-Generated Images Are Surprisingly Easy to Spot… for Now , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[84] Keiichi Tokuda,et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis , 1999, EUROSPEECH.
[85] Zhuo Chen,et al. ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[86] Bolin Chen,et al. Fake Faces Identification via Convolutional Neural Network , 2018, IH&MMSec.
[87] Michael Hamburg,et al. Spectre Attacks: Exploiting Speculative Execution , 2018, 2019 IEEE Symposium on Security and Privacy (SP).
[88] Erich Elsen,et al. High Fidelity Speech Synthesis with Adversarial Networks , 2019, ICLR.
[89] Prafulla Dhariwal,et al. Glow: Generative Flow with Invertible 1x1 Convolutions , 2018, NeurIPS.
[90] Patrick Traynor,et al. SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems , 2020, 2021 IEEE Symposium on Security and Privacy (SP).
[91] Saniat Javid Sohrawardi,et al. Recurrent Convolutional Structures for Audio Spoof and Video Deepfake Detection , 2020, IEEE Journal of Selected Topics in Signal Processing.
[92] Aleksander Madry,et al. On Evaluating Adversarial Robustness , 2019, ArXiv.
[93] Hye-jin Shim,et al. Improved RawNet with Feature Map Scaling for Text-Independent Speaker Verification Using Raw Waveforms , 2020, INTERSPEECH.
[94] Wei Ping,et al. Non-Autoregressive Neural Text-to-Speech , 2020, ICML.
[95] Andreas Rössler,et al. FaceForensics++: Learning to Detect Manipulated Facial Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[96] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[97] Honggang Qi,et al. Celeb-DF: A New Dataset for DeepFake Forensics , 2019, ArXiv.
[98] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[99] Ryuichi Yamamoto,et al. Parallel Wavegan: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[100] Rafael Valle,et al. TequilaGAN: How to easily identify GAN samples , 2018, ArXiv.
[101] Scott McCloskey,et al. Detecting GAN-generated Imagery using Color Cues , 2018, ArXiv.
[102] David A. Wagner,et al. Audio Adversarial Examples: Targeted Attacks on Speech-to-Text , 2018, 2018 IEEE Security and Privacy Workshops (SPW).