On the Trustworthiness Landscape of State-of-the-art Generative Models: A Comprehensive Survey
[1] Seong Joon Oh, et al. ProPILE: Probing Privacy Leakage in Large Language Models, 2023, ArXiv.
[2] S. Engelhardt, et al. Investigating Data Memorization in 3D Latent Diffusion Models for Medical Image Synthesis, 2023, DGM4MICCAI.
[3] Pang Wei Koh, et al. Are aligned neural networks adversarially aligned?, 2023, NeurIPS.
[4] P. Rokita, et al. Towards More Realistic Membership Inference Attacks on Large Diffusion Models, 2023, ArXiv.
[5] Changhua Meng, et al. On the Robustness of Latent Diffusion Models, 2023, ArXiv.
[6] Pin-Yu Chen, et al. VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models, 2023, NeurIPS.
[7] Taylor Berg-Kirkpatrick, et al. Membership Inference Attacks against Language Models via Neighbourhood Comparison, 2023, ACL.
[8] Kaidi Xu, et al. An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization, 2023, ArXiv.
[9] Shuming Shi, et al. Deepfake Text Detection in the Wild, 2023, ArXiv.
[10] Chaowei Xiao, et al. ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger, 2023, ArXiv.
[11] Vishvak S. Murahari, et al. Toxicity in ChatGPT: Analyzing Persona-assigned Language Models, 2023, EMNLP.
[12] Sijia Liu, et al. A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion, 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[13] Mohit Iyyer, et al. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense, 2023, ArXiv.
[14] Vinu Sankar Sadasivan, et al. Can AI-Generated Text be Reliably Detected?, 2023, ArXiv.
[15] Henrique Pondé de Oliveira Pinto, et al. GPT-4 Technical Report, 2023, arXiv:2303.08774.
[16] D. Song, et al. TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets, 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] A. Dragan, et al. Automatically Auditing Large Language Models via Discrete Optimization, 2023, ICML.
[18] Chen Chen, et al. A Pathway Towards Responsible AI Generated Content, 2023, IJCAI.
[19] Sahar Abdelnabi, et al. Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection, 2023, AISec@CCS.
[20] A. Madry, et al. Raising the Cost of Malicious AI-Powered Image Editing, 2023, ICML.
[21] Haibing Guan, et al. Adversarial Example Does Good: Preventing Painting Imitation from Diffusion Models via Adversarial Examples, 2023, ICML.
[22] Min Lin, et al. Bag of Tricks for Training Data Extraction from Language Models, 2023, ICML.
[23] Naoto Yanai, et al. Membership Inference Attacks against Diffusion Models, 2023, 2023 IEEE Security and Privacy Workshops (SPW).
[24] Shiqi Wang, et al. Are Diffusion Models Vulnerable to Membership Inference Attacks?, 2023, ICML.
[25] Shruti Tople, et al. Analyzing Leakage of Personally Identifiable Information in Language Models, 2023, 2023 IEEE Symposium on Security and Privacy (SP).
[26] Florian Tramèr, et al. Extracting Training Data from Diffusion Models, 2023, USENIX Security Symposium.
[27] Jun Pang, et al. Membership Inference of Diffusion Models, 2023, ArXiv.
[28] Jonathan Katz, et al. A Watermark for Large Language Models, 2023, ICML.
[29] Tsung-Yi Ho, et al. How to Backdoor Diffusion Models?, 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] T. Goldstein, et al. Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models, 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Fábio Perez, et al. Ignore Previous Prompt: Attack Techniques For Language Models, 2022, ArXiv.
[32] Lukas Struppek, et al. Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis, 2022, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Shi-You Xu. CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning, 2022, ArXiv.
[34] M. Backes, et al. Membership Inference Attacks Against Text-to-image Generation Models, 2022, ArXiv.
[35] Florian Tramèr, et al. Measuring Forgetting of Memorized Training Examples, 2022, ICLR.
[36] D. Baron, et al. Gradient Obfuscation Gives a False Sense of Security in Federated Learning, 2022, USENIX Security Symposium.
[37] K. Chang, et al. Are Large Pre-Trained Language Models Leaking Your Personal Information?, 2022, EMNLP.
[38] Frank Wood, et al. Flexible Diffusion Modeling of Long Videos, 2022, NeurIPS.
[39] David J. Fleet, et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, 2022, NeurIPS.
[40] C. Chakrabarti, et al. ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Xi Victoria Lin, et al. OPT: Open Pre-trained Transformer Language Models, 2022, ArXiv.
[42] Song Guo, et al. Protect Privacy from Gradient Leakage Attack in Federated Learning, 2022, IEEE INFOCOM 2022 - IEEE Conference on Computer Communications.
[43] Hao Li, et al. Adversarial Attack and Defense: A Survey, 2022, Electronics.
[44] Prafulla Dhariwal, et al. Hierarchical Text-Conditional Image Generation with CLIP Latents, 2022, ArXiv.
[45] Tom B. Brown, et al. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, 2022, ArXiv.
[46] M. Lewis, et al. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?, 2022, EMNLP.
[47] Florian Tramèr, et al. Quantifying Memorization Across Neural Language Models, 2022, ICLR.
[48] Colin Raffel, et al. Deduplicating Training Data Mitigates Privacy Risks in Language Models, 2022, ICML.
[49] Geoffrey Irving, et al. Red Teaming Language Models with Language Models, 2022, EMNLP.
[50] Noah A. Smith, et al. Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection, 2022, EMNLP.
[51] Renelito Delos Santos, et al. LaMDA: Language Models for Dialog Applications, 2022, ArXiv.
[52] B. Ommer, et al. High-Resolution Image Synthesis with Latent Diffusion Models, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[53] D. Lischinski, et al. Blended Diffusion for Text-driven Editing of Natural Images, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[54] David J. Fleet, et al. Palette: Image-to-Image Diffusion Models, 2021, SIGGRAPH.
[55] Martin T. Vechev, et al. Bayesian Framework for Gradient Leakage, 2021, ICLR.
[56] Jungseul Ok, et al. Gradient Inversion with Generative Image Prior, 2021, NeurIPS.
[57] Vinay Uday Prabhu, et al. Multimodal datasets: misogyny, pornography, and malignant stereotypes, 2021, ArXiv.
[58] Po-Sen Huang, et al. Challenges in Detoxifying Language Models, 2021, EMNLP.
[59] Owain Evans, et al. TruthfulQA: Measuring How Models Mimic Human Falsehoods, 2021, ACL.
[60] Quoc V. Le, et al. Finetuned Language Models Are Zero-Shot Learners, 2021, ICLR.
[61] Xipeng Qiu, et al. Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning, 2021, EMNLP.
[62] A. E. Cicek, et al. UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks against Split Learning, 2021, IACR Cryptol. ePrint Arch.
[63] Hiroaki Hayashi, et al. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing, 2021, ACM Comput. Surv.
[64] Zhiyuan Liu, et al. Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution, 2021, ACL.
[65] Zhiyuan Liu, et al. Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger, 2021, ACL.
[66] David J. Fleet, et al. Image Super-Resolution via Iterative Refinement, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[67] Byron C. Wallace, et al. Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?, 2021, NAACL.
[68] Pavlo Molchanov, et al. See through Gradients: Image Batch Recovery via GradInversion, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[69] Douwe Kiela, et al. Gradient-based Adversarial Attacks against Text Transformers, 2021, EMNLP.
[70] Hongsheng Hu, et al. Membership Inference Attacks on Machine Learning: A Survey, 2021, ACM Comput. Surv.
[71] Hang Liu, et al. TAG: Gradient Attack on Transformer-based Language Models, 2021, EMNLP.
[72] Timo Schick, et al. Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP, 2021, Transactions of the Association for Computational Linguistics.
[73] Alec Radford, et al. Zero-Shot Text-to-Image Generation, 2021, ICML.
[74] D. Klein, et al. Calibrate Before Use: Improving Few-Shot Performance of Language Models, 2021, ICML.
[75] Miles Brundage, et al. Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models, 2021, ArXiv.
[76] John Pavlopoulos, et al. Civil Rephrases Of Toxic Texts With Self-Supervised Transformers, 2021, EACL.
[77] Yejin Choi, et al. Challenges in Automated Debiasing for Toxic Language Detection, 2021, EACL.
[78] Danqi Chen, et al. Making Pre-trained Language Models Better Few-shot Learners, 2021, ACL.
[79] Colin Raffel, et al. Extracting Training Data from Large Language Models, 2020, USENIX Security Symposium.
[80] Jingwei Sun, et al. Soteria: Provable Defense against Privacy Leakage in Federated Learning from Representation Perspective, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[81] G. Ateniese, et al. Unleashing the Tiger: Inference Attacks on Split Learning, 2020, CCS.
[82] Shangwei Guo, et al. Privacy-preserving Collaborative Learning with Automatic Transformation Search, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[83] Li Li, et al. A review of applications in federated learning, 2020, Comput. Ind. Eng.
[84] Samuel R. Bowman, et al. CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models, 2020, EMNLP.
[85] Yejin Choi, et al. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models, 2020, Findings of EMNLP.
[86] Kris McGuffie, et al. The Radicalization Risks of GPT-3 and Advanced Neural Language Models, 2020, ArXiv.
[87] Shafiq R. Joty, et al. GeDi: Generative Discriminator Guided Sequence Generation, 2020, EMNLP.
[88] Wenqi Wei, et al. A Framework for Evaluating Client Privacy Leakages in Federated Learning, 2020, ESORICS.
[89] D. Song, et al. Aligning AI With Shared Human Values, 2020, ICLR.
[90] Yong Jiang, et al. Backdoor Learning: A Survey, 2020, IEEE Transactions on Neural Networks and Learning Systems.
[91] Vinay Uday Prabhu, et al. Large image datasets: A pyrrhic win for computer vision?, 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[92] Pieter Abbeel, et al. Denoising Diffusion Probabilistic Models, 2020, NeurIPS.
[93] Michael Backes, et al. BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements, 2020, ACSAC.
[94] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[95] Doug Downey, et al. Don't Stop Pretraining: Adapt Language Models to Domains and Tasks, 2020, ACL.
[96] Siva Reddy, et al. StereoSet: Measuring stereotypical bias in pretrained language models, 2020, ACL.
[97] Graham Neubig, et al. Weight Poisoning Attacks on Pretrained Models, 2020, ACL.
[98] T. Rabin, et al. Falcon: Honest-Majority Maliciously Secure Framework for Private Deep Learning, 2020, Proc. Priv. Enhancing Technol.
[99] Siddhant Garg, et al. BAE: BERT-based Adversarial Examples for Text Classification, 2020, EMNLP.
[100] Michael Moeller, et al. Inverting Gradients - How easy is it to break privacy in federated learning?, 2020, NeurIPS.
[101] Timo Schick, et al. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference, 2020, EACL.
[102] Bo Zhao, et al. iDLG: Improved Deep Leakage from Gradients, 2020, ArXiv.
[103] Chris Callison-Burch, et al. Human and Automatic Detection of Generated Text, 2019, ArXiv.
[104] S. Ermon, et al. Fair Generative Modeling via Weak Supervision, 2019, ICML.
[105] J. Yosinski, et al. Plug and Play Language Models: A Simple Approach to Controlled Text Generation, 2019, ICLR.
[106] Alec Radford, et al. Release Strategies and the Social Impacts of Language Models, 2019, ArXiv.
[107] Sameer Singh, et al. Universal Adversarial Triggers for Attacking and Analyzing NLP, 2019, EMNLP.
[108] Joey Tianyi Zhou, et al. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment, 2019, AAAI.
[109] Wanxiang Che, et al. Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency, 2019, ACL.
[110] Song Han, et al. Deep Leakage from Gradients, 2019, NeurIPS.
[111] Alexander M. Rush, et al. GLTR: Statistical Detection and Visualization of Generated Text, 2019, ACL.
[112] Ingmar Weber, et al. Racial Bias in Hate Speech and Abusive Language Detection Datasets, 2019, Proceedings of the Third Workshop on Abusive Language Online.
[113] Kevin Duh, et al. Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?, 2019, TACL.
[114] Eric Horvitz, et al. Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting, 2019, DGS@ICLR.
[115] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[116] Ramesh Raskar, et al. Split learning for health: Distributed deep learning without sharing raw patient data, 2018, ArXiv.
[117] Teresa K. O'Leary, et al. Patient and Consumer Safety Risks When Using Conversational Assistants for Medical Information: An Observational Study of Siri, Alexa, and Google Assistant, 2018, Journal of Medical Internet Research.
[118] Cícero Nogueira dos Santos, et al. Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer, 2018, ACL.
[119] Mani B. Srivastava, et al. Generating Natural Language Adversarial Examples, 2018, EMNLP.
[120] Brendan Dolan-Gavitt, et al. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain, 2017, ArXiv.
[121] Sameep Mehta, et al. Towards Crafting Text Adversarial Samples, 2017, ArXiv.
[122] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[123] Ian Goodfellow, et al. Deep Learning with Differential Privacy, 2016, CCS.
[124] Adam S. Miner, et al. Smartphone-Based Conversational Agents and Responses to Questions About Mental Health, Interpersonal Violence, and Physical Health, 2016, JAMA Internal Medicine.
[125] Surya Ganguli, et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics, 2015, ICML.
[126] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.
[127] Donald L. Fisher, et al. Tuning collision warning algorithms to individual drivers for design of active safety systems, 2013, 2013 IEEE Globecom Workshops (GC Wkshps).
[128] T. Scheeren. Plug and Play, 2008.
[129] Lukas Struppek, et al. Rickrolling the Artist: Injecting Invisible Backdoors into Text-Guided Image Generation Models, 2022, ArXiv.
[130] Weigang Wu, et al. Mixing Activations and Labels in Distributed Training for Split Learning, 2021, IEEE Transactions on Parallel and Distributed Systems.
[131] Moinuddin K. Qureshi, et al. Gradient Inversion Attack: Leaking Private Labels in Two-Party Split Learning, 2021, ArXiv.
[132] Roma Patel, et al. "Was it "stated" or was it "claimed"?": How linguistic bias affects generative language models, 2021, EMNLP.
[133] Peng Li, et al. Rethinking Stealthiness of Backdoor Attack against NLP Models, 2021, ACL.
[134] Yang Liu, et al. BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning, 2020, USENIX ATC.
[135] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[136] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[137] Saied Alshahrani. Word-level Textual Adversarial Attacking as Combinatorial Optimization, 2022.