Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Xinhao Deng | Chuanpu Fu | Peiyang Li | Tianyu Cui | Yanling Wang | Yong Xiao | Sijia Li | Yunpeng Liu | Qinglin Zhang | Ziyi Qiu | Zhixing Tan | Junwu Xiong | Xinyu Kong | Zujie Wen | Ke Xu | Qi Li
[1] Banghua Zhu,et al. Towards Optimal Statistical Watermarking , 2023, ArXiv.
[2] Yinpeng Dong,et al. Evil Geniuses: Delving into the Safety of LLM-based Agents , 2023, ArXiv.
[3] Zhangyin Feng,et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions , 2023, ACM Transactions on Information Systems.
[4] D. Duvenaud,et al. Towards Understanding Sycophancy in Language Models , 2023, ArXiv.
[5] Cheongwoong Kang,et al. Impact of Co-occurrence on Factual Knowledge of Large Language Models , 2023, EMNLP.
[6] Zhangyin Feng,et al. Retrieval-Generation Synergy Augmented Large Language Models , 2023, ArXiv.
[7] Shangwei Guo,et al. Warfare: Breaking the Watermark Protection of AI-Generated Content , 2023, ArXiv.
[8] Yufei Huang,et al. Large Language Model Alignment: A Survey , 2023, ArXiv.
[9] Trevor Darrell,et al. Aligning Large Multimodal Models with Factually Augmented RLHF , 2023, ArXiv.
[10] Xipeng Qiu,et al. The Rise and Potential of Large Language Model Based Agents: A Survey , 2023, ArXiv.
[11] James R. Glass,et al. DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models , 2023, ArXiv.
[12] Junyu Luo,et al. Zero-Resource Hallucination Prevention for Large Language Models , 2023, ArXiv.
[13] Deng Cai,et al. Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models , 2023, ArXiv.
[14] Wayne Xin Zhao,et al. A Survey on Large Language Model based Autonomous Agents , 2023, Frontiers Comput. Sci..
[15] Kai Sun,et al. Head-to-Tail: How Knowledgeable are Large Language Models (LLM)? A.K.A. Will LLMs Replace Knowledge Graphs? , 2023, ArXiv.
[16] H. Niewiadomski,et al. Graph of Thoughts: Solving Elaborate Problems with Large Language Models , 2023, AAAI.
[17] Jimeng Sun,et al. MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models , 2023, ArXiv.
[18] Zhaopeng Tu,et al. GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher , 2023, ICLR.
[19] Jean-Francois Ton,et al. Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment , 2023, ArXiv.
[20] Yun Shen,et al. You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content , 2023, ArXiv.
[21] Quoc V. Le,et al. Simple synthetic data reduces sycophancy in large language models , 2023, ArXiv.
[22] M. Backes,et al. "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models , 2023, ArXiv.
[23] M. Backes,et al. Mondrian: Prompt Abstraction Attack Against Large Language Models for Cheaper API Pricing , 2023, ArXiv.
[24] Rodrigo Pedro,et al. From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application? , 2023, ArXiv.
[25] Ankit Pal,et al. Med-HALT: Medical Domain Hallucination Test for Large Language Models , 2023, CONLL.
[26] J. Z. Kolter,et al. Universal and Transferable Adversarial Attacks on Aligned Language Models , 2023, ArXiv.
[27] Jianbing Ni,et al. Unveiling Security, Privacy, and Ethical Concerns of ChatGPT , 2023, Journal of Information and Intelligence.
[28] Jingren Zhou,et al. CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility , 2023, ArXiv.
[29] Eric Michael Smith,et al. Llama 2: Open Foundation and Fine-Tuned Chat Models , 2023, ArXiv.
[30] J. Steinhardt,et al. Overthinking the Truth: Understanding how Language Models Process False Demonstrations , 2023, ArXiv.
[31] Neeraj Varshney,et al. A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation , 2023, ArXiv.
[32] J. Steinhardt,et al. Jailbroken: How Does LLM Safety Training Fail? , 2023, NeurIPS.
[33] Seong Joon Oh,et al. ProPILE: Probing Privacy Leakage in Large Language Models , 2023, ArXiv.
[34] Maanak Gupta,et al. From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy , 2023, IEEE Access.
[35] Houfeng Wang,et al. Preference Ranking Optimization for Human Alignment , 2023, AAAI.
[36] D. Song,et al. DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models , 2023, ArXiv.
[37] Lichao Sun,et al. TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models , 2023, ArXiv.
[38] Dylan Hadfield-Menell,et al. Explore, Establish, Exploit: Red Teaming Language Models from Scratch , 2023, ArXiv.
[39] Linbo Qiao,et al. Protecting User Privacy in Remote Conversational Systems: A Privacy-Preserving framework based on text sanitization , 2023, ArXiv.
[40] Tianwei Zhang,et al. Prompt Injection attack against LLM-integrated Applications , 2023, ArXiv.
[41] N. Gong,et al. PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts , 2023, ArXiv.
[42] Maosong Sun,et al. Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations , 2023, NeurIPS.
[43] Louis-Philippe Morency,et al. Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions , 2023, ACL.
[44] Sameer Singh,et al. MISGENDERED: Limits of Large Language Models in Understanding Pronouns , 2023, ACL.
[45] Thomas Lukasiewicz,et al. An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models , 2023, ACL.
[46] M. Wattenberg,et al. Inference-Time Intervention: Eliciting Truthful Answers from a Language Model , 2023, NeurIPS.
[47] Lukas Pfahler,et al. Exposing Bias in Online Communities through Large-Scale Language Models , 2023, ArXiv.
[48] Yang Xu,et al. Knowledge of cultural moral norms in large language models , 2023, ACL.
[49] N. Imran,et al. Chat-GPT: Opportunities and Challenges in Child Mental Healthcare , 2023, Pakistan journal of medical sciences.
[50] Julien Launay,et al. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only , 2023, ArXiv.
[51] Lester W. Mackey,et al. Do Language Models Know When They’re Hallucinating References? , 2023, FINDINGS.
[52] Taylor Berg-Kirkpatrick,et al. Membership Inference Attacks against Language Models via Neighbourhood Comparison , 2023, ACL.
[53] Christopher D. Manning,et al. Direct Preference Optimization: Your Language Model is Secretly a Reward Model , 2023, NeurIPS.
[54] T. Luan,et al. A Survey on ChatGPT: AI–Generated Contents, Challenges, and Solutions , 2023, IEEE Open Journal of the Computer Society.
[55] M. Shanahan,et al. Role play with large language models , 2023, Nature.
[56] Minlie Huang,et al. Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy , 2023, EMNLP.
[57] Luke Zettlemoyer,et al. Trusting Your Evidence: Hallucinate Less with Context-aware Decoding , 2023, NAACL.
[58] P. Charan,et al. From Text to MITRE Techniques: Exploring the Malicious Use of Large Language Models for Generating Cyber Attack Payloads , 2023, ArXiv.
[59] Kelvin Guu,et al. PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions , 2023, ArXiv.
[60] Nicolas Papernot,et al. Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models , 2023, NeurIPS.
[61] Danqi Chen,et al. Enabling Large Language Models to Generate Text with Citations , 2023, EMNLP.
[62] Zhengzi Xu,et al. Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study , 2023, ArXiv.
[63] Pang Wei Koh,et al. FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation , 2023, EMNLP.
[64] Yilun Du,et al. Improving Factuality and Reasoning in Language Models through Multiagent Debate , 2023, ArXiv.
[65] Elena Sofia Ruzzetti,et al. A Trip Towards Fairness: Bias and De-Biasing in Large Language Models , 2023, ArXiv.
[66] Jonas Pfeiffer,et al. mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations , 2023, EMNLP.
[67] Ashish Sabharwal,et al. Improving Language Models via Plug-and-Play Retrieval Feedback , 2023, ArXiv.
[68] Shafiq R. Joty,et al. LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond , 2023, ArXiv.
[69] William Yang Wang,et al. Mitigating Language Model Hallucination with Interactive Question-Knowledge Alignment , 2023, ArXiv.
[70] Mohammad Javad Hosseini,et al. Sources of Hallucination by Large Language Models on Inference Tasks , 2023, EMNLP.
[71] K. Chang,et al. Quantifying Association Capabilities of Large Language Models and Its Implications on Privacy Leakage , 2023, ArXiv.
[72] A. Globerson,et al. LM vs LM: Detecting Factual Errors via Cross Examination , 2023, EMNLP.
[73] Noah A. Smith,et al. How Language Model Hallucinations Can Snowball , 2023, ArXiv.
[74] Dennis Aumiller,et al. Evaluating Factual Consistency of Texts with Semantic Role Labeling , 2023, STARSEM.
[75] Animesh Mukherjee,et al. Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection , 2023, ArXiv.
[76] Pinjia He,et al. BiasAsker: Measuring the Bias in Conversational AI System , 2023, ESEC/SIGSOFT FSE.
[77] Mustafa A. Mustafa,et al. A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation , 2023, Artif. Intell. Rev..
[78] Greg Durrett,et al. Complex Claim Verification with Evidence Retrieved in the Wild , 2023, ArXiv.
[79] Wayne Xin Zhao,et al. HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models , 2023, EMNLP.
[80] Weizhu Chen,et al. CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing , 2023, ICLR.
[81] Omer Levy,et al. LIMA: Less Is More for Alignment , 2023, NeurIPS.
[82] Andrew M. Dai,et al. PaLM 2 Technical Report , 2023, ArXiv.
[83] T. Griffiths,et al. Tree of Thoughts: Deliberate Problem Solving with Large Language Models , 2023, NeurIPS.
[84] Nghi D. Q. Bui,et al. CodeT5+: Open Code Large Language Models for Code Understanding and Generation , 2023, EMNLP.
[85] Hou Pong Chan,et al. Zero-shot Faithful Factual Error Correction , 2023, ACL.
[86] Xiangnan He,et al. Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation , 2023, RecSys.
[87] Huan Sun,et al. Automatic Evaluation of Attribution by Large Language Models , 2023, EMNLP.
[88] Zhixing Tan,et al. Privacy-Preserving Prompt Tuning for Large Language Model Services , 2023, ArXiv.
[89] Shafiq R. Joty,et al. Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework , 2023, ACL.
[90] J. Zhao,et al. Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models , 2023, EMNLP.
[91] Xueluan Gong,et al. D-DAE: Defense-Penetrating Model Extraction Attacks , 2023, 2023 IEEE Symposium on Security and Privacy (SP).
[92] Ravi Theja Gollapudi,et al. Control Flow and Pointer Integrity Enforcement in a Secure Tagged Architecture , 2023, 2023 IEEE Symposium on Security and Privacy (SP).
[93] Wajih Ul Hassan,et al. SoK: History is a Vast Early Warning System: Auditing the Provenance of System Intrusions , 2023, 2023 IEEE Symposium on Security and Privacy (SP).
[94] Chaowei Xiao,et al. ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger , 2023, ArXiv.
[95] Tom M. Mitchell,et al. The Internal State of an LLM Knows When its Lying , 2023, EMNLP.
[96] Haoming Jiang,et al. Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond , 2023, ACM Trans. Knowl. Discov. Data.
[97] Hao Sun,et al. Safety Assessment of Chinese Large Language Models , 2023, ArXiv.
[98] A. Shashua,et al. Fundamental Limitations of Alignment in Large Language Models , 2023, ICML.
[99] Markus Pauly,et al. The Self-Perception and Political Biases of ChatGPT , 2023, Human Behavior and Emerging Technologies.
[100] Sunder Ali Khowaja,et al. ChatGPT Needs SPADE (Sustainability, PrivAcy, Digital divide, and Ethics) Evaluation: A Review , 2023, Cognitive Computation.
[101] Vishvak S. Murahari,et al. Toxicity in ChatGPT: Analyzing Persona-assigned Language Models , 2023, EMNLP.
[102] Yangqiu Song,et al. Multi-step Jailbreaking Privacy Attacks on ChatGPT , 2023, EMNLP.
[103] Songfang Huang,et al. RRHF: Rank Responses to Align Language Models with Human Feedback without tears , 2023, NeurIPS.
[104] Yuqing Wang,et al. Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding , 2023, ArXiv.
[105] Emilio Ferrara. Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models , 2023, First Monday.
[106] Wayne Xin Zhao,et al. A Survey of Large Language Models , 2023, ArXiv.
[107] Daniel Hershcovich,et al. Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study , 2023, C3NLP.
[108] Michael J. Puett,et al. A Perspectival Mirror of the Elephant: Investigating Language Bias on Google, ChatGPT, Wikipedia, and YouTube , 2023, ArXiv.
[109] Wensheng Gan,et al. AI-Generated Content (AIGC): A Survey , 2023, ArXiv.
[110] Vinu Sankar Sadasivan,et al. Can AI-Generated Text be Reliably Detected? , 2023, ArXiv.
[111] Bibhu Dash,et al. Impact of Big Data Analytics and ChatGPT on Cybersecurity , 2023, 2023 4th International Conference on Computing and Communication Systems (I3CS).
[112] M. Gales,et al. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models , 2023, EMNLP.
[113] Ari S. Morcos,et al. SemDeDup: Data-efficient learning at web-scale through semantic deduplication , 2023, ArXiv.
[114] Henrique Pondé de Oliveira Pinto,et al. GPT-4 Technical Report , 2023, ArXiv.
[115] Stella Rose Biderman,et al. Eliciting Latent Predictions from Transformers with the Tuned Lens , 2023, ArXiv.
[116] David Ifeoluwa Adelani,et al. The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset , 2023, NeurIPS.
[117] Naman Goyal,et al. LLaMA: Open and Efficient Foundation Language Models , 2023, ArXiv.
[118] Tianwei Zhang,et al. Aegis: Mitigating Targeted Bit-flip Attacks against Deep Neural Networks , 2023, USENIX Security Symposium.
[119] Michel Galley,et al. Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback , 2023, ArXiv.
[120] Sahar Abdelnabi,et al. Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection , 2023, AISec@CCS.
[121] Jindong Wang,et al. On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective , 2023, IEEE Data Eng. Bull..
[122] Lichao Sun,et al. BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT , 2023, ArXiv.
[123] Haewoon Kwak,et al. Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech , 2023, WWW.
[124] Carlos Guestrin,et al. Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks , 2023, ArXiv.
[125] Dan Su,et al. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity , 2023, IJCNLP.
[126] P. Abbeel,et al. Chain of Hindsight Aligns Language Models with Feedback , 2023, ArXiv.
[127] Yu-Neng Chuang,et al. The Science of Detecting LLM-Generated Texts , 2023, ArXiv.
[128] Samy I McFarlane,et al. Artificial Hallucinations in ChatGPT: Implications in Scientific Writing , 2023, Cureus.
[129] Shruti Tople,et al. Analyzing Leakage of Personally Identifiable Information in Language Models , 2023, 2023 IEEE Symposium on Security and Privacy (SP).
[130] Y. Shoham,et al. In-Context Retrieval-Augmented Language Models , 2023, Transactions of the Association for Computational Linguistics.
[131] Ke Xu,et al. Detecting Unknown Encrypted Malicious Traffic in Real Time via Flow Interaction Graph Analysis , 2023, NDSS.
[132] Jonathan Katz,et al. A Watermark for Large Language Models , 2023, ICML.
[133] Jochen Hartmann,et al. The political ideology of conversational AI: Converging evidence on ChatGPT's pro-environmental, left-libertarian orientation , 2023, SSRN Electronic Journal.
[134] Soroush Vosoughi,et al. Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits , 2023, NeurIPS.
[135] R. Das,et al. When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories , 2022, ACL.
[136] Omar Shaikh,et al. On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning , 2022, ACL.
[137] Tom B. Brown,et al. Constitutional AI: Harmlessness from AI Feedback , 2022, ArXiv.
[138] F'abio Perez,et al. Ignore Previous Prompt: Attack Techniques For Language Models , 2022, ArXiv.
[139] Guillem Cucurull,et al. Galactica: A Large Language Model for Science , 2022, ArXiv.
[140] Colin Raffel,et al. Large Language Models Struggle to Learn Long-Tail Knowledge , 2022, ICML.
[141] Colin Raffel,et al. Evaluating the Factual Consistency of Large Language Models Through News Summarization , 2022, ACL.
[142] Jindong Wang,et al. GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective , 2022, ACL.
[143] M. Zaheer,et al. Large Language Models with Controllable Working Memory , 2022, ACL.
[144] Jiannong Cao,et al. StrongBox: A GPU TEE on Arm Endpoints , 2022, CCS.
[145] Xiang Lisa Li,et al. Contrastive Decoding: Open-ended Text Generation as Optimization , 2022, ACL.
[146] Arun Tejasvi Chaganty,et al. RARR: Researching and Revising What Language Models Say, Using Language Models , 2022, ACL.
[147] N. Japkowicz,et al. Machine-Generated Text: A Comprehensive Survey of Threat Models and Detection Methods , 2022, IEEE Access.
[148] Noah A. Smith,et al. Measuring and Narrowing the Compositionality Gap in Language Models , 2022, EMNLP.
[149] I. Shafran,et al. ReAct: Synergizing Reasoning and Acting in Language Models , 2022, ICLR.
[150] P. Zhang,et al. GLM-130B: An Open Bilingual Pre-trained Model , 2022, ICLR.
[151] Peter J. Liu,et al. Calibrating Sequence likelihood Improves Conditional Language Generation , 2022, ICLR.
[152] G. Karypis,et al. Differentially Private Bias-Term only Fine-tuning of Foundation Models , 2022, ArXiv.
[153] Lisa Anne Hendricks,et al. Improving alignment of dialogue agents via targeted human judgements , 2022, ArXiv.
[154] Tom B. Brown,et al. In-context Learning and Induction Heads , 2022, ArXiv.
[155] O. Hohlfeld,et al. IXP scrubber: learning from blackholing traffic for ML-driven DDoS detection at scale , 2022, SIGCOMM.
[156] Yoav Goldberg,et al. Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions , 2022, ArXiv.
[157] Mario Fritz,et al. RelaxLoss: Defending Membership Inference Attacks without Losing Utility , 2022, ICLR.
[158] Florian Tramèr,et al. Measuring Forgetting of Memorized Training Examples , 2022, ICLR.
[159] M. Shoeybi,et al. Factuality Enhanced Language Models for Open-Ended Text Generation , 2022, NeurIPS.
[160] R. Zemel,et al. Differentially Private Decoding in Large Language Models , 2022, ArXiv.
[161] David Evans,et al. Memorization in NLP Fine-tuning Methods , 2022, ArXiv.
[162] K. Chang,et al. Are Large Pre-Trained Language Models Leaking Your Personal Information? , 2022, EMNLP.
[163] Yau-Shian Wang,et al. Toxicity Detection with Generative Prompt-based Inference , 2022, ArXiv.
[164] Tom B. Brown,et al. Scaling Laws and Interpretability of Learning from Repeated Data , 2022, ArXiv.
[165] Eric Michael Smith,et al. “I’m sorry to hear that”: Finding New Biases in Language Models with a Holistic Descriptor Dataset , 2022, EMNLP.
[166] Issa M. Khalil,et al. SIRAJ: A Unified Framework for Aggregation of Malicious Entity Detectors , 2022, 2022 IEEE Symposium on Security and Privacy (SP).
[167] X. Koutsoukos,et al. Graphics Peeping Unit: Exploiting EM Side-Channel Information of GPUs to Eavesdrop on Your Neighbors , 2022, 2022 IEEE Symposium on Security and Privacy (SP).
[168] K. Shin,et al. SpecHammer: Combining Spectre and Rowhammer for New Speculative Attacks , 2022, 2022 IEEE Symposium on Security and Privacy (SP).
[169] R. Jia,et al. Just Fine-tune Twice: Selective Differential Privacy for Large Language Models , 2022, EMNLP.
[170] Stella Rose Biderman,et al. GPT-NeoX-20B: An Open-Source Autoregressive Language Model , 2022, BIGSCIENCE.
[171] Tom B. Brown,et al. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback , 2022, ArXiv.
[172] M. Nagappan,et al. Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code? , 2022, Empirical Software Engineering.
[173] Andrew M. Dai,et al. PaLM: Scaling Language Modeling with Pathways , 2022, J. Mach. Learn. Res..
[174] Chengjie Sun,et al. How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis , 2022, FINDINGS.
[175] S. Savarese,et al. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis , 2022, ICLR.
[176] D. Schuurmans,et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models , 2022, ICLR.
[177] Dipankar Ray,et al. ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection , 2022, ACL.
[178] Jooyoung Lee,et al. Do Language Models Plagiarize? , 2022, WWW.
[179] Ryan J. Lowe,et al. Training language models to follow instructions with human feedback , 2022, NeurIPS.
[180] Florian Tramèr,et al. Quantifying Memorization Across Neural Language Models , 2022, ICLR.
[181] Colin Raffel,et al. Deduplicating Training Data Mitigates Privacy Risks in Language Models , 2022, ICML.
[182] Florian Tramèr,et al. What Does it Mean for a Language Model to Preserve Privacy? , 2022, FAccT.
[183] Pascale Fung,et al. Survey of Hallucination in Natural Language Generation , 2022, ACM Comput. Surv..
[184] M. Shoeybi,et al. Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models , 2022, NeurIPS.
[185] Dale Schuurmans,et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models , 2022, NeurIPS.
[186] Lingming Zhang,et al. Free Lunch for Testing: Fuzzing Deep-Learning Libraries from Open Source , 2022, 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE).
[187] Fei Mi,et al. COLD: A Benchmark for Chinese Offensive Language Detection , 2022, EMNLP.
[188] Laurens van der Maaten,et al. Submix: Practical Private Prediction for Large-Scale Language Models , 2022, ArXiv.
[189] Jeff Wu,et al. WebGPT: Browser-assisted question-answering with human feedback , 2021, ArXiv.
[190] Alexander R. Fabbri,et al. QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization , 2021, NAACL.
[191] Yang Zhang,et al. Model Stealing Attacks Against Inductive Graph Neural Networks , 2021, 2022 IEEE Symposium on Security and Privacy (SP).
[192] Po-Sen Huang,et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher , 2021, ArXiv.
[193] Laxmi N. Bhuyan,et al. SmartWatch: accurate traffic analysis and flow-state tracking for intrusion prevention using SmartNICs , 2021, CoNEXT.
[194] Yufei Chen,et al. Property Inference Attacks Against GANs , 2021, NDSS.
[195] Anja Feldmann,et al. United We Stand: Collaborative Detection and Mitigation of Amplification DDoS Attacks at Scale , 2021, CCS.
[196] Zhe Gan,et al. Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models , 2021, NeurIPS Datasets and Benchmarks.
[197] Huseyin A. Inan,et al. Differentially Private Fine-tuning of Language Models , 2021, ICLR.
[198] Eitan Grinspun,et al. Can one hear the shape of a neural network?: Snooping the GPU via Magnetic Side Channel , 2021, USENIX Security Symposium.
[199] Po-Sen Huang,et al. Challenges in Detoxifying Language Models , 2021, EMNLP.
[200] Owain Evans,et al. TruthfulQA: Measuring How Models Mimic Human Falsehoods , 2021, ACL.
[201] R. Jia,et al. Selective Differential Privacy for Language Modeling , 2021, NAACL.
[202] Andreas Vlachos,et al. A Survey on Automated Fact-Checking , 2021, TACL.
[203] Nicholas Carlini,et al. Deduplicating Training Data Makes Language Models Better , 2021, ACL.
[204] Qi Li,et al. Realtime Robust Malicious Traffic Detection via Frequency Domain Analysis , 2021, CCS.
[205] Christy Dennison,et al. Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets , 2021, NeurIPS.
[206] Zhiyuan Liu,et al. Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger , 2021, ACL.
[207] Kai-Wei Chang,et al. Societal Biases in Language Generation: Progress and Challenges , 2021, ACL.
[208] David J. Wu,et al. CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU , 2021, 2021 IEEE Symposium on Security and Privacy (SP).
[209] Frank Hopfgartner,et al. A Comparative Study of Using Pre-trained Language Models for Toxic Comment Classification , 2021, WWW.
[210] W. Dolan,et al. A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation , 2021, ACL.
[211] Andrea Madotto,et al. Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding , 2021, EMNLP.
[212] Idan Szpektor,et al. Q²: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering , 2021, EMNLP.
[213] Jason Weston,et al. Retrieval Augmentation Reduces Hallucination in Conversation , 2021, EMNLP.
[214] Sylvain Lamprier,et al. QuestEval: Summarization Asks for Fact-based Evaluation , 2021, EMNLP.
[215] Jisun An,et al. A Survey on Predicting the Factuality and the Bias of News Media , 2021, ArXiv.
[216] Giovanni Da San Martino,et al. A Survey on Multimodal Disinformation Detection , 2021, COLING.
[217] Debdeep Mukhopadhyay,et al. A survey on adversarial attacks and defences , 2021, CAAI Trans. Intell. Technol..
[218] Ramesh Nallapati,et al. Entity-level Factual Consistency of Abstractive Text Summarization , 2021, EACL.
[219] Laria Reynolds,et al. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm , 2021, CHI Extended Abstracts.
[220] John Pavlopoulos,et al. Civil Rephrases Of Toxic Texts With Self-Supervised Transformers , 2021, EACL.
[221] Kai-Wei Chang,et al. BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation , 2021, FAccT.
[222] Charles Foster,et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling , 2020, ArXiv.
[223] Seid Muhie Yimam,et al. HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection , 2020, AAAI.
[224] Tom B. Brown,et al. Extracting Training Data from Large Language Models , 2020, USENIX Security Symposium.
[225] Mark O. Riedl,et al. Reducing Non-Normative Text Generation from Language Models , 2020, INLG.
[226] J. Weston,et al. Recipes for Safety in Open-domain Chatbots , 2020, ArXiv.
[227] Katja Filippova. Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data , 2020, FINDINGS.
[228] Yejin Choi,et al. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models , 2020, FINDINGS.
[229] Sahar Abdelnabi,et al. Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding , 2020, 2021 IEEE Symposium on Security and Privacy (SP).
[230] Nick Feamster,et al. New Directions in Automated Traffic Analysis , 2020, CCS.
[231] D. Song,et al. Aligning AI With Shared Human Values , 2020, ICLR.
[232] Vivek Srikumar,et al. OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings , 2020, EMNLP.
[233] Robert Mullins,et al. Sponge Examples: Energy-Latency Attacks on Neural Networks , 2020, 2021 IEEE European Symposium on Security and Privacy (EuroS&P).
[234] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[235] Mona T. Diab,et al. FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization , 2020, ACL.
[236] Ryan McDonald,et al. On Faithfulness and Factuality in Abstractive Summarization , 2020, ACL.
[237] Diyi Yang,et al. ToTTo: A Controlled Table-To-Text Generation Dataset , 2020, EMNLP.
[238] Siva Reddy,et al. StereoSet: Measuring stereotypical bias in pretrained language models , 2020, ACL.
[239] Alistair E. W. Johnson,et al. Deidentification of free-text medical records using pre-trained bidirectional transformers , 2020, CHIL.
[240] Fan Yao,et al. DeepHammer: Depleting the Intelligence of Deep Neural Networks through Targeted Chain of Bit Flips , 2020, USENIX Security Symposium.
[241] Nicolas Papernot,et al. Entangled Watermarks as a Defense against Model Extraction , 2020, USENIX Security Symposium.
[242] Sudipta Chattopadhyay,et al. Towards Backdoor Attacks and Defense in Robust Machine Learning Models , 2020, Comput. Secur..
[243] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[244] Jeremy Blackburn,et al. The Pushshift Reddit Dataset , 2020, ICWSM.
[245] Sooel Son,et al. Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer , 2020, USENIX Security Symposium.
[246] Margo Seltzer,et al. UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats , 2020, NDSS.
[247] Xiangyu Zhang,et al. ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation , 2019, CCS.
[248] J. Weston,et al. Adversarial NLI: A New Benchmark for Natural Language Understanding , 2019, ACL.
[249] Yibo Zhu,et al. A generic communication scheduler for distributed DNN training acceleration , 2019, SOSP.
[250] N. Gong,et al. MemGuard: Defending against Black-Box Membership Inference Attacks via Adversarial Examples , 2019, CCS.
[251] David Berthelot,et al. High-Fidelity Extraction of Neural Network Models , 2019, ArXiv.
[252] Ryan Cotterell,et al. It’s All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution , 2019, EMNLP.
[253] Alec Radford,et al. Release Strategies and the Social Impacts of Language Models , 2019, ArXiv.
[254] Josep Domingo-Ferrer,et al. Automatic Anonymization of Textual Documents: Detecting Sensitive Information via Word Embeddings , 2019, 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE).
[255] Chin-Yew Lin,et al. A Simple Recipe towards Reducing Hallucination in Neural Surface Realisation , 2019, ACL.
[256] Mario Fritz,et al. Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks , 2019, ICLR.
[257] Nelson F. Liu,et al. Barack’s Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling , 2019, ACL.
[258] Ankur Parikh,et al. Handling Divergent Reference Texts when Evaluating Table-to-Text Generation , 2019, ACL.
[259] Ben Goodrich,et al. Assessing The Factual Accuracy of Generated Text , 2019, KDD.
[260] Ali Farhadi,et al. Defending Against Neural Fake News , 2019, NeurIPS.
[261] Ido Dagan,et al. Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference , 2019, ACL.
[262] Ian Molloy,et al. Defending Against Neural Network Model Stealing Attacks Using Deceptive Perturbations , 2019, 2019 IEEE Security and Privacy Workshops (SPW).
[263] Yejin Choi,et al. The Curious Case of Neural Text Degeneration , 2019, ICLR.
[264] Shikha Bordia,et al. Identifying and Reducing Gender Bias in Word-Level Language Models , 2019, NAACL.
[265] Ben Y. Zhao,et al. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks , 2019, 2019 IEEE Symposium on Security and Privacy (SP).
[266] Ryan Cotterell,et al. Gender Bias in Contextualized Word Embeddings , 2019, NAACL.
[267] Min Suk Kang,et al. On the Feasibility of Rerouting-Based DDoS Defenses , 2019, 2019 IEEE Symposium on Security and Privacy (SP).
[268] Deliang Fan,et al. Bit-Flip Attack: Crushing Neural Network With Progressive Bit Search , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[269] Kamalika Chaudhuri,et al. Model Extraction and Active Learning , 2018, ArXiv.
[270] V. N. Venkatakrishnan,et al. HOLMES: Real-Time APT Detection through Correlation of Suspicious Information Flows , 2018, 2019 IEEE Symposium on Security and Privacy (SP).
[271] Zeyu Li,et al. Learning Gender-Neutral Word Embeddings , 2018, EMNLP.
[272] Quoc V. Le,et al. A Simple Method for Commonsense Reasoning , 2018, ArXiv.
[273] Jinyuan Jia,et al. AttriGuard: A Practical Defense Against Attribute Inference Attacks via Adversarial Machine Learning , 2018, USENIX Security Symposium.
[274] Samuel Marchal,et al. PRADA: Protecting Against DNN Model Stealing Attacks , 2018, 2019 IEEE European Symposium on Security and Privacy (EuroS&P).
[275] Yann Dauphin,et al. Hierarchical Neural Story Generation , 2018, ACL.
[276] Cícero Nogueira dos Santos,et al. Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer , 2018, ACL.
[277] Vassilis P. Plagianakos,et al. Convolutional Neural Networks for Toxic Comment Classification , 2018, SETN.
[278] Yuval Elovici,et al. Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection , 2018, NDSS.
[279] Reza Shokri,et al. Machine Learning with Membership Privacy using Adversarial Regularization , 2018, CCS.
[280] Ajmal Mian,et al. Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey , 2018, IEEE Access.
[281] Ashwin Machanavajjhala,et al. One-sided Differential Privacy , 2017, 2020 IEEE 36th International Conference on Data Engineering (ICDE).
[282] Corina S. Pasareanu,et al. DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks , 2017, ArXiv.
[283] John Pavlopoulos,et al. Deeper Attention to Abusive User Content Moderation , 2017, EMNLP.
[284] Darko Marinov,et al. Trade-offs in continuous integration: assurance, security, and flexibility , 2017, ESEC/SIGSOFT FSE.
[285] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[286] Kilian Q. Weinberger,et al. On Calibration of Modern Neural Networks , 2017, ICML.
[287] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[288] Shane Legg,et al. Deep Reinforcement Learning from Human Preferences , 2017, NIPS.
[289] Hao Chen,et al. MagNet: A Two-Pronged Defense against Adversarial Examples , 2017, CCS.
[290] David A. Forsyth,et al. SafetyNet: Detecting and Rejecting Adversarial Examples Robustly , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[291] J. H. Metzen,et al. On Detecting Adversarial Perturbations , 2017, ICLR.
[292] Mykel J. Kochenderfer,et al. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks , 2017, CAV.
[293] Geoffrey E. Hinton,et al. Regularizing Neural Networks by Penalizing Confident Output Distributions , 2017, ICLR.
[294] A. Juels,et al. Stealing Machine Learning Models via Prediction APIs , 2016, USENIX Security Symposium.
[295] Yoshua Bengio,et al. A Neural Knowledge Language Model , 2016, ArXiv.
[296] Adam Tauman Kalai,et al. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings , 2016, NIPS.
[297] Ian Goodfellow,et al. Deep Learning with Differential Privacy , 2016, CCS.
[298] Franck Dernoncourt,et al. De-identification of patient notes with recurrent neural networks , 2016, J. Am. Medical Informatics Assoc..
[299] Csaba Szepesvari,et al. Learning with a Strong Adversary , 2015, ArXiv.
[300] Somesh Jha,et al. Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures , 2015, CCS.
[301] David A. Wagner,et al. Control-Flow Bending: On the Effectiveness of Control-Flow Integrity , 2015, USENIX Security Symposium.
[302] Thomas Moyer,et al. Trustworthy Whole-System Provenance for the Linux Kernel , 2015, USENIX Security Symposium.
[303] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[304] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[305] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.
[306] Luca Rigazio,et al. Towards Deep Neural Network Architectures Robust to Adversarial Examples , 2014, ICLR.
[307] Xiangliang Zhang,et al. Adding Robustness to Support Vector Machines Against Adversarial Reverse Engineering , 2014, CIKM.
[308] Aaron Roth,et al. The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..
[309] Michael J. Freedman,et al. Automating Isolation and Least Privilege in Web Services , 2014, 2014 IEEE Symposium on Security and Privacy.
[310] Herbert Bos,et al. Out of Control: Overcoming Control-Flow Integrity , 2014, 2014 IEEE Symposium on Security and Privacy.
[311] David Sánchez,et al. Automatic General-Purpose Sanitization of Textual Documents , 2013, IEEE Transactions on Information Forensics and Security.
[312] Chao Zhang,et al. Practical Control Flow Integrity and Randomization for Binary Executables , 2013, 2013 IEEE Symposium on Security and Privacy.
[313] Keith Marsolo,et al. Large-scale evaluation of automated clinical note de-identification and its impact on information extraction , 2013, J. Am. Medical Informatics Assoc..
[314] William K. Robertson,et al. Preventing Input Validation Vulnerabilities in Web Applications through Automated Type Analysis , 2012, 2012 IEEE 36th Annual Computer Software and Applications Conference.
[315] C. Dwork. A firm foundation for private data analysis , 2011, Commun. ACM.
[316] Zunera Jalil,et al. A Review of Digital Watermarking Techniques for Text Documents , 2009, 2009 International Conference on Information and Multimedia Technology.
[317] J. Blom. A Dictionary of Hallucinations , 2009 .
[318] Cynthia Dwork,et al. Differential Privacy: A Survey of Results , 2008, TAMC.
[319] Mikhail J. Atallah,et al. The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions , 2006, MM&Sec '06.
[320] Cynthia Dwork,et al. Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.
[321] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[322] Mikhail J. Atallah,et al. Natural Language Watermarking: Design, Analysis, and a Proof-of-Concept Implementation , 2001, Information Hiding.
[323] Lawrence O'Gorman,et al. Electronic marking and identification techniques to discourage document copying , 1994, Proceedings of INFOCOM '94 Conference on Computer Communications.
[324] H. B. Ritea,et al. Speech Understanding Systems , 1976, Artif. Intell..
[325] Jing Chen,et al. Poisoning Attacks in Federated Learning: A Survey , 2023, IEEE Access.
[326] Andrew M. Dai,et al. Training Socially Aligned Language Models in Simulated Human Society , 2023, ArXiv.
[327] Xiaobing Feng,et al. Honeycomb: Secure and Efficient GPU Executions via Static Validation , 2023, OSDI.
[328] Wujie Wen,et al. NeuroPots: Realtime Proactive Defense against Bit-Flip Attacks in Neural Networks , 2023, USENIX Security Symposium.
[329] Prateek Mittal,et al. Differentially Private In-Context Learning , 2023, ArXiv.
[330] Chuan Chen,et al. Towards Reliable Utilization of AIGC: Blockchain-Empowered Ownership Verification Mechanism , 2023, IEEE Open Journal of the Computer Society.
[331] Xu Tan,et al. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face , 2023, NeurIPS.
[332] M. Haghani,et al. The Risks of Using ChatGPT to Obtain Common Safety-Related Information and Advice , 2023, SSRN Electronic Journal.
[333] Terry Yue Zhuo,et al. Exploring AI Ethics of ChatGPT: A Diagnostic Analysis , 2023, ArXiv.
[334] Jie Huang,et al. Why Does ChatGPT Fall Short in Answering Questions Faithfully? , 2023, ArXiv.
[335] Junjie Fang,et al. COSYWA: Enhancing Semantic Integrity in Watermarking Natural Language Generation , 2023, NLPCC.
[336] Haomiao Yang,et al. Using Highly Compressed Gradients in Federated Learning for Data Reconstruction Attacks , 2023, IEEE Transactions on Information Forensics and Security.
[337] Murat Kantarcioglu,et al. Evading Provenance-Based ML Detectors with Adversarial System Actions , 2023, USENIX Security Symposium.
[338] Shiqing Ma,et al. The Case for Learned Provenance Graph Storage Systems , 2023, USENIX Security Symposium.
[339] Jun Huang,et al. On the Trustworthiness Landscape of State-of-the-art Generative Models: A Comprehensive Survey , 2023, ArXiv.
[340] Robert W. McGee. Is Chat Gpt Biased Against Conservatives? An Empirical Study , 2023, SSRN Electronic Journal.
[341] Zhuotao Liu,et al. An Efficient Design of Intelligent Network Data Plane , 2023, USENIX Security Symposium.
[342] Kai Zhang,et al. Adaptive Chameleon or Stubborn Sloth: Unraveling the Behavior of Large Language Models in Knowledge Clashes , 2023, ArXiv.
[343] Veselin Stoyanov,et al. Methods for Measuring, Updating, and Visualizing Factual Beliefs in Language Models , 2023, EACL.
[344] Jiacen Xu,et al. PROGRAPHER: An Anomaly Detection System based on Provenance Graph Embedding , 2023, USENIX Security Symposium.
[345] Yosr Jarraya,et al. ProvTalk: Towards Interpretable Multi-level Provenance Analysis in Networking Functions Virtualization (NFV) , 2022, NDSS.
[346] Helen M. Meng,et al. Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks , 2022, ArXiv.
[347] Bradley Reaves,et al. Characterizing the Security of Github CI Workflows , 2022, USENIX Security Symposium.
[348] V. Logacheva,et al. ParaDetox: Detoxification with Parallel Data , 2022, ACL.
[349] Shafiq R. Joty,et al. Is GPT-3 a Psychopath? Evaluating Large Language Models from a Psychological Perspective , 2022, ArXiv.
[350] Taylor Berg-Kirkpatrick,et al. An Empirical Analysis of Memorization in Fine-tuned Autoregressive Language Models , 2022, EMNLP.
[351] Brayan Stiven Torrres Ovalle. GitHub Copilot , 2022, Encuentro Internacional de Educación en Ingeniería.
[352] Gábor Recski,et al. Offensive text detection across languages and datasets using rule-based and hybrid methods , 2022, CIKM Workshops.
[353] Fernando M. V. Ramos,et al. FlowLens: Enabling Efficient Flow Classification for ML-based Network Security Applications , 2021, NDSS.
[354] Jiayong Liu,et al. Exsense: Extract sensitive information from unstructured data , 2021, Comput. Secur..
[355] Abdulellah A. Alsaheel,et al. ATLAS: A Sequence-based Learning Approach for Attack Investigation , 2021, USENIX Security Symposium.
[356] Fengjun Li,et al. CONTRA: Defending Against Poisoning Attacks in Federated Learning , 2021, ESORICS.
[357] Vinod Yegneswaran,et al. ALchemist: Fusing Application and Audit Logs for Precise Attack Provenance without Instrumentation , 2021, NDSS.
[358] Gábor Recski,et al. TUW-Inf at GermEval2021: Rule-based and Hybrid Methods for Detecting Toxic, Engaging, and Fact-Claiming Comments , 2021, GERMEVAL.
[359] Isabelle Augenstein,et al. Detecting Abusive Language on Online Platforms: A Critical Analysis , 2021, ArXiv.
[360] Dit-Yan Yeung,et al. Probing Toxic Content in Large Pre-Trained Language Models , 2021, ACL.
[361] Yossi Matias,et al. Learning and Evaluating a Differentially Private Pre-trained Language Model , 2021, PRIVATENLP.
[362] Peng Li,et al. Rethinking Stealthiness of Backdoor Attack against NLP Models , 2021, ACL.
[363] W. j.,et al. Llama , 2021, Encyclopedic Dictionary of Archaeology.
[364] Michael M. Swift,et al. ATP: In-network Aggregation for Multi-tenant Learning , 2021, NSDI.
[365] Jinfeng Li,et al. TextShield: Robust Text Classification Based on Multimodal Embedding and Neural Machine Translation , 2020, USENIX Security Symposium.
[366] Xiao Yu,et al. You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis , 2020, NDSS.
[367] Yibo Zhu,et al. A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters , 2020, OSDI.
[368] Martin Johns,et al. Adversarial Preprocessing: Understanding and Preventing Image-Scaling Attacks in Machine Learning , 2020, USENIX Security Symposium.
[369] Yu Chen,et al. Seeing is Not Believing: Camouflage Attacks on Image Scaling Algorithms , 2019, USENIX Security Symposium.
[370] Thomas Moyer,et al. Towards Scalable Cluster Auditing through Grammatical Inference over Provenance Graphs , 2018, NDSS.
[371] Aidong Zhang,et al. A Survey on Context Learning , 2017, IEEE Transactions on Knowledge and Data Engineering.
[372] David Sands,et al. Personalised Differential Privacy: Summary of POPL '15 paper "Differential Privacy: Now It's Getting Personal" , 2015.
[373] Xiapu Luo,et al. On a New Class of Pulsing Denial-of-Service Attacks and the Defense , 2005, NDSS.
[374] Robert H. Baud,et al. Medical document anonymization with a semantic lexicon , 2000, AMIA.
[376] OWASP. OWASP Top 10 for LLM Applications , 2022.