Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

Large language models (LLMs) demonstrate strong capabilities across diverse natural language processing tasks. However, safety and security issues have become a major obstacle to the widespread application of LLM systems. Many studies have investigated the risks in LLM systems and developed corresponding mitigation strategies, and leading enterprises such as OpenAI, Google, Meta, and Anthropic have also invested considerable effort in building responsible LLMs. There is therefore a growing need to organize the existing studies and establish comprehensive taxonomies for the community. In this paper, we examine four essential modules of an LLM system: an input module for receiving prompts, a language model trained on extensive corpora, a toolchain module for development and deployment, and an output module for exporting LLM-generated content. Building on this decomposition, we propose a comprehensive taxonomy that systematically analyzes the potential risks associated with each module of an LLM system and discusses the corresponding mitigation strategies. Furthermore, we review prevalent benchmarks that facilitate the risk assessment of LLM systems. We hope this paper helps LLM practitioners adopt a systematic perspective to build responsible LLM systems.
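
To make the four-module view concrete, the following is a minimal, illustrative Python sketch (not taken from the paper) of how an input module, language model, toolchain module, and output module might be wired together, with hypothetical risk checkpoints at each stage. All class names, filter rules, and tool names are assumptions introduced for illustration, not the paper's own implementation.

```python
# Illustrative sketch of the four-module LLM-system decomposition described in
# the abstract. Every identifier here is a hypothetical placeholder; the checks
# attached to each module are examples of the kinds of mitigations such a
# module might host, not an authoritative or exhaustive list.

from dataclasses import dataclass, field


@dataclass
class InputModule:
    """Receives prompts; a natural place for injection/jailbreak screening."""
    blocked_markers: tuple = ("ignore previous instructions",)

    def sanitize(self, prompt: str) -> str:
        lowered = prompt.lower()
        if any(marker in lowered for marker in self.blocked_markers):
            raise ValueError("prompt rejected by input-side risk filter")
        return prompt.strip()


@dataclass
class LanguageModel:
    """Stands in for a model trained on extensive corpora."""
    name: str = "placeholder-llm"

    def generate(self, prompt: str) -> str:
        # A real system would call a trained model here.
        return f"[{self.name}] response to: {prompt}"


@dataclass
class ToolchainModule:
    """Covers development/deployment tooling (libraries, plugins, external tools)."""
    allowed_tools: set = field(default_factory=lambda: {"search", "calculator"})

    def invoke(self, tool: str, argument: str) -> str:
        if tool not in self.allowed_tools:
            raise PermissionError(f"tool '{tool}' is not on the allow-list")
        return f"{tool}({argument})"


@dataclass
class OutputModule:
    """Exports LLM-generated content; hosts output-side checks (e.g., toxicity filtering, watermarking)."""
    def export(self, text: str) -> str:
        # Output-side mitigation hook; here we only tag content provenance.
        return text + "  [ai-generated]"


def run_pipeline(prompt: str) -> str:
    """Wire the four modules together end to end."""
    inp, model, tools, out = InputModule(), LanguageModel(), ToolchainModule(), OutputModule()
    safe_prompt = inp.sanitize(prompt)
    draft = model.generate(safe_prompt)
    # e.g., consult an allow-listed tool before exporting the draft
    _ = tools.invoke("search", safe_prompt)
    return out.export(draft)


if __name__ == "__main__":
    print(run_pipeline("Summarize the main risks of LLM systems."))
```

In this framing, each module is both a functional stage and a risk surface: the input module screens prompts, the toolchain module constrains what external components the system may touch, and the output module applies content checks and provenance marking before anything leaves the system.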
