TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models
[1] Lu Liu, et al. Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors, 2023, arXiv.
[2] Pinjia He, et al. BiasAsker: Measuring the Bias in Conversational AI System, 2023, ESEC/FSE.
[3] Peter J. Liu, et al. SLiC-HF: Sequence Likelihood Calibration with Human Feedback, 2023, arXiv.
[4] T. Griffiths, et al. Tree of Thoughts: Deliberate Problem Solving with Large Language Models, 2023, NeurIPS.
[5] Yiming Yang, et al. Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision, 2023, NeurIPS.
[6] Douglas C. Schmidt, et al. Semantic Compression with Large Language Models, 2023, Tenth International Conference on Social Networks Analysis, Management and Security (SNAMS).
[7] Vishvak S. Murahari, et al. Toxicity in ChatGPT: Analyzing Persona-assigned Language Models, 2023, EMNLP.
[8] Yangqiu Song, et al. Multi-step Jailbreaking Privacy Attacks on ChatGPT, 2023, arXiv.
[9] Zhaopeng Tu, et al. Document-Level Machine Translation with Large Language Models, 2023, arXiv.
[10] Naman Goyal, et al. LLaMA: Open and Efficient Foundation Language Models, 2023, arXiv.
[11] Carlos Guestrin, et al. Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks, 2023, arXiv.
[12] Omar Shaikh, et al. On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning, 2022, ACL.
[13] Tom B. Brown, et al. Constitutional AI: Harmlessness from AI Feedback, 2022, arXiv.
[14] Christopher D. Manning, et al. Holistic Evaluation of Language Models, 2023, Annals of the New York Academy of Sciences.
[15] Zhaojiang Lin, et al. Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values, 2022, TrustNLP.
[16] Xiaoyuan Yi, et al. Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization, 2022, ICLR.
[17] P. Zhang, et al. GLM-130B: An Open Bilingual Pre-trained Model, 2022, ICLR.
[18] Gerard de Melo, et al. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, 2022, arXiv.
[19] Yau-Shian Wang, et al. Toxicity Detection with Generative Prompt-based Inference, 2022, arXiv.
[20] Eric Michael Smith, et al. “I’m sorry to hear that”: Finding New Biases in Language Models with a Holistic Descriptor Dataset, 2022, EMNLP.
[21] Kai-Wei Chang, et al. Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal, 2022, Findings of ACL.
[22] Dipankar Ray, et al. ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection, 2022, ACL.
[23] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[24] Yitong Li, et al. Towards Identifying Social Bias in Dialog Systems: Framework, Dataset, and Benchmark, 2022, EMNLP.
[25] Dale Schuurmans, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, 2022, NeurIPS.
[26] Toon Calders, et al. Measuring Fairness with Biased Rulers: A Survey on Quantifying Biases in Pretrained Language Models, 2021, arXiv.
[27] Dario Amodei, et al. A General Language Assistant as a Laboratory for Alignment, 2021, arXiv.
[28] Phu Mon Htut, et al. BBQ: A hand-built bias benchmark for question answering, 2021, Findings of ACL.
[29] Ronan Le Bras, et al. Can Machines Learn Morality? The Delphi Experiment, 2021, arXiv preprint arXiv:2110.07574.
[30] Po-Sen Huang, et al. Challenges in Detoxifying Language Models, 2021, EMNLP.
[31] Soroush Vosoughi, et al. Mitigating Political Bias in Language Models Through Reinforced Calibration, 2021, AAAI.
[32] Kai-Wei Chang, et al. BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation, 2021, FAccT.
[33] Yejin Choi, et al. Social Chemistry 101: Learning to Reason about Social and Moral Norms, 2020, EMNLP.
[34] Slav Petrov, et al. Measuring and Reducing Gendered Correlations in Pre-trained Models, 2020, arXiv.
[35] Daniel Khashabi, et al. UNQOVERing Stereotypical Biases via Underspecified Questions, 2020, Findings of EMNLP.
[36] Samuel R. Bowman, et al. CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models, 2020, EMNLP.
[37] Yejin Choi, et al. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models, 2020, Findings of EMNLP.
[38] Siva Reddy, et al. StereoSet: Measuring stereotypical bias in pretrained language models, 2020, ACL.
[39] Tomohide Shibata. Understand It in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.
[40] Noah A. Smith, et al. Social Bias Frames: Reasoning about Social and Power Implications of Language, 2019, ACL.
[41] Alan W Black, et al. Measuring Bias in Contextualized Word Representations, 2019, Proceedings of the First Workshop on Gender Bias in Natural Language Processing.
[42] Shikha Bordia, et al. Identifying and Reducing Gender Bias in Word-Level Language Models, 2019, NAACL.
[43] Chandler May, et al. On Measuring Social Biases in Sentence Encoders, 2019, NAACL.
[44] Anupam Datta, et al. Gender Bias in Neural Natural Language Processing, 2018, Logic, Language, and Security.
[45] E. Parzen. On Estimation of a Probability Density Function and Mode, 1962.
[46] H. B. Mann, et al. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other, 1947.
[47] Terry Yue Zhuo, et al. Exploring AI Ethics of ChatGPT: A Diagnostic Analysis, 2023, arXiv.
[48] Sahar Abdelnabi, et al. More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models, 2023, arXiv.
[49] Yi Yang, et al. Auto-Debias: Debiasing Masked Language Models with Automated Biased Prompts, 2022, ACL.
[50] Dit-Yan Yeung, et al. Probing Toxic Content in Large Pre-Trained Language Models, 2021, ACL.
[51] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[52] Siméon-Denis Poisson. Recherches sur la probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités, 1837.