Improving alignment of dialogue agents via targeted human judgements
Lisa Anne Hendricks | William S. Isaac | John F. J. Mellor | A. See | Geoffrey Irving | Timo Ewalds | K. Kavukcuoglu | D. Hassabis | Vlad Firoiu | Po-Sen Huang | J. Aslanides | Fan Yang | Sumanth Dathathri | Boxi Wu | Doug Fritz | Susannah Young | Laura Weidinger | Iason Gabriel | J. Uesato | M. Rauh | A. Glaese | Charlie Chen | Lucy Campbell-Gillingham | Nathan McAleese | Maja Trębacz | Martin Chadwick | Phoebe Thacker | R. Comanescu | Rory Greig | Jaume Sanchez Elias | Richard Green | Soňa Mokrá | Nicholas Fernando | Rachel Foley
[1] William S. Isaac, et al. Power to the People? Opportunities and Challenges for Participatory AI, 2022, EAAMO.
[2] Eric Michael Smith, et al. BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage, 2022, ArXiv.
[3] Raphael Gontijo Lopes, et al. Language Model Cascades, 2022, ArXiv.
[4] Yuhuai Wu, et al. Solving Quantitative Reasoning Problems with Language Models, 2022, NeurIPS.
[5] Lisa Anne Hendricks, et al. Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models, 2022, ArXiv.
[6] Jeff Wu, et al. Self-critiquing models for assisting human evaluators, 2022, ArXiv.
[7] Majid Yazdani, et al. Policy Compliance Detection via Expression Tree Inference, 2022, ArXiv.
[8] Cyprien de Masson d'Autume, et al. StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models, 2022, ICML.
[9] I. Higgins, et al. Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning, 2022, ICLR.
[10] Mo Yu, et al. On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?, 2022, NAACL.
[11] Tom B. Brown, et al. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, 2022, ArXiv.
[12] Nikita Nangia, et al. Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions, 2022, LNLS.
[13] Lisa Anne Hendricks, et al. Training Compute-Optimal Large Language Models, 2022, ArXiv.
[14] J. Weston, et al. Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion, 2022, EMNLP.
[15] Jacob Menick, et al. Teaching language models to support answers with verified quotes, 2022, ArXiv.
[16] Angeliki Lazaridou, et al. Internet-augmented language models through few-shot prompting for open-domain question answering, 2022, ArXiv.
[17] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[18] Geoffrey Irving, et al. Red Teaming Language Models with Language Models, 2022, EMNLP.
[19] Renelito Delos Santos, et al. LaMDA: Language Models for Dialog Applications, 2022, ArXiv.
[20] Diego de Las Casas, et al. Improving language models by retrieving from trillions of tokens, 2021, ICML.
[21] Phu Mon Htut, et al. BBQ: A hand-built bias benchmark for question answering, 2021, Findings of ACL.
[22] Owain Evans, et al. TruthfulQA: Measuring How Models Mimic Human Falsehoods, 2021, ACL.
[23] Quoc V. Le, et al. Finetuned Language Models Are Zero-Shot Learners, 2021, ICLR.
[24] Laura Forlano, et al. Participation Is not a Design Fix for Machine Learning, 2020, EAAMO.
[25] Marcel van Gerven, et al. Explainable Deep Learning: A Field Guide for the Uninitiated, 2020, J. Artif. Intell. Res.
[26] Jeff Wu, et al. WebGPT: Browser-assisted question-answering with human feedback, 2021, ArXiv.
[27] Po-Sen Huang, et al. Ethical and social risks of harm from Language Models, 2021, ArXiv.
[28] Dario Amodei, et al. A General Language Assistant as a Laboratory for Alignment, 2021, ArXiv.
[29] Jason Weston, et al. Reason first, then respond: Modular Generation for Knowledge-infused Dialogue, 2021, EMNLP.
[30] Owain Evans, et al. Truthful AI: Developing and governing AI that does not lie, 2021, ArXiv.
[31] Jan Leike, et al. Recursively Summarizing Books with Human Feedback, 2021, ArXiv.
[32] Po-Sen Huang, et al. Challenges in Detoxifying Language Models, 2021, EMNLP.
[33] Majid Yazdani, et al. Cross-Policy Compliance Detection via Question Answering, 2021, EMNLP.
[34] Matthew Lease, et al. The Psychological Well-Being of Content Moderators: The Emotional Labor of Commercial Moderation and Avenues for Improving Support, 2021, CHI.
[35] Dan Klein, et al. Detoxifying Language Models Risks Marginalizing Minority Voices, 2021, NAACL.
[36] Emily M. Bender, et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, 2021, FAccT.
[37] Jackie Kay, et al. Fairness for Unobserved Characteristics: Insights from Technological Impacts on Queer Communities, 2021, AIES.
[38] Danqi Chen, et al. Making Pre-trained Language Models Better Few-shot Learners, 2021, ACL.
[39] Dawn Song, et al. Measuring Massive Multitask Language Understanding, 2020, ICLR.
[40] Dawn Song, et al. Aligning AI With Shared Human Values, 2020, ICLR.
[41] Jordan L. Boyd-Graber, et al. Toward Deconfounding the Effect of Entity Demographics for Question Answering Accuracy, 2021, EMNLP.
[42] Max Jaderberg, et al. Open-Ended Learning Leads to Generally Capable Agents, 2021, ArXiv.
[43] Michele Banko, et al. A Unified Taxonomy of Harmful Content, 2020, ALW.
[44] Kris McGuffie, et al. The Radicalization Risks of GPT-3 and Advanced Neural Language Models, 2020, ArXiv.
[45] Ryan J. Lowe, et al. Learning to summarize from human feedback, 2020, NeurIPS.
[46] Emily Denton, et al. Bringing the People Back In: Contesting Benchmark Machine Learning Datasets, 2020, ArXiv.
[47] Percy Liang, et al. Selective Question Answering under Domain Shift, 2020, ACL.
[48] Solon Barocas, et al. Language (Technology) is Power: A Critical Survey of “Bias” in NLP, 2020, ACL.
[49] Fabio Petroni, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, 2020, NeurIPS.
[50] H. Francis Song, et al. A Distributional View on Multi-Objective Policy Optimization, 2020, ICML.
[51] H. Francis Song, et al. V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control, 2019, ICLR.
[52] R. Geiger, et al. ORES, 2019, Proc. ACM Hum. Comput. Interact.
[53] Yejin Choi, et al. The Curious Case of Neural Text Degeneration, 2019, ICLR.
[54] Ariel D. Procaccia, et al. WeBuildAI, 2019, Proc. ACM Hum. Comput. Interact.
[55] Emily Ahn, et al. Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts, 2019, EMNLP.
[56] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[57] M. Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[58] Jason Weston, et al. Finding Generalizable Evidence by Learning to Convince Q&A Models, 2019, EMNLP.
[59] Jason Weston, et al. Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack, 2019, EMNLP.
[60] Ming-Wei Chang, et al. Natural Questions: A Benchmark for Question Answering Research, 2019, TACL.
[61] Scott A. Hale, et al. Challenges and frontiers in abusive content detection, 2019, ALW.
[62] Jason Weston, et al. ELI5: Long Form Question Answering, 2019, ACL.
[63] S. Gershman. How to never be wrong, 2018, Psychonomic Bulletin & Review.
[64] Ran El-Yaniv, et al. SelectiveNet: A Deep Neural Network with an Integrated Reject Option, 2019, ICML.
[65] Ziqi Zhang, et al. Hate Speech Detection: A Solved Problem? The Challenging Case of Long Tail on Twitter, 2018, Semantic Web.
[66] Shane Legg, et al. Scalable agent alignment via reward modeling: a research direction, 2018, ArXiv.
[67] Dario Amodei, et al. Supervising strong learners by amplifying weak experts, 2018, ArXiv.
[68] Dario Amodei, et al. AI safety via debate, 2018, ArXiv.
[69] Matthew Lease, et al. But Who Protects the Moderators? The Case of Crowdsourced Image Moderation, 2018, ArXiv.
[70] Rachel Rudinger, et al. Gender Bias in Coreference Resolution, 2018, NAACL.
[71] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.
[72] Eunsol Choi, et al. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension, 2017, ACL.
[73] Ran El-Yaniv, et al. Selective Classification for Deep Neural Networks, 2017, NIPS.
[74] Ian Goodfellow, et al. Deep Learning with Differential Privacy, 2016, CCS.
[75] Anca D. Dragan, et al. Cooperative Inverse Reinforcement Learning, 2016, NIPS.
[76] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[77] Pritish Narayanan, et al. Deep Learning with Limited Numerical Precision, 2015, ICML.
[78] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[79] Jordan L. Boyd-Graber, et al. Besting the Quiz Master: Crowdsourcing Incremental Classification Games, 2012, EMNLP.
[80] Mary Ann Mason, et al. Keeping Women in the Science Pipeline, 2011.
[81] Klaus Krippendorff, et al. Computing Krippendorff's Alpha-Reliability, 2011.
[82] Ran El-Yaniv, et al. On the Foundations of Noise-free Selective Classification, 2010, J. Mach. Learn. Res.
[83] Siobhan Chapman. Logic and Conversation, 2005.
[84] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[85] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, 2002, Neural Computation.
[86] Danqi Chen, et al. Proceedings of the Association for Computational Linguistics, 2001.
[87] Z. Kunda, et al. The case for motivated reasoning, 1990, Psychological Bulletin.
[88] A. Elo. The rating of chessplayers, past and present, 1978.
[89] R. A. Bradley, et al. Rank Analysis of Incomplete Block Designs: The Method of Paired Comparisons, 1952.