OpenAssistant Conversations - Democratizing Large Language Model Alignment

Aligning large language models (LLMs) with human preferences has proven to drastically improve usability and has driven rapid adoption, as demonstrated by ChatGPT. Alignment techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) greatly reduce the skill and domain knowledge required to effectively harness the capabilities of LLMs, increasing their accessibility and utility across various domains. However, state-of-the-art alignment techniques like RLHF rely on high-quality human feedback data, which is expensive to create and often remains proprietary. In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations, a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages distributed across 66,497 conversation trees in 35 different languages, annotated with 461,292 quality ratings. The corpus is the product of a worldwide crowd-sourcing effort involving over 13,500 volunteers. To demonstrate the effectiveness of the OpenAssistant Conversations dataset, we present OpenAssistant, the first fully open-source, large-scale, instruction-tuned model trained on human data. A preference study found that OpenAssistant replies are comparably preferred to those of GPT-3.5-turbo (ChatGPT), with relative win rates of 48.3% vs. 51.7%, respectively. We release our code and data under fully permissive licenses.
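
The corpus is structured as a flat table of messages that link into conversation trees through parent pointers, with human quality annotations attached to individual messages. As a minimal sketch of how such a tree can be reconstructed (assuming the publicly released Hugging Face export OpenAssistant/oasst1 and its field names message_id, parent_id, role, and text, which are not spelled out in this abstract), one might do the following in Python:

    from collections import defaultdict
    from datasets import load_dataset  # pip install datasets

    # Load the released corpus; each row is a single message.
    ds = load_dataset("OpenAssistant/oasst1", split="train")

    # Messages form trees: a root prompt has parent_id == None,
    # and every reply points at its parent message.
    children = defaultdict(list)
    roots = []
    for msg in ds:
        if msg["parent_id"] is None:
            roots.append(msg)
        else:
            children[msg["parent_id"]].append(msg)

    def print_thread(message, depth=0):
        # Each node is a "prompter" or "assistant" turn; replies branch below it.
        print("  " * depth + f"[{message['role']}] {message['text'][:80]!r}")
        for child in children.get(message["message_id"], []):
            print_thread(child, depth + 1)

    # Walk the first conversation tree in the split.
    print_thread(roots[0])

Because quality ratings are attached to messages (and, in the released export, sibling assistant replies additionally carry relative ranks), the same tree structure can serve both supervised fine-tuning on individual conversation branches and preference modeling for RLHF.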
