Heterogeneous Value Evaluation for Large Language Models
Yaodong Yang | Siyuan Qi | N. Liu | Shuguang Cui | Zhaowei Zhang | Ceyao Zhang | Ziqi Rong
[1] C. Summerfield, et al. Using the Veil of Ignorance to align AI systems with principles of justice, 2023, Proceedings of the National Academy of Sciences of the United States of America.
[2] Marco Tulio Ribeiro, et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4, 2023, ArXiv.
[3] Henrique Pondé de Oliveira Pinto, et al. GPT-4 Technical Report, 2023, 2303.08774.
[4] Naman Goyal, et al. LLaMA: Open and Efficient Foundation Language Models, 2023, ArXiv.
[5] Noah A. Smith, et al. Self-Instruct: Aligning Language Models with Self-Generated Instructions, 2022, ACL.
[6] Tom B. Brown, et al. Constitutional AI: Harmlessness from AI Feedback, 2022, ArXiv.
[7] Y. Wu, et al. In situ bidirectional human-robot value alignment, 2022, Sci. Robotics.
[8] Thilo Hagendorff. A Virtue-Based Framework to Support Putting AI Ethics into Practice, 2022, Philosophy & Technology.
[9] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[10] Wojciech Zaremba, et al. Evaluating Large Language Models Trained on Code, 2021, ArXiv.
[11] Scott Niekum, et al. Value Alignment Verification, 2020, ICML.
[12] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[13] Joel Z. Leibo, et al. Social Diversity and Social Preferences in Mixed-Motive Reinforcement Learning, 2020, AAMAS.
[14] Stuart Russell. Human Compatible: Artificial Intelligence and the Problem of Control, 2019.
[15] William A. Bauer. Virtuous vs. utilitarian artificial moral agents, 2018, AI & SOCIETY.
[16] Malte Risto, et al. The social behavior of autonomous vehicles, 2016, UbiComp Adjunct.
[17] Ryan O. Murphy, et al. Measuring Social Value Orientation, 2011, SSRN Electronic Journal.
[18] Edwin A. Locke, et al. Job satisfaction and job performance: A theoretical analysis, 1970.
[19] H. Simon, et al. A Behavioral Model of Rational Choice, 1955.
[20] Song-Chun Zhu, et al. MPI: Evaluating and Inducing Personality in Pre-trained Language Models, 2022, ArXiv.