Direct Preference Optimization: Your Language Model is Secretly a Reward Model
暂无分享,去创建一个
Christopher D. Manning | Archit Sharma | Chelsea Finn | S. Ermon | Rafael Rafailov | E. Mitchell | Stefano Ermon
暂无分享,去创建一个
Christopher D. Manning | Archit Sharma | Chelsea Finn | S. Ermon | Rafael Rafailov | E. Mitchell | Stefano Ermon