RoCar: A Relationship Network-based Evaluation Method to Large Language Models
Daling Wang | Yifei Zhang | Shi Feng | Ming Wang | Wenfang Wu | Chongyun Gao