HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
[1] Wayne Xin Zhao, et al. Evaluating Object Hallucination in Large Vision-Language Models, 2023, arXiv.
[2] Wayne Xin Zhao, et al. StructGPT: A General Framework for Large Language Model to Reason over Structured Data, 2023, arXiv.
[3] Jie Huang, et al. Why Does ChatGPT Fall Short in Answering Questions Faithfully?, 2023, arXiv.
[4] Wayne Xin Zhao, et al. A Survey of Large Language Models, 2023, arXiv.
[5] M. Gales, et al. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, 2023, arXiv.
[6] Dan Su, et al. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity, 2023, IJCNLP.
[7] Pascale Fung, et al. Survey of Hallucination in Natural Language Generation, 2022, ACM Computing Surveys.
[8] R. Srihari, et al. Diving Deep into Modes of Fact Hallucinations in Dialogue Systems, 2023, EMNLP.
[9] Siva Reddy, et al. FaithDial: A Faithful Benchmark for Information-Seeking Dialogue, 2022, Transactions of the Association for Computational Linguistics.
[10] Y. Matias, et al. TRUE: Re-evaluating Factual Consistency Evaluation, 2022, NAACL.
[11] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[12] Dale Schuurmans, et al. Chain of Thought Prompting Elicits Reasoning in Large Language Models, 2022, NeurIPS.
[13] Wenhao Liu, et al. DialFact: A Benchmark for Fact-Checking in Dialogue, 2021, ACL.
[14] Jackie Chi Kit Cheung, et al. Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization, 2021, ACL.
[15] D. Reitter, et al. Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark, 2021, Transactions of the Association for Computational Linguistics.
[16] Gaurav Singh Tomar, et al. Measuring Attribution in Natural Language Generation Models, 2021, Computational Linguistics.
[17] Andrea Madotto, et al. Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding, 2021, EMNLP.
[18] Shay B. Cohen, et al. Reducing the Frequency of Hallucinated Quantities in Abstractive Summaries, 2020, Findings of EMNLP.
[19] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[20] Fabio Petroni, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, 2020, NeurIPS.
[21] Kilian Q. Weinberger, et al. BERTScore: Evaluating Text Generation with BERT, 2019, ICLR.
[22] Seungwhan Moon, et al. OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs, 2019, ACL.
[23] Ankur Parikh, et al. Handling Divergent Reference Texts when Evaluating Table-to-Text Generation, 2019, ACL.
[24] Yoshua Bengio, et al. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, 2018, EMNLP.
[25] Christopher D. Manning, et al. Get To The Point: Summarization with Pointer-Generator Networks, 2017, ACL.
[26] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, Journal of Machine Learning Research.