Towards Expert-Level Medical Question Answering with Large Language Models
Vivek Natarajan | G. Corrado | D. Webster | Yun Liu | B. A. Y. Arcas | K. Singhal | Sushant Prakash | Nenad Tomašev | S. Pfohl | R. Sayres | Ellery Wulczyn | Ewa Dominowska | M. Schaekermann | H. Cole-Lewis | S. S. Mahdavi | Kevin Clark | Christopher Semturs | Shekoofeh Azizi | Y. Matias | Tao Tu | Bradley Green | J. Barral | Mohamed Amin | S. Lachgar | Le Hou | P. A. Mansfield | Juraj Gottweis | Darlene Neal | Amy Wang | Renee C Wong | A. Karthikesalingam | Yossi Matias