Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?

Systematic reviews are comprehensive reviews of the literature for a highly focused research question. These reviews are often treated as the highest form of evidence in evidence-based medicine, and are the key strategy to answer research questions in the medical field. To create a high-quality systematic review, complex Boolean queries are often constructed to retrieve studies for the review topic. However, constructing a high-quality Boolean query often takes systematic review researchers a long time, and the resulting queries are often far from effective. Poor queries may lead to biased or invalid reviews, because they fail to retrieve key evidence, or to a substantial increase in review costs, because they retrieve too many irrelevant studies. Recent advances in Transformer-based generative models have shown great potential to follow user instructions and generate answers based on those instructions. In this paper, we investigate the effectiveness of the latest of such models, ChatGPT, in generating effective Boolean queries for systematic review literature search. Through extensive experiments on standard test collections for the task, we find that ChatGPT is capable of generating queries that lead to high search precision, although at the expense of recall. Overall, our study demonstrates the potential of ChatGPT in generating effective Boolean queries for systematic review literature search. The ability of ChatGPT to follow complex instructions and generate queries with high precision makes it a valuable tool for researchers conducting systematic reviews, particularly for rapid reviews, where time is a constraint and lower recall is often an acceptable price for higher precision.
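The precision and recall trade-off mentioned in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not the paper's evaluation code: the function name `precision_recall` and the study identifiers are hypothetical, and the two retrieved sets stand in for the results of a narrow and a broad Boolean query.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's retrieved set.

    precision = relevant studies retrieved / all studies retrieved
    recall    = relevant studies retrieved / all relevant studies
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical review topic with four relevant studies.
relevant = {"s1", "s2", "s3", "s4"}

# A narrow query: everything it returns is relevant, but it misses half.
narrow = {"s1", "s2"}

# A broad query: it finds every relevant study, plus four irrelevant ones.
broad = {"s1", "s2", "s3", "s4", "x1", "x2", "x3", "x4"}

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```

In this toy setting the narrow query costs less screening effort but misses evidence, which is the behaviour that may be tolerable for a rapid review and risky for a full systematic review.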
