Data extraction methods for systematic review (semi)automation: A living review protocol

Background: Researchers in evidence-based medicine cannot keep up with the amounts of both old and newly published primary research articles. Conducting and updating of systematic reviews is time-consuming. In practice, data extraction is one of the most complex tasks in this process. Exponential improvements in computational processing speed and data storage are fostering the development of data extraction models and algorithms. This, in combination with quicker pathways to publication, led to a large landscape of tools and methods for data extraction tasks. Objective: To review published methods and tools for data extraction to (semi)automate the systematic reviewing process. Methods: We propose to conduct a living review. With this methodology we aim to do monthly search updates, as well as bi-annual review updates if new evidence permits it. In a cross-sectional analysis we will extract methodological characteristics and assess the quality of reporting in our included papers. Conclusions: We aim to increase transparency in the reporting and assessment of machine learning technologies to the benefit of data scientists, systematic reviewers and funders of health research. This living review will help to reduce duplicate efforts by data scientists who develop data extraction methods. It will also serve to inform systematic reviewers about possibilities to support their data extraction.