Using text mining for study identification in systematic reviews: a systematic review of current approaches

Background: The large and growing number of published studies, and their increasing rate of publication, make the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities.

Methods: Five research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged? We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis of findings.

Results: The evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable. On the whole, most studies suggested that a saving in workload of between 30% and 70% might be possible, though sometimes the saving in workload is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall).

Conclusions: Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in ‘live’ reviews. Text mining may also be used cautiously as a ‘second screener’. The use of text mining to eliminate studies automatically should be considered promising, but not yet fully proven. In highly technical/clinical areas, it may be used with a high degree of confidence, but more developmental and evaluative work is needed in other disciplines.

[1]  I. Tomek,et al.  Two Modifications of CNN , 1976 .

[2]  C. V. Ramamoorthy,et al.  Knowledge and Data Engineering , 1989, IEEE Trans. Knowl. Data Eng..

[3]  James Parker,et al.  on Knowledge and Data Engineering, , 1990 .

[4]  Stan Matwin,et al.  Addressing the Curse of Imbalanced Training Sets: One-Sided Selection , 1997, ICML.

[5]  Marti A. Hearst Untangling Text Data Mining , 1999, ACL.

[6]  G. Peersman,et al.  Identifying primary research on electronic databases to inform decision-making in health promotion: the case of sexual health promotion , 1999 .

[7]  I. Olkin,et al.  Estimating time to conduct a meta-analysis from number of citations retrieved. , 1999, JAMA.

[8]  Charles Elkan,et al.  The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[9]  L. Hedges,et al.  A Brief History of Research Synthesis , 2002, Evaluation & the health professions.

[10]  D. Gough,et al.  Systematic Research Synthesis to Inform Policy, Practice and Democratic Debate , 2002, Social Policy and Society.

[11]  Klaus Brinker,et al.  Incorporating Diversity in Active Learning with Support Vector Machines , 2003, ICML.

[12]  Guy M. Goodwin,et al.  Introduction to Systematic Reviews , 2004, Journal of psychopharmacology.

[13]  Dragos D. Margineantu,et al.  Active Cost-Sensitive Learning , 2005, IJCAI.

[14]  Sophia Ananiadou,et al.  Text Mining for Biology And Biomedicine , 2005 .

[15]  Aaron M. Cohen,et al.  An Effective General Purpose Approach for Automated Biomedical Document Classification , 2006, AMIA.

[16]  David Moher,et al.  Can electronic search engines optimize screening of search results in systematic reviews: an empirical study , 2006, BMC medical research methodology.

[17]  Mark Goadrich,et al.  The relationship between Precision-Recall and ROC curves , 2006, ICML.

[18]  William R. Hersh,et al.  Reducing workload in systematic review preparation using automated citation classification. , 2006, Journal of the American Medical Informatics Association : JAMIA.

[19]  Sophia Ananiadou,et al.  Supporting Systematic Reviews Using Text Mining , 2009 .

[20]  James Thomas,et al.  EPPI-Reviewer 3.5: software for research synthesis , 2007 .

[21]  Muin J. Khoury,et al.  GAPscreener: An automatic tool for screening human genetic association literature in PubMed using the support vector machine technique , 2008, BMC Bioinformatics.

[22]  Manoel G. Mendonça,et al.  A Visual Text Mining approach for Systematic Reviews , 2007, First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007).

[23]  Aaron M. Cohen,et al.  Optimizing Feature Representation for Automated Systematic Review Work Prioritization , 2008, AMIA.

[24]  Bruce E. Bray,et al.  Semantic Processing to Support Clinical Guideline Development , 2008, AMIA.

[25]  David Hailey,et al.  Rapid reviews versus full systematic reviews: An inventory of current methods and practice in health technology assessment , 2008, International Journal of Technology Assessment in Health Care.

[26]  Timothy Baldwin,et al.  Facilitating biomedical systematic reviews using text classification and ranked retrieval , 2008 .

[27]  Timothy Baldwin,et al.  Facilitating biomedical systematic reviews using ranked text retrieval and classification , 2008 .

[28]  Yutaka Sasaki Automatic Text Classification , 2008 .

[29]  Carla E. Brodley,et al.  Semi-automated screening of biomedical citations for systematic reviews , 2010, BMC Bioinformatics.

[30]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[31]  Stan Matwin,et al.  Parameterized Contrast in Second Order Soft Co-occurrences: A Novel Text Representation Technique in Text Mining and Knowledge Extraction , 2009, 2009 IEEE International Conference on Data Mining Workshops.

[32]  Stan Matwin,et al.  Classifying Biomedical Abstracts Using Committees of Classifiers and Collective Ranking Techniques , 2009, Canadian Conference on AI.

[33]  Aaron M. Cohen,et al.  Research Paper: Cross-Topic Learning for Work Prioritization in Systematic Review Creation and Update , 2009, J. Am. Medical Informatics Assoc..

[34]  K. Bretonnel Cohen,et al.  The structural and content aspects of abstracts versus bodies of full text journal articles are different , 2010, BMC Bioinformatics.

[35]  Joel D. Martin,et al.  ExaCT: automatic extraction of clinical trial characteristics from journal publications , 2010, BMC Medical Informatics Decis. Mak..

[36]  Carla E. Brodley,et al.  Modeling annotation time to reduce workload in comparative effectiveness reviews , 2010, IHI.

[37]  Stan Matwin,et al.  A new algorithm for reducing the workload of experts in performing systematic reviews , 2010, J. Am. Medical Informatics Assoc..

[38]  Marian McDonagh,et al.  A Prospective Evaluation of an Automated Classification System to Support Evidence-based Medicine and Systematic Review. , 2010, AMIA ... Annual Symposium proceedings. AMIA Symposium.

[39]  Halil Kilicoglu,et al.  Combining Relevance Assignment with Quality of the Evidence to Support Guideline Development , 2010, MedInfo.

[40]  Michele Tarsilla Cochrane Handbook for Systematic Reviews of Interventions , 2010, Journal of MultiDisciplinary Evaluation.

[41]  Nathalie Japkowicz,et al.  Using Classifier Performance Visualization to Improve Collective Ranking Techniques for Biomedical Abstracts Classification , 2010, Canadian Conference on AI.

[42]  H. Bastian,et al.  Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up? , 2010, PLoS medicine.

[43]  Dina Demner-Fushman,et al.  Towards Automating the Initial Screening Phase of a Systematic Review , 2010, MedInfo.

[44]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[45]  Stan Matwin,et al.  Building Systematic Reviews Using Automatic Text Classification Techniques , 2010, COLING.

[46]  Catherine Blake,et al.  Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles , 2010, J. Biomed. Informatics.

[47]  Carla E. Brodley,et al.  Active learning for biomedical citation screening , 2010, KDD.

[48]  Laura A. Levit,et al.  Finding what works in health care : standards for systematic reviews , 2011 .

[49]  Carla E. Brodley,et al.  The Constrained Weight Space SVM: Learning with Ranked Features , 2011, ICML.

[50]  Stan Matwin,et al.  Exploiting the systematic review protocol for classification of medical abstracts , 2011, Artif. Intell. Medicine.

[51]  M. Broder,et al.  Gastrointestinal neuroendocrine tumors treated with high dose octreotide-LAR: a systematic literature review. , 2015, World journal of gastroenterology.

[52]  Christine Urquhart,et al.  Precision of healthcare systematic review searches in a cross‐sectional sample , 2011, Research synthesis methods.

[53]  Peter A. Flach,et al.  Proceedings of the 28th International Conference on Machine Learning , 2011 .

[54]  Aaron M. Cohen,et al.  Letter: Performance of support-vector-machine-based classification on 15 systematic review topics evaluated with the WSS@95 measure , 2011, J. Am. Medical Informatics Assoc..

[55]  Carla E. Brodley,et al.  Who Should Label What? Instance Allocation in Multiple Expert Active Learning , 2011, SDM.

[56]  Stan Matwin,et al.  Letter: Performance of SVM and Bayesian classifiers on the systematic review classification task , 2011, J. Am. Medical Informatics Assoc..

[57]  Emilia Mendes,et al.  Using Visual Text Mining to Support the Study Selection Activity in Systematic Literature Reviews , 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[58]  Luca Ardito,et al.  Linked data approach for selection process automation in systematic reviews , 2011, EASE.

[59]  Alison O'Mara-Eves,et al.  How can we find relevant research more quickly , 2011 .

[60]  Sophia Ananiadou,et al.  Applications of text mining within systematic reviews , 2011, Research synthesis methods.

[61]  D. Gough,et al.  Clarifying differences between review designs and methods , 2012, Systematic Reviews.

[62]  He Zhang,et al.  Towards evidence-based ontology for supporting Systematic Literature Review , 2012, EASE.

[63]  Carla E. Brodley,et al.  Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining , 2012, Genetics in Medicine.

[64]  Borim Ryu,et al.  Combining relevancy and methodological quality into a single ranking for evidence-based medicine , 2012, Inf. Sci..

[65]  Aaron M. Cohen,et al.  Studying the potential impact of automated document classification on scheduling a systematic review update , 2012, BMC Medical Informatics and Decision Making.

[66]  Seunghee Kim,et al.  Improving the Performance of Text Categorization Models used for the Selection of High Quality Articles , 2012, Healthcare informatics research.

[67]  Carla E. Brodley,et al.  Deploying an interactive machine learning system in an evidence-based practice center: abstrackr , 2012, IHI '12.

[68]  Dina Demner-Fushman,et al.  Screening nonrandomized studies for medical systematic reviews: A comparative study of classifiers , 2012, Artif. Intell. Medicine.

[69]  Stan Matwin,et al.  Direct comparison between support vector machine and multinomial naive Bayes algorithms for medical abstract classification , 2012, J. Am. Medical Informatics Assoc..

[70]  Ruth L. Okediji,et al.  When Copyright Law and Science Collide: Empowering Digitally Integrated Research Methods on a Global Scale , 2012, Minnesota law review.

[71]  Rosane Minghim,et al.  A visual analysis approach to validate the selection review of primary studies in systematic reviews , 2012, Inf. Softw. Technol..

[72]  Enrico Coiera,et al.  The automation of systematic reviews , 2013, BMJ.

[73]  Susanne Hempel,et al.  A Pilot Study Using Machine Learning and Domain Knowledge to Facilitate Comparative Effectiveness Review Updating , 2013, Medical decision making : an international journal of the Society for Medical Decision Making.

[74]  Simone R. S. Souza,et al.  The Use of Visual Text Mining to Support the Study Selection Activity in Systematic Literature Reviews: A Replication Study , 2013, 2013 3rd International Workshop on Replication in Empirical Software Engineering Research.

[75]  Rodney L. Summerscales,et al.  AUTOMATIC SUMMARIZATION OF CLINICAL ABSTRACTS FOR EVIDENCE-BASED MEDICINE , 2013 .

[76]  Dazhe Zhao,et al.  An Optimized Cost-Sensitive SVM for Imbalanced Data Learning , 2013, PAKDD.

[77]  Siddhartha Jonnalagadda,et al.  A new iterative method to reduce workload in systematic review process , 2013, Int. J. Comput. Biol. Drug Des..

[78]  Jm Thomas,et al.  Diffusion of innovation in systematic review methodology: why is study selection not yet assisted by automation? , 2013 .

[79]  João Gama,et al.  A survey on concept drift adaptation , 2014, ACM Comput. Surv..

[80]  S. Shenkin,et al.  Lifestyle intervention for improving school achievement in overweight or obese children and adolescents. , 2014, The Cochrane database of systematic reviews.

[81]  Alireza Sarveniazi An Actual Survey of Dimensionality Reduction , 2014 .

[82]  Dina Demner-Fushman,et al.  Feature Engineering and a Proposed Decision-Support System for Systematic Reviewers of Medical Evidence , 2014, PloS one.

[83]  Helen McConachie,et al.  Interventions based on the Theory of Mind cognitive model for autism spectrum disorder (ASD). , 2014, The Cochrane database of systematic reviews.

[84]  P. Glasziou,et al.  Systematic review automation technologies , 2014, Systematic Reviews.

[85]  Sophia Ananiadou,et al.  Reducing systematic review workload through certainty-based screening , 2014, J. Biomed. Informatics.

[86]  J. Verbeek,et al.  Devices for preventing percutaneous exposure injuries caused by needles in healthcare personnel. , 2014, The Cochrane database of systematic reviews.

[87]  Juan Jose García Adeva,et al.  Automatic text classification to support systematic reviews in medicine , 2014, Expert Syst. Appl..

[88]  Patrick Van Eecke,et al.  Legal aspects of text mining , 2014, LREC.

[89]  J. Verbeek,et al.  Gloves, extra gloves or special types of gloves for preventing percutaneous exposure injuries in healthcare personnel. , 2014, The Cochrane database of systematic reviews.

[90]  David Ogilvie,et al.  Pinpointing needles in giant haystacks: use of text mining to reduce impractical screening workload in extremely large scoping reviews , 2014, Research synthesis methods.

[91]  José Salvador Sánchez,et al.  A bias correction function for classification performance assessment in two-class imbalanced problems , 2014, Knowl. Based Syst..

[92]  Sophia Ananiadou,et al.  Erratum to: Using text mining for study identification in systematic reviews: a systematic review of current approaches , 2015, Systematic Reviews.

[93]  Byron C. Wallace,et al.  Automating Risk of Bias Assessment for Clinical Trials , 2014, IEEE Journal of Biomedical and Health Informatics.

[94]  Devices for preventing percutaneous exposure injuries caused by needles in healthcare personnel. , 2014, The Cochrane database of systematic reviews.

[95]  Nila A Sathe,et al.  Searching for studies: a guide to information retrieval for Campbell systematic reviews , 2017 .