Automatic identification of high impact articles in PubMed to support clinical decision making

OBJECTIVES The practice of evidence-based medicine involves integrating the best available current evidence into patient care decisions. Yet clinicians face critical barriers when retrieving evidence relevant to a particular patient from primary sources such as randomized controlled trials and meta-analyses. To help address those barriers, we investigated machine learning algorithms that identify clinical studies with high clinical impact in PubMed®.

METHODS Our machine learning algorithms use a variety of features, including bibliometric features (e.g., citation count), social media attention, journal impact factors, and citation metadata. The algorithms were developed and evaluated with a gold standard composed of 502 high-impact clinical studies that are referenced in 11 evidence-based clinical guidelines on the treatment of various diseases. We tested the following hypotheses: (1) our high-impact classifier outperforms both a state-of-the-art classifier based on citation metadata and citation terms and the PubMed® relevance sort algorithm; and (2) the performance of our high-impact classifier does not decrease significantly after removing proprietary features such as citation count.

RESULTS The mean top-20 precision of our high-impact classifier was 34%, versus 11% for the state-of-the-art classifier and 4% for the PubMed® relevance sort (p=0.009). The performance of our high-impact classifier did not decrease significantly after removing proprietary features (mean top-20 precision=34% vs. 36%; p=0.085).

CONCLUSION The high-impact classifier, using features such as bibliometrics, social media attention, and MEDLINE® metadata, outperformed previous approaches and is a promising alternative for identifying high-impact studies for clinical decision support.
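The mean top-20 precision reported above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the use of PMID strings as identifiers, and the choice to divide by k even when fewer than k articles are retrieved are all assumptions made for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], gold_ids: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked articles that appear in the gold standard.

    Assumption: the denominator is always k, so a ranking that returns
    fewer than k articles is penalized for the missing slots.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for article_id in ranked_ids[:k] if article_id in gold_ids)
    return hits / k


def mean_precision_at_k(
    runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 20
) -> float:
    """Average precision-at-k over several (ranking, gold standard) pairs,
    for example one pair per guideline topic (hypothetical grouping)."""
    scores = [precision_at_k(ranked, gold, k) for ranked, gold in runs]
    if not scores:
        raise ValueError("at least one run is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy identifiers, not real PMIDs or study results.
    runs = [
        (["a", "b", "c", "d"], {"a", "c"}),
        (["e", "f", "g", "h"], {"h"}),
    ]
    print(mean_precision_at_k(runs, k=4))
```

With k=20, a ranking whose first 20 results contain 7 gold-standard studies would score 0.35 under this definition.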
