Predictive article recommendation using natural language processing and machine learning to support evidence updates in domain-specific knowledge graphs

Abstract

Objectives: To describe an augmented intelligence approach that facilitates updating the evidence supporting associations in knowledge graphs.

Methods: New publications are filtered through multiple machine learning study classifiers. The publications that pass are combined with articles already included as evidence in the knowledge graph. This corpus then undergoes named entity recognition, semantic dictionary mapping, term vector space modeling, pairwise similarity computation, and focal entity matching to identify closely related publications. Subject matter experts review the recommended articles to decide whether to include them in the knowledge graph, and disagreements are resolved by consensus.

Results: The study classifiers achieved F-scores of 0.88 to 0.94, and the similarity threshold for each study type was determined empirically. The approach reduces the human literature review load by 99%, and over the past 12 months, 41% of recommendations were accepted to update the knowledge graph.

Conclusion: Integrated search and recommendation that exploits the current evidence in a knowledge graph is useful for reducing human cognitive load.
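The similarity step named in the Methods (term vector space modeling, pairwise similarity, and focal entity match) can be illustrated with a small Python sketch. It is not the authors' implementation, and the classifier filtering and expert review steps are omitted. Several pieces are assumptions made for illustration: the hypothetical `SEMANTIC_DICT` and the crude tokenizer stand in for named entity recognition and semantic dictionary mapping, TF-IDF weighting with cosine similarity is used as one common vector space model, "focal entity match" is read as requiring every focal concept to appear in the candidate, and the 0.3 threshold is arbitrary.

```python
import math
from collections import Counter

# HYPOTHETICAL semantic dictionary mapping surface forms to canonical concepts.
# Stands in for named entity recognition + semantic dictionary mapping.
SEMANTIC_DICT = {
    "her2": "ERBB2", "erbb2": "ERBB2",
    "pyrotinib": "PYROTINIB", "capecitabine": "CAPECITABINE",
    "egfr": "EGFR", "icotinib": "ICOTINIB",
}


def to_concepts(text):
    """Lowercase, split on non-alphanumerics, keep only dictionary hits."""
    tokens = "".join(c.lower() if c.isalnum() else " " for c in text).split()
    return [SEMANTIC_DICT[t] for t in tokens if t in SEMANTIC_DICT]


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (concept -> weight) using a smoothed IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(evidence_texts, candidate_texts, focal_entities, threshold):
    """Return (candidate index, score) pairs, best first.

    score = max cosine similarity to any existing evidence article. A candidate
    is kept only if score >= threshold AND it mentions every focal entity
    (an assumed reading of "focal entity match").
    """
    docs = [to_concepts(t) for t in evidence_texts + candidate_texts]
    vecs = tfidf_vectors(docs)  # IDF computed over evidence + candidates together
    k = len(evidence_texts)
    hits = []
    for i in range(len(candidate_texts)):
        score = max((cosine(vecs[k + i], e) for e in vecs[:k]), default=0.0)
        if score >= threshold and set(focal_entities) <= set(docs[k + i]):
            hits.append((i, round(score, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)


evidence = ["Pyrotinib plus capecitabine in HER2-positive metastatic breast cancer"]
candidates = [
    "Phase II trial of pyrotinib with capecitabine for HER2 positive disease",
    "Icotinib as neoadjuvant therapy in EGFR mutant lung cancer",
    "Pyrotinib pharmacokinetics in healthy volunteers",
]
# threshold=0.3 is an arbitrary illustrative value, not a tuned one.
recs = recommend(evidence, candidates, {"PYROTINIB", "ERBB2"}, threshold=0.3)
print(recs)  # [(0, 1.0)]
```

The third candidate scores about 0.497 against the evidence, which clears the threshold, but it is dropped because it never mentions HER2. With only `{"PYROTINIB"}` as the focal set, it would be returned after the first candidate. In the approach described above, the threshold would be tuned separately for each study type rather than fixed.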
