DL4papers: a deep learning approach for the automatic interpretation of scientific articles

MOTIVATION In precision medicine, next-generation sequencing and novel preclinical reports have produced an increasingly large number of results published in the scientific literature. However, identifying novel treatments or predicting drug response in, for example, cancer patients from the huge number of available papers remains laborious and challenging. The task can be framed as a text mining problem: many academic documents must be read to identify the small set of papers that describe specific relations between key terms. Because manual curation of these relations is infeasible, computational methods that can identify them automatically from the available literature are urgently needed.

RESULTS We present DL4papers, a new deep learning method that analyzes and interprets papers in order to automatically extract relevant relations between specific keywords. DL4papers receives as input a query with the desired keywords and returns a ranked list of papers that contain meaningful associations between those keywords. In a comparison on a cancer corpus, our proposal outperformed related methods. The reliability of the DL4papers output list was also measured: on average, 100% of the first two documents retrieved for a particular search contain relevant relations. This shows that the relation can be reliably found within the top-2 papers of the ranked list. Furthermore, the model highlights, within each document, the specific fragments that contain the associations between the input keywords, so a reader can focus on the highlighted text instead of reading the full paper. We believe that our proposal could be used as an accurate tool for rapidly identifying relationships between genes and their mutations, drug responses and treatments in the context of a certain disease. This new approach can be a valuable resource for the advancement of the precision medicine field.

AVAILABILITY AND IMPLEMENTATION A web demo is available at: http://sinc.unl.edu.ar/web-demo/dl4papers/. Full source code and data are available at: https://sourceforge.net/projects/sourcesinc/files/dl4papers/.
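The reliability figure reported in the results (the share of relevant documents among the first two retrieved, averaged over searches) corresponds to what is commonly called mean precision at k, with k = 2. The following is a minimal sketch of that computation, not the authors' evaluation code: the function names, the query labels and the relevance judgments are all hypothetical and serve only to illustrate the metric.

```python
from typing import Dict, List


def precision_at_k(ranked_relevance: List[bool], k: int = 2) -> float:
    """Fraction of the top-k ranked documents judged relevant.

    ranked_relevance[i] is True if the document at rank i (best first)
    contains the queried relation. Hypothetical helper, not from DL4papers.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


def mean_precision_at_k(results: Dict[str, List[bool]], k: int = 2) -> float:
    """Average precision@k over all queries."""
    return sum(precision_at_k(r, k) for r in results.values()) / len(results)


# Invented relevance judgments for two placeholder queries (illustration only).
toy_results = {
    "query A": [True, True, False, True],
    "query B": [True, True, True, False],
}

print(mean_precision_at_k(toy_results, k=2))  # 1.0: both top-2 lists are fully relevant
```

A value of 1.0 at k = 2 means that, for every query in the toy data, both of the two highest-ranked documents contain the relation, which is the situation the abstract describes.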
