FLUTE: Fast and reliable knowledge retrieval from biomedical literature

Abstract: State-of-the-art machine reading methods extract hundreds of thousands of events from the biomedical literature in hours. However, many of the extracted biomolecular interactions are incorrect or not relevant to computational modeling of a system of interest. Rapid, automated methods are therefore required to filter the output and select accurate, useful information. The FiLter for Understanding True Events (FLUTE) tool uses public protein interaction databases to filter interactions that machine readers have extracted from literature databases such as PubMed, and to score them for accuracy. Confidence in the interactions allows for rapid and accurate model assembly. As our results show, FLUTE can reliably determine the confidence in biomolecular interactions extracted by fast machine readers while speeding up interaction filtering by three orders of magnitude. Database URL: https://bitbucket.org/biodesignlab/flute.
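
The filtering idea described in the abstract can be illustrated with a minimal Python sketch. This is not FLUTE's actual implementation: the function name `filter_interactions`, the `min_score` threshold, the in-memory `reference_scores` lookup, and the placeholder gene names and scores are all assumptions made for illustration. The sketch keeps only those machine-extracted interactions that also appear in a reference set with a sufficiently high confidence score.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# A machine-extracted interaction: an (entity_a, entity_b) pair.
Interaction = Tuple[str, str]


def filter_interactions(
    extracted: Iterable[Interaction],
    reference_scores: Dict[FrozenSet[str], float],
    min_score: float = 0.7,  # hypothetical threshold, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Keep extracted pairs that a reference database supports with
    a confidence score of at least min_score."""
    kept = []
    for a, b in extracted:
        # Treat interactions as undirected: (a, b) and (b, a) share one key.
        score = reference_scores.get(frozenset((a, b)))
        if score is not None and score >= min_score:
            kept.append((a, b, score))
    return kept


# Made-up reference scores standing in for a public interaction database.
reference_scores = {
    frozenset(("GENE_A", "GENE_B")): 0.95,
    frozenset(("GENE_A", "GENE_C")): 0.90,
    frozenset(("GENE_B", "GENE_C")): 0.40,
}

# Made-up machine-reader output: one supported pair, one low-confidence
# pair, and one pair that is absent from the reference set.
extracted = [("GENE_B", "GENE_A"), ("GENE_C", "GENE_B"), ("GENE_A", "GENE_X")]

result = filter_interactions(extracted, reference_scores)
print(result)
```

Using a `frozenset` key is one simple way to make the lookup independent of the order in which a reader reports the two participants; a directed or typed interaction model would need a different key.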
