Application of Public Knowledge Discovery Tool (PKDE4J) to Represent Biomedical Scientific Knowledge

In today’s era of information explosion, extracting entities and their relations in large-scale, unstructured collections of text to better represent knowledge has emerged as a daunting challenge in biomedical text mining. To respond to the demand to automatically extract scientific knowledge with higher precision, the public knowledge discovery tool PKDE4J (Song et al., 2015) was proposed as a flexible text-mining tool. In this study, we propose an extended version of PKDE4J to represent scientific knowledge for literature-based knowledge discovery. Specifically, we assess the performance of PKDE4J in terms of three extraction tasks: entity, relation, and event detection. We also suggest applications of PKDE4J along three lines: 1) knowledge search, 2) knowledge linking, and 3) knowledge inference. We first describe the updated features of PKDE4J and report on tests of its performance. With additional options in the processes of named entity extraction, verb expansion, and event detection, we expect that the enhanced PKDE4J can be utilized for literature-based knowledge discovery.

[1]  Jim Webber,et al.  A programmatic introduction to Neo4j , 2018, SPLASH '12.

[2]  Ralf Zimmer,et al.  RelEx - Relation extraction using dependency parse trees , 2007, Bioinform..

[3]  Sridhar Ramaswamy,et al.  Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells , 2012, Nucleic Acids Res..

[4]  Patchigolla V. S. S. Rahul,et al.  Biomedical Event Trigger Identification Using Bidirectional Recurrent Neural Network Based Models , 2017, BioNLP.

[5]  Martin Hofmann-Apitius,et al.  Detection of IUPAC and IUPAC-like chemical names , 2008, ISMB.

[6]  Thomas C. Rindflesch,et al.  EDGAR: extraction of drugs, genes and relations from the biomedical literature. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[7]  David S. Wishart,et al.  DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs , 2010, Nucleic Acids Res..

[8]  Erik M. van Mulligen,et al.  Co-occurrence based meta-analysis of scientific texts: retrieving biological relationships between genes , 2005, Bioinform..

[9]  David S. Wishart,et al.  DrugBank: a comprehensive resource for in silico drug discovery and exploration , 2005, Nucleic Acids Res..

[10]  Ulf Leser,et al.  Learning Protein–Protein Interaction Extraction using Distant Supervision , 2011 .

[11]  Keun Ho Ryu,et al.  Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations , 2015, Journal of Cheminformatics.

[12]  Xiaolong Wang,et al.  A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature , 2015, Journal of Cheminformatics.

[13]  Jun S. Liu,et al.  Integrated Bio-Entity Network: A System for Biological Knowledge Discovery , 2011, PloS one.

[14]  Jari Björne,et al.  Complex event extraction at PubMed scale , 2010, Bioinform..

[15]  Yonghwan Kim,et al.  Grounded Feature Selection for Biomedical Relation Extraction by the Combinative Approach , 2014, DTMBIO '14.

[16]  Halil Kilicoglu,et al.  Semantic MEDLINE: An advanced information management application for biomedicine , 2011, Inf. Serv. Use.

[17]  Núria Queralt-Rosinach,et al.  Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research , 2014, BMC Bioinformatics.

[18]  Marcelo Fiszman,et al.  The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text , 2003, J. Biomed. Informatics.

[19]  Andreas Holzinger,et al.  Machine Learning for Health Informatics , 2016, Lecture Notes in Computer Science.

[20]  Iryna Gurevych,et al.  Using Wiktionary for Computing Semantic Relatedness , 2008, AAAI.

[21]  K. Bretonnel Cohen,et al.  MutationFinder: a high-performance system for extracting point mutation mentions from text , 2007, Bioinform..

[22]  Martin Hofmann-Apitius,et al.  Chemical Names: Terminological Resources and Corpora Annotation , 2008, LREC 2008.

[23]  Daniel Berleant,et al.  Mining MEDLINE: Abstracts, Sentences, or Phrases? , 2001, Pacific Symposium on Biocomputing.

[24]  Hongfei Lin,et al.  Exploiting the performance of dictionary-based bio-entity name recognition in biomedical literature , 2008, Comput. Biol. Chem..

[25]  Christian Biemann,et al.  An adaptive annotation approach for biomedical entity and relation recognition , 2016, Brain Informatics.

[26]  Olivier Bodenreider,et al.  The Unified Medical Language System (UMLS): integrating biomedical terminology , 2004, Nucleic Acids Res..

[27]  Jian Huang,et al.  A robust two-way semi-linear model for normalization of cDNA microarray data , 2005, BMC Bioinformatics.

[28]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[29]  Michael Schroeder,et al.  GoPubMed: exploring PubMed with the Gene Ontology , 2005, Nucleic Acids Res..

[30]  Jari Björne,et al.  BioInfer: a corpus for information extraction in the biomedical domain , 2007, BMC Bioinformatics.

[31]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[32]  Alexander A. Morgan,et al.  BioCreAtIvE Task 1A: gene mention finding evaluation , 2005, BMC Bioinformatics.

[33]  S. Baek,et al.  Enriching plausible new hypothesis generation in PubMed , 2017, PloS one.

[34]  Kalpana Raja,et al.  PPInterFinder—a mining tool for extracting causal relations on human proteins from literature , 2013, Database J. Biol. Databases Curation.

[35]  Eneko Agirre,et al.  Single or Multiple? Combining Word Representations Independently Learned from Text and WordNet , 2016, AAAI.

[36]  Min Song,et al.  PKDE4J: Entity and relation extraction for public knowledge discovery , 2015, J. Biomed. Informatics.

[37]  Jiang Qian,et al.  TiGER: A database for tissue-specific gene expression and regulation , 2008, BMC Bioinformatics.

[38]  Zhiyong Lu,et al.  TaggerOne: joint named entity recognition and normalization with semi-Markov Models , 2016, Bioinform..

[39]  Peter M. A. Sloot,et al.  A hybrid approach to extract protein-protein interactions , 2011, Bioinform..

[40]  Nicole Tourigny,et al.  Bio2RDF: Towards a mashup to build bioinformatics knowledge systems , 2008, J. Biomed. Informatics.

[41]  Nigel Collier,et al.  PASBio: predicate-argument structures for event extraction in molecular biology , 2004, BMC Bioinformatics.

[42]  K. Becker,et al.  The Genetic Association Database , 2004, Nature Genetics.

[43]  Jari Björne,et al.  Comparative analysis of five protein-protein interaction corpora , 2008, BMC Bioinformatics.

[44]  Ralph Grishman,et al.  Relation Extraction: Perspective from Convolutional Neural Networks , 2015, VS@HLT-NAACL.

[45]  D. Swanson Fish Oil, Raynaud's Syndrome, and Undiscovered Public Knowledge , 2015, Perspectives in biology and medicine.

[46]  中尾 光輝,et al.  KEGG(Kyoto Encyclopedia of Genes and Genomes)〔和文〕 (特集 ゲノム医学の現在と未来--基礎と臨床) -- (データベース) , 2000 .

[47]  Burr Settles ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text , 2005 .

[48]  Jacques Ravel,et al.  Visualization of comparative genomic analyses by BLAST score ratio , 2005, BMC Bioinformatics.

[49]  Alan R. Aronson,et al.  Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program , 2001, AMIA.

[50]  Alfonso Valencia,et al.  CheNER: chemical named entity recognizer , 2014, Bioinform..

[51]  Daniel Hanisch,et al.  ProMiner: rule-based protein and gene entity recognition , 2005, BMC Bioinformatics.

[52]  Dietrich Rebholz-Schuhmann,et al.  Assessment of disease named entity recognition on a corpus of annotated sentences , 2008, BMC Bioinformatics.

[53]  Hiroyuki Ogata,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 1999, Nucleic Acids Res..

[54]  Yifan Peng,et al.  miRTex: A Text Mining System for miRNA-Gene Relation Extraction , 2015, PLoS Comput. Biol..

[55]  Jun'ichi Tsujii,et al.  GENIA corpus - a semantically annotated corpus for bio-textmining , 2003, ISMB.

[56]  Deyu Zhou,et al.  Biomedical events extraction using the hidden vector state model , 2011, Artif. Intell. Medicine.

[57]  Rohit J. Kate,et al.  Comparative experiments on learning information extractors for proteins and their interactions , 2005, Artif. Intell. Medicine.

[58]  Akinori Yonezawa,et al.  The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011 , 2012, BMC Bioinformatics.

[59]  David S. Wishart,et al.  HMDB 3.0—The Human Metabolome Database in 2013 , 2012, Nucleic Acids Res..

[60]  T. Tatusova,et al.  Entrez Gene: gene-centered information at NCBI , 2010, Nucleic Acids Res..

[61]  Sophia Ananiadou,et al.  Boosting automatic event extraction from the literature using domain adaptation and coreference resolution , 2012, Bioinform..

[62]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.