Discovering and visualizing indirect associations between biomedical concepts

Motivation: Discovering useful associations between biomedical concepts has been one of the main goals in biomedical text-mining, and understanding their biomedical contexts is crucial in the discovery process. Hence, we need a text-mining system that helps users explore various types of (possibly hidden) associations in an easy and comprehensible manner. Results: This article describes FACTA+, a real-time text-mining system for finding and visualizing indirect associations between biomedical concepts from MEDLINE abstracts. The system can be used as a text search engine like PubMed with additional features to help users discover and visualize indirect associations between important biomedical concepts such as genes, diseases and chemical compounds. FACTA+ inherits all functionality from its predecessor, FACTA, and extends it by incorporating three new features: (i) detecting biomolecular events in text using a machine learning model, (ii) discovering hidden associations using co-occurrence statistics between concepts, and (iii) visualizing associations to improve the interpretability of the output. To the best of our knowledge, FACTA+ is the first real-time web application that offers the functionality of finding concepts involving biomolecular events and visualizing indirect associations of concepts with both their categories and importance. Availability: FACTA+ is available as a web application at http://refine1-nactem.mc.man.ac.uk/facta/, and its visualizer is available at http://refine1-nactem.mc.man.ac.uk/facta-visualizer/. Contact: tsuruoka@jaist.ac.jp

[1]  Yukiko Matsuoka,et al.  PathText: a text mining integrator for biological pathway visualizations , 2010, Bioinform..

[2]  Jun'ichi Tsujii,et al.  A Markov Logic Approach to Bio-Molecular Event Extraction , 2009, BioNLP@HLT-NAACL.

[3]  Martin Krallinger Importance of negations and experimental qualifiers in biomedical literature , 2010, NeSp-NLP@ACL.

[4]  Alfonso Valencia,et al.  The Frame-Based Module of the SUISEKI Information Extraction System , 2002, IEEE Intell. Syst..

[5]  Sampo Pyysalo,et al.  Overview of BioNLP’09 Shared Task on Event Extraction , 2009, BioNLP@HLT-NAACL.

[6]  Marc Weeber,et al.  Case Report: Generating Hypotheses by Discovering Implicit Associations in the Literature: A Case Report of a Search for New Potential Therapeutic Uses for Thalidomide , 2003, J. Am. Medical Informatics Assoc..

[7]  Roser Morante,et al.  Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, NeSp-NLP@ACL 2010, Uppsala, Sweden, July 10, 2010 , 2010, NeSp-NLP@ACL.

[8]  Halil Kilicoglu,et al.  Semantic MEDLINE: A web application for managing the results of PubMed searches , 2008, SMBM 2008.

[9]  Jari Björne,et al.  Extracting Complex Biological Events with Rich Graph-Based Feature Sets , 2009, BioNLP@HLT-NAACL.

[10]  K. Bretonnel Cohen,et al.  High-precision biological event extraction with a concept recognizer , 2009, BioNLP@HLT-NAACL.

[11]  Ted Briscoe,et al.  Biomedical Event Extraction without Training Data , 2009, BioNLP@HLT-NAACL.

[12]  Jun'ichi Tsujii,et al.  Corpus annotation for mining biomedical events from literature , 2008, BMC Bioinformatics.

[13]  Jacob de Vlieg,et al.  Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases , 2010, PLoS Comput. Biol..

[14]  Yael Garten,et al.  Recent progress in automatically extracting information from the pharmacogenomic literature. , 2010, Pharmacogenomics.

[15]  Hao Chen,et al.  Content-rich biological network constructed by mining PubMed abstracts , 2004, BMC Bioinformatics.

[16]  Sophia Ananiadou,et al.  BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing , 2012 .

[17]  Neil R. Smalheiser,et al.  Arrowsmith two-node search interface: A tutorial on finding meaningful links between two disparate sets of articles in MEDLINE , 2009, Comput. Methods Programs Biomed..

[18]  Jun'ichi Tsujii,et al.  Evaluating contributions of natural language parsers to protein–protein interaction extraction , 2008, Bioinform..

[19]  Maurice Bouwhuis,et al.  CoPub: a literature-based keyword enrichment tool for microarray data analysis , 2008, Nucleic Acids Res..

[20]  Jonathan D. Wren,et al.  Knowledge discovery by automated identification and ranking of implicit relationships , 2004, Bioinform..

[21]  Roser Morante,et al.  A memory-based learning approach to event extraction in biomedical texts , 2009, BioNLP@HLT-NAACL.

[22]  Mariana L. Neves,et al.  Extraction of biomedical events using case-based reasoning , 2009, BioNLP@HLT-NAACL.

[23]  D. Swanson Medical literature as a potential source of new knowledge. , 1990, Bulletin of the Medical Library Association.

[24]  Terri K. Attwood,et al.  BioIE: extracting informative sentences from the biomedical literature , 2005, Bioinform..

[25]  Naoaki Okazaki,et al.  Kleio: a knowledge-enriched information retrieval system for biology , 2008, SIGIR '08.

[26]  Jun'ichi Tsujii,et al.  Protein-protein interaction extraction by leveraging multiple kernels and parsers , 2009, Int. J. Medical Informatics.

[27]  Timothy Baldwin,et al.  Biomedical Event Annotation with CRFs and Precision Grammars , 2009, BioNLP@HLT-NAACL.

[28]  Alfonso Valencia,et al.  Implementing the iHOP concept for navigation of biomedical literature , 2005, ECCB/JBI.

[29]  Burr Settles,et al.  Biomedical Named Entity Recognition using Conditional Random Fields and Rich Feature Sets , 2004, NLPBA/BioNLP.

[30]  Lawrence Hunter,et al.  Biomedical Discovery Acceleration, with Applications to Craniofacial Development , 2009, PLoS Comput. Biol..

[31]  Michael Schroeder,et al.  GoWeb: a semantic search engine for the life science web , 2009, BMC Bioinformatics.

[32]  Jun'ichi Tsujii,et al.  Improving the Scalability of Semi-Markov Conditional Random Fields for Named Entity Recognition , 2006, ACL.

[33]  Jun'ichi Tsujii,et al.  Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task, BioNLP@HLT-NAACL 2009 - Shared Task, Boulder, Colorado, USA, June 5, 2009 , 2009, BioNLP@HLT-NAACL.

[34]  Andreas Vlachos,et al.  Two Strong Baselines for the BioNLP 2009 Event Extraction Task , 2010, BioNLP@ACL.

[35]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[36]  Udo Hahn,et al.  Event Extraction from Trimmed Dependency Graphs , 2009, BioNLP@HLT-NAACL.

[37]  Sophia Ananiadou,et al.  FACTA: a text search engine for finding associated biomedical concepts , 2008, Bioinform..

[38]  Hoifung Poon,et al.  Joint Inference for Knowledge Extraction from Biomedical Literature , 2010, NAACL.

[39]  Jari Björne,et al.  Complex event extraction at PubMed scale , 2010, Bioinform..

[40]  Barend Mons,et al.  Online tools to support literature-based discovery in the life sciences , 2005, Briefings Bioinform..

[41]  Erik F. Tjong Kim Sang,et al.  Representing Text Chunks , 1999, EACL.

[42]  Ulf Leser,et al.  Molecular event extraction from Link Grammar parse trees , 2009, BioNLP@HLT-NAACL.

[43]  Jun'ichi Tsujii,et al.  Event Extraction with Complex Event Classification Using Rich Features , 2010, J. Bioinform. Comput. Biol..

[44]  Joyce A. Mitchell,et al.  Using literature-based discovery to identify disease candidate genes , 2005, Int. J. Medical Informatics.

[45]  Halil Kilicoglu,et al.  Syntactic Dependency Based Heuristics for Biological Event Extraction , 2009, BioNLP@HLT-NAACL.

[46]  Dietrich Rebholz-Schuhmann,et al.  EBIMed - text crunching to gather facts for proteins from Medline , 2007, Bioinform..

[47]  M. Schuemie,et al.  Anni 2.0: a multipurpose text-mining tool for the life sciences , 2008, Genome Biology.

[48]  D. Swanson Fish Oil, Raynaud's Syndrome, and Undiscovered Public Knowledge , 2015, Perspectives in biology and medicine.

[49]  Fabio Rinaldi,et al.  UZurich in the BioNLP 2009 Shared Task , 2009, BioNLP@HLT-NAACL.

[50]  Jun'ichi Tsujii,et al.  Semantic Retrieval for the Accurate Identification of Relational Concepts in Massive Textbases , 2006, ACL.

[51]  Junichi Tsujii,et al.  Event extraction for systems biology by text mining the literature. , 2010, Trends in biotechnology.

[52]  Ben Shneiderman,et al.  Treemaps for space-constrained visualization of hierarchies , 2005 .

[53]  Sophia Ananiadou,et al.  Evaluating a meta-knowledge annotation scheme for bio-events , 2010, NeSp-NLP@ACL.

[54]  Wanda Pratt,et al.  A new evaluation methodology for literature-based discovery systems , 2009, J. Biomed. Informatics.

[55]  Yvan Saeys,et al.  Analyzing text in search of bio-molecular events: a high-precision machine learning framework , 2009, BioNLP@HLT-NAACL.

[56]  Thomas L. Griffiths,et al.  Contextual Dependencies in Unsupervised Word Segmentation , 2006, ACL.

[57]  Neil R. Smalheiser,et al.  Artificial Intelligence An interactive system for finding complementary literatures : a stimulus to scientific discovery , 1995 .

[58]  Hao Yu,et al.  Discovering patterns to extract protein-protein interactions from full texts , 2004, Bioinform..