Text-mining solutions for biomedical research: enabling integrative biology

In response to the unbridled growth of information in the literature and in biomedical databases, researchers require efficient means of handling and extracting information. As well as providing background information for research, scientific publications can be processed to transform textual information into database content or complex networks, and can be integrated with existing knowledge resources to suggest novel hypotheses. Information extraction and text data analysis are particularly relevant in genetics and biomedical research, in which up-to-date information about complex processes involving genes, proteins and phenotypes is crucial. Here we explore the latest advancements in automated literature analysis and its contribution to innovative research approaches.
