Natural Language Processing and Unsupervised Learning: Its Significance on Biomedical Literature

A massive amount of information is hidden in the biomedical literature in the form of scientific publications, book chapters, conference reports, etc. This information is growing exponentially, at a pace exceeding that of Moore’s Law (a doubling roughly every two years). It is therefore no longer feasible for researchers and practitioners to manually extract and relate information from different written resources. In addition, the data in these written resources is unstructured, i.e. free text, so obtaining annotated material from the literature is arduous and expensive. To overcome these problems, Natural Language Processing (NLP) and Unsupervised Learning approaches are used. NLP is the part of text mining concerned with the discovery by computer of new, previously unknown information by automatically extracting and relating information from different written resources to reveal otherwise ‘hidden’ meanings. Unsupervised Learning is the part of machine learning in which no annotated training data is necessary; it is more about exploring the data to find insights. Both approaches can be used to extract knowledge from written text in the form of different interactions, such as protein-protein, gene-gene, and gene-protein interactions. They can also be used to develop classifiers, databases, tools, or software that could in the future automatically extract knowledge from the literature, answer questions arising in biomedical research, and help in the development of new hypotheses. Here we discuss 53 software packages, tools, and databases developed using NLP and unsupervised learning approaches for analyzing and processing plain text, and we categorize current work in biomedical information and entity extraction.
