Text mining systems biology: Turning the microscope back on the observer

Abstract: In this review, we describe the relationship between systems biology and text mining. On the one hand, text mining serves as a practical tool for systems biology research, which integrates diverse subfields and takes a systems-level view of biological phenomena. In this vein, various analyses have sought to recognize biological entities, construct networks of them (e.g., protein–protein interaction maps), and even find disease-associated genes directly from texts. On the other hand, text mining can also be applied to study systems biology itself, giving a “distant-reading perspective” on the evolution of the field. For example, by examining changes in the frequencies of terms in systems biology publications, we can analyze trends in research focus and in the popularity of systems approaches in various subdomains (e.g., autism). Given these two uses of text mining for systems biology, we close with suggestions for adapting current publication formats to facilitate text mining and enable its broader use.
