The Functional Genomics Network in the evolution of biological text mining over the past decade.

Different programmes of the European Science Foundation (ESF) have contributed significantly to connecting researchers in Europe and beyond through several initiatives. This support was particularly relevant for the development of areas related to extracting information from papers (text mining), because it supported the field in its early phases, long before it was recognized by the community. We review the historical development of text mining research and how it was introduced into bioinformatics. We describe specific applications in (functional) genomics, such as its integration in genome annotation pipelines and its support for the analysis of high-throughput genomics experimental data, and we highlight the evaluation and benchmarking of methods, for which the ESF programme support was instrumental.
