Text mining in genomics and systems biology
There is an increasing need to complement the information available for the analysis of biological systems in Systems Biology and Genomics projects. This need makes it attractive to integrate information extracted directly from textual sources using Information Extraction and Text Mining approaches. My group has been developing Text Mining approaches and integrating them into large-scale projects together with other experimental and bioinformatics methods. On this occasion I will present the developments related to the characterization of the human mitotic spindle apparatus, carried out in the context of the ENFIN NoE. For these and other applications it is crucial to have an accurate estimate of the capabilities of current Text Mining systems. The BioCreative II challenge, organized by CNIO, MITRE and NCBI in collaboration with the MINT and INTACT databases (http://biocreative.sourceforge.net, Genome Biology, August 2008 Special Issue), provides such an overview. BioCreative II was divided into two tasks: 1) gene name identification and normalization, where many systems were able to achieve a consistent 80% balanced precision/recall; and 2) protein interaction detection, which was divided into four sub-tasks: a) ranking of publications by their relevance to the experimental determination of protein interactions, b) detection of protein interaction partners in text, c) detection of key sentences describing protein interactions, and d) detection of the experimental technique used to determine the interactions. The results were quite good in the categories of publication ranking, detection of experimental methods, and highlighting of relevant sentences, while they pointed to persistent problems in the correct normalization of gene/protein names. Furthermore, BioCreative has channeled the collaboration of several teams into the creation of the first Text Mining meta-server (The BioCreative Meta-server, Leitner et al., Genome Biology 2008 BioCreative special issue).
We are now preparing BioCreative III, with a particular focus on fostering the creation of Text Mining systems that can be integrated into genome analysis pipelines and contribute effectively to the understanding of complex biological systems.