Finding kinetic parameters using text mining.

The mathematical modeling of complex biological processes has become increasingly important in recent years. Systems biology aims at the computational simulation of complex systems, up to whole-cell simulations. An essential part of this work is solving large numbers of parameterized differential equations. However, measuring those parameters is expensive, and finding them in the literature is laborious. We developed a text mining system that supports researchers in their search for experimentally obtained parameters for kinetic models. Our system uses a support vector machine to classify full-text documents according to whether they contain suitable data. We evaluated our approach on a manually tagged corpus of 800 documents and found that it outperforms keyword searches in abstracts by a factor of five in terms of precision.
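
To make the evaluation measure concrete, the following is a minimal sketch of how precision can be compared between two retrieval methods. The document IDs, the hit sets, and the helper name `precision` are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
def precision(predicted, relevant):
    """Fraction of retrieved documents that are truly relevant."""
    predicted = set(predicted)
    if not predicted:
        return 0.0  # convention used here: no retrieved documents means precision 0
    return len(predicted & set(relevant)) / len(predicted)


# Hypothetical toy corpus of 20 documents with IDs 0..19 (not from the paper).
relevant = {0, 1, 2, 3}          # manually tagged as containing kinetic parameters
keyword_hits = set(range(20))    # assumed keyword search that returns every document
classifier_hits = {0, 1, 2, 7}   # assumed classifier output: three correct, one wrong

p_keyword = precision(keyword_hits, relevant)
p_classifier = precision(classifier_hits, relevant)
improvement = p_classifier / p_keyword

print(f"keyword precision:    {p_keyword:.2f}")
print(f"classifier precision: {p_classifier:.2f}")
print(f"improvement factor:   {improvement:.2f}")
```

Precision only measures how many of the returned documents are useful, so a comparison like this says nothing about how many relevant documents each method misses.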
