A system for knowledge management in bioinformatics

Emerging biochip technology has made it possible to study the expression (activity level) of thousands of genes or proteins simultaneously in a single laboratory experiment. However, extracting relevant biological knowledge from biochip data requires not only analyzing the experimental data but also cross-referencing and correlating these large volumes of data with information available in external biological databases accessible online. We address this problem with e2e, a comprehensive system for knowledge management in bioinformatics. To biologists and biological applications, e2e exposes a common semantic view of the inter-relationships among biological concepts in the form of an XML representation called eXpressML; internally, it can use any data integration solution to retrieve data and return results corresponding to the semantic view. We have implemented an e2e prototype that enables a biologist to analyze her gene expression data, supplied in GEML or obtained from a public site such as Stanford, and to discover knowledge through operations such as querying relevant annotated data represented in eXpressML, using pathway data from KEGG, publication data from Medline, and protein data from SWISS-PROT.
