Microarray data warehouse allowing for inclusion of experiment annotations in statistical analysis

MOTIVATION: Microarray technology provides access to the expression levels of thousands of genes at once, producing large amounts of data. These datasets are valuable only if they are annotated with sufficiently detailed experiment descriptions. However, in many databases a substantial number of these annotations are in free-text format and not readily accessible to computer-aided analysis.

RESULTS: The Multi-Conditional Hybridization Intensity Processing System (M-CHIPS), a data warehousing concept, focuses on providing both a structure and algorithms suitable for statistical analysis of a microarray database's entire contents, including the experiment annotations. It is designed to accommodate the rapid growth in hybridization data, increasingly detailed experimental descriptions, and new kinds of experiments in the future. We have developed a storage concept, a particular instance of which is an organism-specific database. Although these databases may contain different ontologies of experiment annotations, they share the same structure and can therefore be accessed by the very same statistical algorithms. Experiment ontologies have not yet reached their final shape, and standards are reduced to minimal conventions that do not yet warrant extensive description. Because the structure is ontology-independent, annotation hierarchies can be updated during normal database operation without altering the database structure.

AVAILABILITY AND SUPPLEMENTARY INFORMATION: http://www.dkfz.de/tbi/services/mchips
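The idea of an ontology-independent structure can be illustrated with a small sketch. The abstract does not describe the actual M-CHIPS schema, so everything below is an assumption made for illustration: the table names `annotation_term` and `experiment_annotation`, the helper functions, the sample annotation terms, and the choice of SQLite. The sketch stores the annotation hierarchy as rows rather than as columns, so a new annotation term is added with an ordinary insert while the table layout stays fixed, and the same read-out code works for any ontology.

```python
import sqlite3

# Hypothetical generic layout (not the M-CHIPS schema): the ontology lives in
# rows of annotation_term, so extending it never requires ALTER TABLE.
def create_store(conn):
    conn.executescript("""
        CREATE TABLE annotation_term (
            term_id   INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES annotation_term(term_id),
            name      TEXT NOT NULL
        );
        CREATE TABLE experiment_annotation (
            experiment_id TEXT NOT NULL,
            term_id       INTEGER NOT NULL REFERENCES annotation_term(term_id),
            value         TEXT NOT NULL,
            PRIMARY KEY (experiment_id, term_id)
        );
    """)

def add_term(conn, name, parent_id=None):
    """Add a term to the annotation hierarchy and return its id."""
    cur = conn.execute(
        "INSERT INTO annotation_term (parent_id, name) VALUES (?, ?)",
        (parent_id, name),
    )
    return cur.lastrowid

def annotate(conn, experiment_id, term_id, value):
    """Attach a structured (non free-text) annotation value to an experiment."""
    conn.execute(
        "INSERT OR REPLACE INTO experiment_annotation VALUES (?, ?, ?)",
        (experiment_id, term_id, value),
    )

def term_path(conn, term_id):
    """Return the slash-separated path from the hierarchy root to a term."""
    parts = []
    while term_id is not None:
        name, term_id = conn.execute(
            "SELECT name, parent_id FROM annotation_term WHERE term_id = ?",
            (term_id,),
        ).fetchone()
        parts.append(name)
    return "/".join(reversed(parts))

def annotation_table(conn):
    """Collect all annotations as {experiment: {term path: value}}."""
    rows = conn.execute(
        "SELECT experiment_id, term_id, value FROM experiment_annotation "
        "ORDER BY experiment_id, term_id"
    ).fetchall()
    table = {}
    for experiment_id, term_id, value in rows:
        table.setdefault(experiment_id, {})[term_path(conn, term_id)] = value
    return table

def table_columns(conn, table):
    """List the column names of a table, to show the layout is unchanged."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# Made-up example data: two hybridizations annotated under an initial ontology.
conn = sqlite3.connect(":memory:")
create_store(conn)
growth = add_term(conn, "growth conditions")
temperature = add_term(conn, "temperature_celsius", parent_id=growth)
annotate(conn, "hyb_001", temperature, "30")
annotate(conn, "hyb_002", temperature, "37")
columns_before = table_columns(conn, "experiment_annotation")

# The ontology grows later: a new term is just another row.
medium = add_term(conn, "medium", parent_id=growth)
annotate(conn, "hyb_002", medium, "YPD")
columns_after = table_columns(conn, "experiment_annotation")

result = annotation_table(conn)
```

In this sketch, `annotation_table` returns one mapping per experiment that a downstream statistical routine could turn into covariates alongside the intensity data. The routine only ever reads the two generic tables, which is one way the "same structure, different ontologies" property described above could be realized.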
