Online Analytical Processing (OLAP): A Fast and Effective Data Mining Tool for Gene Expression Databases

Gene expression databases contain a wealth of information, but current data mining tools are limited in how quickly and effectively they extract meaningful biological knowledge from them. Online analytical processing (OLAP) can supplement cluster analysis for fast and effective data mining of gene expression databases. We used Analysis Services 2000, which ships with SQL Server 2000, to construct an OLAP cube and used it to mine a time series experiment designed to identify genes associated with resistance of soybean to the soybean cyst nematode, a devastating pest of soybean. The data for these experiments are stored in the Soybean Genomics and Microarray Database (SGMD). A number of candidate resistance genes and pathways were found. Compared with traditional cluster analysis of gene expression data, OLAP was faster and more effective at finding biologically meaningful information. OLAP is available from a number of vendors and can work with any relational database management system through OLE DB.
