Multi-dimensional correlations for gene coexpression and application to the large-scale data of Arabidopsis

Background: Recent improvements in DNA microarray techniques have made a large variety of gene expression data available in public databases. These data can be used to evaluate the strength of gene coexpression by calculating the correlation of expression patterns between genes across many experiments. However, gene expression levels differ significantly across tissues in higher organisms, and across cellular locations and cell states in eukaryotes. Thus the usual correlation measure can evaluate only the differences between tissues or cellular localizations, and cannot adequately elucidate functional relationships from the coexpression of genes.

Method: We propose a new measure of coexpression that expands the generally used correlation into a multidimensional one. We used principal component analysis to identify the major factors of gene expression correlation, and then recalculated the correlation after subtracting the major components in order to remove biases caused by a few experiments. Repeated subtraction of the major components yielded a set of correlation values for each pair of genes. We observed how the correlation changed as the first ten principal components were subtracted step by step in large-scale Arabidopsis expression data.

Results: We found two extreme patterns of correlation changes, corresponding to stable and fragile coexpression. Examination of a few examples showed that the new indexes provide a good means of determining the functional relationships of genes, and combining the multidimensional correlation with a support vector machine gave higher performance in Gene Ontology term prediction.

Availability: The results are available from the expression detail pages in ATTED-II (http://atted.jp).

Contact: kinosita@hgc.jp

Supplementary information: Supplementary data are available at Bioinformatics online.
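The step-by-step subtraction described in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the data were centred or scaled, so the sketch assumes a genes-by-experiments matrix, centres each gene's profile across experiments, and obtains the principal components from a singular value decomposition. The function name `correlation_profile` and its parameters are hypothetical.

```python
import numpy as np


def correlation_profile(expr, i, j, n_components=10):
    """Pearson correlation of genes i and j after removing 0..n_components PCs.

    expr is a (genes x experiments) array. Element k of the returned array is
    the correlation once the first k principal components have been subtracted.
    Assumption (not from the abstract): each gene is mean-centred across
    experiments before the decomposition.
    """
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)

    # Rows of vt are the principal axes in experiment space;
    # s[k] * u[:, k] holds every gene's score on component k.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    n_components = min(n_components, len(s))

    a, b = x[i].copy(), x[j].copy()
    profile = [np.corrcoef(a, b)[0, 1]]  # k = 0: the usual correlation
    for k in range(n_components):
        # Subtract the contribution of component k from both profiles.
        a -= s[k] * u[i, k] * vt[k]
        b -= s[k] * u[j, k] * vt[k]
        profile.append(np.corrcoef(a, b)[0, 1])
    return np.array(profile)
```

With the default of ten components, each gene pair gets eleven correlation values. In the terms of the Results, a pair whose values stay high as components are removed would resemble the stable pattern, and a pair whose correlation collapses after the first few subtractions would resemble the fragile one. Any threshold separating the two would be a further assumption.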
