Visualization of large-scale correlations in gene expressions

Expression levels are now routinely measured for several thousand genes simultaneously, and most genes are also being categorized according to their properties. These developments have prompted an exploration of theoretical tools for integrating such diverse data types. A key problem is the high noise level in the data. Here, we investigate ways to extract the signals that remain in these noisy data sets. We find large-scale correlations in expression data from Saccharomyces cerevisiae with respect to properties of the encoded proteins. These correlations are visualized in a way that is robust to the underlying noise in the measurements of individual gene expression levels. In particular, for S. cerevisiae we observe that the proteins encoded by the 400 most highly expressed genes are typically localized to the cytoplasm. These most highly expressed genes are not essential for cell survival.
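
As an illustration of how a correlation between expression level and a protein property can be made visible despite noisy individual measurements, the sketch below ranks genes by expression and computes the fraction of genes carrying a binary property (for example, cytoplasmic localization) in a sliding window along that ranking. This is a minimal sketch under stated assumptions, not necessarily the procedure used in this work: the function name `windowed_fraction`, the default window of 400 genes, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np


def windowed_fraction(expression, has_property, window=400):
    """Fraction of genes with a property in a sliding window over expression rank.

    Hypothetical helper for illustration only. Only the rank order of the
    expression values is used, so the curve changes little when individual
    measurements are perturbed by moderate noise.
    """
    expression = np.asarray(expression, dtype=float)
    flags = np.asarray(has_property, dtype=float)
    if expression.shape != flags.shape or expression.ndim != 1:
        raise ValueError("expression and has_property must be 1-D and equally long")
    if window < 1 or window > expression.size:
        raise ValueError("window must be between 1 and the number of genes")

    # Order genes from highest to lowest expression.
    order = np.argsort(-expression, kind="stable")
    ranked_flags = flags[order]

    # Sliding-window mean computed from cumulative sums.
    csum = np.concatenate(([0.0], np.cumsum(ranked_flags)))
    return (csum[window:] - csum[:-window]) / window


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo, not real yeast measurements):
    # the property is made more likely for genes with higher true expression.
    rng = np.random.default_rng(0)
    n_genes = 6000
    true_level = rng.lognormal(mean=0.0, sigma=1.0, size=n_genes)
    prob = 0.2 + 0.6 * (true_level > np.quantile(true_level, 0.9))
    has_property = rng.random(n_genes) < prob
    measured = true_level * rng.lognormal(mean=0.0, sigma=0.5, size=n_genes)

    curve = windowed_fraction(measured, has_property, window=400)
    print("fraction in top window:", curve[0])
    print("fraction in bottom window:", curve[-1])
```

Averaging over a window of several hundred genes trades resolution along the ranking for stability: a single misranked gene shifts each window fraction by at most 1/window.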
