Genetic network inference: the effects of preprocessing.

Clustering of gene expression data and gene network inference from such data have been major research topics in recent years. In clustering, pairwise measurements are performed when calculating the distance matrix upon which the clustering is based. Pairwise measurements can also be used for gene network inference, by treating gene pairs whose correlation or distance passes a chosen threshold as potential interactions. Our experiments show that interaction networks derived by this simple approach exhibit low, but significant, sensitivity and specificity. We also explore the effects that normalization and prefiltering have on the results of methods for identifying interactions from expression data. Before derivation of interactions or clustering, preprocessing is often performed: normalization rescales the expression profiles, and prefiltering removes genes that do not appear to contribute to regulation. In this paper, different ways of normalizing, in combination with different distance measurements, are tested on both unfiltered and prefiltered data, and different prefiltering criteria are considered.
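The pipeline described above (prefilter, normalize, measure pairwise, threshold) can be illustrated with a minimal sketch. It is not the procedure used in the paper: the function names, the z-score normalization, the variance-based prefilter, and the use of absolute Pearson correlation are all assumptions chosen for illustration, and the expression matrix is assumed to hold one gene per row and one sample per column.

```python
import numpy as np

def zscore_rows(expr):
    """Rescale each gene's profile to zero mean and unit variance (assumed normalization)."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant profiles become all zeros instead of dividing by zero
    return (expr - mean) / std

def prefilter_by_variance(expr, min_variance):
    """Return indices of genes whose variance reaches min_variance (hypothetical criterion)."""
    expr = np.asarray(expr, dtype=float)
    return np.flatnonzero(expr.var(axis=1) >= min_variance)

def infer_edges(expr, threshold):
    """List gene pairs (i, j), i < j, whose absolute Pearson correlation is >= threshold."""
    z = zscore_rows(expr)
    corr = z @ z.T / z.shape[1]  # equals Pearson correlation for z-scored rows
    i, j = np.triu_indices(corr.shape[0], k=1)  # each unordered pair once
    keep = np.abs(corr[i, j]) >= threshold
    return [(int(a), int(b)) for a, b in zip(i[keep], j[keep])]

# Toy data: genes 0-2 are perfectly (anti-)correlated, gene 3 is unrelated,
# gene 4 is constant and is dropped by the prefilter.
expr = np.array([
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [1, -1, -1, 1],
    [5, 5, 5, 5],
])
kept = prefilter_by_variance(expr, min_variance=0.5)
edges = infer_edges(expr[kept], threshold=0.9)
```

Comparing `edges` against a reference set of known interactions would then give the sensitivity and specificity mentioned above; swapping `zscore_rows` or the correlation for another normalization or distance measure changes only one function each.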
