Clustering Method to Identify Gene Sets with Similar Expression Profiles in Adjacent Chromosomal Regions

Analyzing transcriptional data together with the chromosomal locations of genes can be used to detect gene sets that share similar expression profiles within adjacent chromosomal regions. In this paper, we propose a new distance measure that integrates expression profiles with chromosomal locations. We evaluate the performance of the proposed distance measure with a bootstrap resampling procedure. We applied the proposed method to microarray data from the Drosophila genome and identified sets of genes of the Toll and Imd pathways located in adjacent chromosomal regions. The proposed method not only gives the clustering result a stronger biological interpretation but also yields biologically meaningful gene sets.
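
The abstract does not state the form of the distance measure, so the following is only an illustrative sketch of how expression similarity and chromosomal proximity might be blended into one distance. The weighted-sum form, the names `correlation_distance` and `combined_distance`, the gene dictionary layout, and the parameters `alpha` and `max_gap` are all assumptions made for this example and are not taken from the paper.

```python
import math


def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)


def combined_distance(gene_a, gene_b, alpha=0.5, max_gap=1_000_000):
    """Hypothetical blend of expression distance and chromosomal distance.

    Each gene is a dict with keys "chrom", "pos" (base pairs), and "profile".
    alpha weights the expression term; max_gap (assumed, in base pairs) is the
    separation at which the location term saturates at 1.
    """
    # Scale the correlation distance from [0, 2] to [0, 1].
    d_expr = correlation_distance(gene_a["profile"], gene_b["profile"]) / 2.0
    if gene_a["chrom"] != gene_b["chrom"]:
        # Genes on different chromosomes are treated as maximally distant.
        d_loc = 1.0
    else:
        d_loc = min(abs(gene_a["pos"] - gene_b["pos"]) / max_gap, 1.0)
    return alpha * d_expr + (1.0 - alpha) * d_loc


# Toy genes (made-up values, for illustration only).
g1 = {"chrom": "2L", "pos": 100_000, "profile": [1.0, 2.0, 3.0, 4.0]}
g2 = {"chrom": "2L", "pos": 150_000, "profile": [2.0, 4.0, 6.0, 8.0]}
g3 = {"chrom": "3R", "pos": 100_000, "profile": [4.0, 3.0, 2.0, 1.0]}

near_and_similar = combined_distance(g1, g2)
far_and_dissimilar = combined_distance(g1, g3)
```

Under these assumptions, a pairwise matrix of such distances could be passed to any standard hierarchical clustering routine, so that co-expressed genes lying close together on a chromosome tend to merge early. The choice of `alpha` controls how strongly location influences the result.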
