We study a parametrized definition of gene clusters that permits control over the trade-off between increasing gene content and conserving gene order within a cluster. This is based on the notion of generalized adjacency, which is the property shared by any two genes no farther apart, in the linear order of a chromosome, than a fixed threshold parameter $\theta$. A cluster in two or more genomes is then a maximal set of markers that, in each genome, form a connected chain of generalized adjacencies. Since even pairs of randomly constructed genomes may have many generalized adjacency clusters in common, we study the statistical properties of generalized adjacency clusters under the null hypothesis that the markers are ordered completely randomly on the genomes. We derive expressions for the exact values of the expected number of clusters of a given size, for large and small values of the parameter. We discover through simulations that the trend from small to large clusters as a function of the parameter $\theta$ exhibits a "cut-off" phenomenon at or near $\sqrt{\theta}$ as genome size increases.
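The cluster definition above can be illustrated with a minimal Python sketch for two genomes. It rests on assumptions that the abstract does not state: each genome is a single linear chromosome given as a list of distinct markers, distance is the difference in list positions, and clusters are taken to be the connected components of the graph whose edges are the generalized adjacencies shared by both genomes (one reading of "maximal set of markers forming a connected chain in each genome"). The function name `generalized_adjacency_clusters` and its parameters are hypothetical.

```python
def generalized_adjacency_clusters(genome_a, genome_b, theta):
    """Return clusters as sorted lists of markers (illustrative sketch).

    Assumption: two markers are generalized-adjacent in a genome when
    their positions differ by at most theta; a cluster is a connected
    component of the adjacencies present in BOTH genomes.
    """
    pos_b = {g: i for i, g in enumerate(genome_b)}
    # Only markers present in both genomes can belong to a cluster.
    markers = [g for g in genome_a if g in pos_b]
    parent = {g: g for g in markers}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, g in enumerate(genome_a):
        if g not in pos_b:
            continue
        # Markers at most theta positions to the right of g in genome A.
        for h in genome_a[i + 1:i + theta + 1]:
            # Keep the edge only if g and h are also close in genome B.
            if h in pos_b and abs(pos_b[g] - pos_b[h]) <= theta:
                parent[find(g)] = find(h)

    clusters = {}
    for g in markers:
        clusters.setdefault(find(g), set()).add(g)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


genome_a = [1, 2, 3, 4, 5, 6]
genome_b = [2, 1, 3, 6, 5, 4]
small_theta = generalized_adjacency_clusters(genome_a, genome_b, 1)
large_theta = generalized_adjacency_clusters(genome_a, genome_b, 2)
```

In this toy example, `theta = 1` reduces to ordinary adjacency and yields three clusters, while `theta = 2` merges all six markers into one, which illustrates how a larger threshold trades order conservation for gene content.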