Gene Proximity Analysis across Whole Genomes via PQ Trees

Permutation patterns in strings, used to represent gene clusters on genomes, have been studied earlier by Uno and Yagiura (2000), Heber and Stoye (2001), Bergeron et al. (2002), Eres et al. (2003), and Schmidt and Stoye (2004). The idea of a maximal permutation pattern was introduced by Eres et al. (2003). In this paper, we present a new tool for representing and detecting gene clusters in multiple genomes using PQ trees (Booth and Lueker, 1976). The PQ tree describes the inner structure of clusters and the relations between them succinctly, helps separate meaningful clusters from apparently meaningless ones, and gives a natural way of visualizing complex clusters. We identify a minimal consensus PQ tree and prove that it is equivalent to a maximal π pattern (Eres et al., 2003) and that each subgraph of the PQ tree corresponds to a non-maximal permutation pattern. We present a general scheme to handle multiplicity in permutations and give a linear-time algorithm to construct the minimal consensus PQ tree. Finally, we demonstrate the results on whole-genome datasets. Comparing the whole genomes of human and rat, we found about 1.5 million common gene clusters but only about 500 minimal consensus PQ trees; for the E. coli K-12 and B. subtilis genomes, we found only about 450 minimal consensus PQ trees out of about 15,000 gene clusters; and across eight different chloroplast genomes, we found only 77 minimal consensus PQ trees out of about 6,700 gene clusters. We also show specific instances of functionally related genes in two of these cases.
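The central claim above is that one PQ tree compactly encodes every gene cluster shared by a set of gene orders. The Python sketch below illustrates that correspondence on toy data; it is not the paper's method. Its assumptions are ours: every genome is a permutation of the same gene set (no multiplicity), clusters (common intervals) are found by brute force rather than by the paper's linear-time algorithm, and the tree is built as a "strong interval" tree (clusters that cross no other cluster, nested by containment). We assume, without checking it against the paper's construction, that for permutations this tree coincides with the minimal consensus PQ tree. As in Booth and Lueker's definition, a P-node's children may be permuted arbitrarily, while a Q-node's children may only be reversed. All function names (`common_intervals`, `strong_intervals`, `build_tree`, `encoded_intervals`) are hypothetical.

```python
# Sketch only: a naive (polynomial, NOT linear-time) illustration of how a PQ tree
# encodes every gene cluster (common interval) shared by several gene orders.
# Assumption: each sequence is a permutation of the same gene set (no multiplicity).

def common_intervals(perms):
    """Gene sets of size >= 2 that occur contiguously in every permutation."""
    first = perms[0]
    others = [{g: i for i, g in enumerate(p)} for p in perms[1:]]
    found = set()
    for i in range(len(first)):
        # Running min/max position of first[i..j] in each other permutation.
        lo = [pos[first[i]] for pos in others]
        hi = lo[:]
        for j in range(i + 1, len(first)):
            for t, pos in enumerate(others):
                lo[t] = min(lo[t], pos[first[j]])
                hi[t] = max(hi[t], pos[first[j]])
            # The block is contiguous in permutation t iff its span equals its size.
            if all(hi[t] - lo[t] == j - i for t in range(len(others))):
                found.add(frozenset(first[i:j + 1]))
    return found


def strong_intervals(family):
    """Members of the family that do not cross (partially overlap) any other member."""
    def crosses(a, b):
        return bool(a & b) and not (a <= b or b <= a)
    return [s for s in family if not any(crosses(s, t) for t in family)]


def build_tree(perms):
    """Nested tree: a leaf is a gene, an internal node is ("P" or "Q", children)."""
    order = {g: i for i, g in enumerate(perms[0])}
    clusters = common_intervals(perms)
    family = clusters | {frozenset([g]) for g in perms[0]}
    strong = sorted(strong_intervals(family), key=len)
    children = {s: [] for s in strong}
    for s in strong[:-1]:
        # Strong intervals are nested or disjoint, so the smallest strict superset is the parent.
        parent = min((t for t in strong if s < t), key=len)
        children[parent].append(s)

    def make(node):
        kids = sorted(children[node], key=lambda c: min(order[g] for g in c))
        if not kids:
            (gene,) = node
            return gene
        # Q-node: 3+ children whose consecutive pairs are themselves clusters;
        # otherwise a P-node (Booth-Lueker convention: Q-nodes have >= 3 children).
        linear = len(kids) >= 3 and all(
            (kids[i] | kids[i + 1]) in clusters for i in range(len(kids) - 1))
        return ("Q" if linear else "P", [make(k) for k in kids])

    return make(strong[-1])


def frontier(tree):
    """Left-to-right leaf order of a (sub)tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return [g for child in tree[1] for g in frontier(child)]


def encoded_intervals(tree):
    """Gene sets the tree encodes: each internal P-node, and each run of >= 2
    consecutive children of a Q-node (the full run included)."""
    if not isinstance(tree, tuple):
        return set()
    kind, kids = tree
    out = set().union(*(encoded_intervals(c) for c in kids))
    if kind == "P":
        out.add(frozenset(frontier(tree)))
    else:
        for i in range(len(kids)):
            for j in range(i + 1, len(kids)):
                out.add(frozenset(g for c in kids[i:j + 1] for g in frontier(c)))
    return out


perms = [[1, 2, 3, 4, 5], [2, 1, 3, 5, 4]]
tree = build_tree(perms)
print(tree)  # ('Q', [('P', [1, 2]), 3, ('P', [4, 5])])
print(encoded_intervals(tree) == common_intervals(perms))  # True
```

On the toy pair 1 2 3 4 5 and 2 1 3 5 4, the five clusters {1,2}, {4,5}, {1,2,3}, {3,4,5} and {1,2,3,4,5} collapse into a single Q-node with three children, which is, in miniature, the kind of compression the abstract reports at genome scale.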

[1] Kellogg S. Booth et al. Testing for the Consecutive Ones Property, Interval Graphs, and Graph Planarity Using PQ-Tree Algorithms. J. Comput. Syst. Sci., 1976.

[2] M. Hagensee et al. DNA Polymerase III Requirement for Repair of DNA Damage Caused by Methyl Methanesulfonate and Hydrogen Peroxide. Journal of Bacteriology, 1987.

[3] Takeaki Uno et al. Fast Algorithms to Enumerate All Common Intervals of Two Permutations. Algorithmica, 1997.

[4] D. Sankoff et al. Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment and the Evolution of Gene Families. 2000.

[5] Bernard M. E. Moret et al. An Empirical Comparison of Phylogenetic Methods on Chloroplast Gene Order Data in Campanulaceae. 2000.

[6] Jens Stoye et al. Finding All Common Intervals of k Permutations. CPM, 2001.

[7] Mathieu Raffinot et al. The Algorithmic of Gene Teams. WABI, 2002.

[8] L. Pachter et al. Strategies and Tools for Whole-Genome Alignments. Genome Research, 2002.

[9] Gilles Didier et al. Common Intervals of Two Sequences. WABI, 2003.

[10] Nicholas L. Bray et al. AVID: A Global Alignment Program. Genome Research, 2003.

[11] Jens Stoye et al. On the Similarity of Sets of Permutations and Its Applications to Genome Comparison. COCOON, 2003.

[12] L. Pachter et al. SLAM: Cross-Species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model. Genome Research, 2003.

[13] Gad M. Landau et al. A Combinatorial Approach to Automatic Discovery of Cluster-Patterns. WABI, 2003.

[14] Jens Stoye et al. Quadratic Time Algorithms for Finding Common Intervals in Two and More Sequences. CPM, 2004.

[15] Ross M. McConnell. A Certifying Algorithm for the Consecutive-Ones Property. SODA '04, 2004.

[16] Xin He et al. Identifying Conserved Gene Clusters in the Presence of Orthologous Groups. RECOMB '04, 2004.

[17] Jens Stoye et al. Reversal Distance without Hurdles and Fortresses. CPM, 2004.

[18] Angshumoy Roy et al. Tektin3 Encodes an Evolutionarily Conserved Putative Testicular Microtubules-Related Protein Expressed Preferentially in Male Germ Cells. Molecular Reproduction and Development, 2004.

[19] John F. Mulley et al. Comparative Genomics: Small Genome, Big Insights. Nature, 2004.

[20] Annie Chateau et al. Reconstructing Ancestral Gene Orders Using Conserved Intervals. WABI, 2004.

[21] Gad M. Landau et al. Using PQ Trees for Comparative Genomics. CPM, 2005.