On the Multiple Gene Duplication Problem

A fundamental problem in computational biology is the determination of the correct species tree for a set of taxa given a set of (possibly contradictory) gene trees. In recent literature, the DUPLICATION/LOSS model has received considerable attention. In that model one measures the similarity/dissimilarity between a set of gene trees by counting the number of paralogous gene duplications and subsequent gene losses which need to be postulated in order to explain (in an evolutionarily meaningful way) how the gene trees could have arisen with respect to the species tree. Here we instead count the number of multiple gene duplication events (duplication events in the genome of the organism involving one or more genes) without regard to gene losses. MULTIPLE GENE DUPLICATION asks for the species tree S which requires the fewest multiple gene duplication events to be postulated in order to explain a set of gene trees G1, G2,..., Gk. We also examine the related problem which assumes the species tree S is known and asks for the explanation of G1, G2,..., Gk requiring the fewest multiple gene duplications. Via a reduction to and from a combinatorial model we call the BALL AND TRAP GAME, we show that the general form of this problem is NP-hard and that various parameterized versions are hard for the complexity class W[1]. These results immediately imply that MULTIPLE GENE DUPLICATION is similarly hard. We also prove that several parameterized variants are in FPT.
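As a point of orientation, the counting step that underlies duplication-based models can be sketched as follows. This is a minimal illustration, not the method of this paper: it uses the standard least-common-ancestor mapping of a gene tree into a known species tree and counts individual duplication nodes in one gene tree, whereas the paper groups duplications into multiple gene duplication events across several gene trees. The nested-tuple tree encoding and the function names `species_clusters`, `lca_map`, and `count_duplications` are assumptions made for this example.

```python
# Sketch only: counts single-gene duplication nodes via the LCA mapping.
# Trees are nested tuples; a leaf is a species name (hypothetical encoding).

def species_clusters(tree, out):
    """Append the leaf set (cluster) of every node of `tree` to `out`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(species_clusters(c, out) for c in tree))
    out.append(leaves)
    return leaves


def lca_map(leaves, clusters):
    """Smallest species-tree cluster containing `leaves` (the LCA node)."""
    return min((c for c in clusters if leaves <= c), key=len)


def count_duplications(gene_tree, clusters):
    """Number of gene-tree nodes mapped to the same species node as a child."""
    def visit(node):
        if isinstance(node, str):
            return frozenset([node]), 0
        results = [visit(child) for child in node]
        leaves = frozenset().union(*(l for l, _ in results))
        dups = sum(d for _, d in results)
        here = lca_map(leaves, clusters)
        # A node is a duplication if some child maps to the same species node.
        if any(lca_map(l, clusters) == here for l, _ in results):
            dups += 1
        return leaves, dups

    return visit(gene_tree)[1]


S = (("a", "b"), "c")          # assumed species tree
clusters = []
species_clusters(S, clusters)

G1 = (("a", "b"), "c")         # agrees with S
G2 = (("a", "c"), "b")         # conflicts with S
G3 = (("a", "a"), "b")         # two gene copies in species a

for name, g in [("G1", G1), ("G2", G2), ("G3", G3)]:
    print(name, count_duplications(g, clusters))
```

In this toy input, G1 needs no duplication, while G2 and G3 each need one. The problems studied in the paper ask how such duplications can be merged into as few genome-level events as possible.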
