Complexity and approximation of the minimum recombinant haplotype configuration problem

We study the complexity and approximability of reconstructing haplotypes from genotypes on pedigrees under the Mendelian law of inheritance and the minimum-recombinant principle, known as the minimum recombinant haplotype configuration (MRHC) problem. First, we show that MRHC is NP-hard even for simple pedigrees in which each member has at most one mate and at most one child (i.e., binary-tree pedigrees). Second, we present approximation results for MRHC, which are, to the best of our knowledge, the first in the literature. We prove that MRHC on two-locus pedigrees or binary-tree pedigrees with missing data cannot be approximated unless P = NP. Next, we show that MRHC on two-locus pedigrees without missing data cannot be approximated within any constant ratio under the Unique Games Conjecture and can be approximated within a ratio of O(log(n)). Our L-reduction for the approximation hardness also yields a simple alternative proof that MRHC on two-locus pedigrees is NP-hard, which is much easier to understand than the original proof. We also show that MRHC on tree pedigrees without missing data cannot be approximated within any constant ratio under the Unique Games Conjecture. Finally, we explore the hardness and approximability of MRHC on pedigrees in which each member has a bounded number of children and mates, mirroring real pedigrees.
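To make the objective concrete, the following is a minimal brute-force sketch of MRHC on a single nuclear family (two founder parents and their children) with no missing data. It is an illustration only, not an algorithm from the paper: the function names (`phasings`, `min_switches`, `min_recombinants`) and the genotype encoding (a list of unordered allele pairs, one pair per locus) are assumptions made for this example. It enumerates every phasing, so it runs in exponential time, in line with the hardness results stated above.

```python
from itertools import product

def phasings(genotype):
    """Yield every (paternal, maternal) haplotype pair consistent with a genotype.

    A genotype is a list of unordered allele pairs, one pair per locus.
    """
    per_locus = []
    for a, b in genotype:
        # A homozygous locus has one ordering; a heterozygous locus has two.
        per_locus.append([(a, b)] if a == b else [(a, b), (b, a)])
    for choice in product(*per_locus):
        paternal = tuple(p for p, _ in choice)
        maternal = tuple(m for _, m in choice)
        yield paternal, maternal

def min_switches(parent_haps, transmitted):
    """Fewest recombinations for a phased parent to transmit a haplotype.

    Returns None when the parent cannot transmit it (Mendelian inconsistency).
    """
    inf = float("inf")
    # cost[s]: best cost so far when the current locus is copied from haplotype s.
    cost = [0, 0]
    for locus, allele in enumerate(transmitted):
        new_cost = []
        for s in (0, 1):
            if parent_haps[s][locus] != allele:
                new_cost.append(inf)
            else:
                # Either stay on haplotype s, or switch from the other one (+1).
                new_cost.append(min(cost[s], cost[1 - s] + 1))
        cost = new_cost
    best = min(cost)
    return None if best == inf else best

def min_recombinants(father, mother, children):
    """Minimum total recombinants over all phasings; None if inconsistent."""
    best = None
    for f in phasings(father):
        for m in phasings(mother):
            total = 0
            for child in children:
                costs = []
                for c_pat, c_mat in phasings(child):
                    rf = min_switches(f, c_pat)
                    rm = min_switches(m, c_mat)
                    if rf is not None and rm is not None:
                        costs.append(rf + rm)
                if not costs:
                    total = None  # this child fits no phasing of the parents
                    break
                total += min(costs)
            if total is not None and (best is None or total < best):
                best = total
    return best

# Two-locus example: the father is heterozygous at both loci, the mother is
# homozygous, and the three children jointly force at least one recombinant.
father = [(1, 2), (1, 2)]
mother = [(1, 1), (1, 1)]
children = [[(1, 1), (1, 1)], [(1, 2), (1, 2)], [(1, 1), (1, 2)]]
result = min_recombinants(father, mother, children)
```

In this example the father must transmit the haplotypes (1,1), (2,2), and (1,2) to the three children. No single phasing of his two haplotypes yields all three without a recombination, so the optimum is one recombinant.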
