Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT

Background: Haplotype inference based on unphased SNP markers is an important task in population genetics. Although there are different approaches to the inference of haplotypes in diploid species, the existing software is not suitable for inferring haplotypes from unphased SNP data in polyploid species, such as the cultivated potato (Solanum tuberosum). Potato species are tetraploid and highly heterozygous.

Results: Here we present the software SATlotyper which is able to handle polyploid and polyallelic data. SATlotyper uses the Boolean satisfiability problem to formulate Haplotype Inference by Pure Parsimony. The software excludes existing haplotype inferences, thus allowing for calculation of alternative inferences. As it is not known which of the multiple haplotype inferences are best supported by the given unphased data set, we use a bootstrapping procedure that allows for scoring of alternative inferences. Finally, by means of the bootstrapping scores, it is possible to optimise the phased genotypes belonging to a given haplotype inference. The program is evaluated with simulated and experimental SNP data generated for heterozygous tetraploid populations of potato. We show that, instead of taking the first haplotype inference reported by the program, we can significantly improve the quality of the final result by applying additional methods that include scoring of the alternative haplotype inferences and genotype optimisation. For a sub-population of nineteen individuals, the predicted results computed by SATlotyper were directly compared with results obtained by experimental haplotype inference via sequencing of cloned amplicons. Prediction and experiment gave similar results regarding the inferred haplotypes and phased genotypes.

Conclusion: Our results suggest that Haplotype Inference by Pure Parsimony can be solved efficiently by the SAT approach, even for data sets of unphased SNP from heterozygous polyploids.
SATlotyper is freeware and is distributed as a Java JAR file. The software can be downloaded from the webpage of the GABI Primary Database at http://www.gabipd.org/projects/satlotyper/. The application of SATlotyper will provide haplotype information, which can be used in haplotype association mapping studies of polyploid plants.
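
To make the pure parsimony objective concrete, the following is a minimal sketch of it for polyploid data, written as an exhaustive search over small inputs. It is not the SAT encoding used by SATlotyper; the function names (`explains`, `pure_parsimony`), the input format (one string of single-character alleles per SNP site) and the toy genotypes are assumptions made for illustration only.

```python
from itertools import combinations, combinations_with_replacement, product


def explains(haplotypes, genotype, ploidy):
    """Return a phasing (tuple of `ploidy` haplotypes) matching the genotype, or None."""
    for phasing in combinations_with_replacement(haplotypes, ploidy):
        # The allele multiset at every site must equal the unphased genotype.
        if all(sorted(h[s] for h in phasing) == sorted(site)
               for s, site in enumerate(genotype)):
            return phasing
    return None


def pure_parsimony(genotypes, ploidy):
    """Find a smallest haplotype set explaining all genotypes (exhaustive search)."""
    n_sites = len(genotypes[0])
    # Alleles observed at each site across all individuals (polyallelic sites allowed).
    alleles = [sorted({a for g in genotypes for a in g[s]}) for s in range(n_sites)]
    candidates = ["".join(h) for h in product(*alleles)]
    # Try haplotype sets of increasing size; the first hit is a minimum-size set.
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            phasings = [explains(subset, g, ploidy) for g in genotypes]
            if all(p is not None for p in phasings):
                return subset, phasings
    return None, None


# Hypothetical toy data: three tetraploid individuals typed at two SNP sites.
toy_genotypes = [["AAGG", "CCTT"], ["AAAA", "CCCC"], ["AGGG", "CTTT"]]
haplotype_set, phased = pure_parsimony(toy_genotypes, ploidy=4)
print(haplotype_set)
print(phased)
```

For this toy input the search settles on the two haplotypes `AC` and `GT`. The number of candidate sets grows exponentially with the number of sites, which is why a practical tool needs a dedicated formulation such as the SAT approach described above.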
