In this paper, we describe a method for the statistical reconstruction of haplotypes from a set of aligned SNP fragments. We consider a pair of homologous human chromosomes, one inherited from the mother and the other from the father. After fragment assembly, we wish to reconstruct these two parental haplotypes. Given a set of potential SNP sites inferred from the assembly alignment, we aim to divide the fragment set into two subsets, each representing one chromosome. Our method is based on a statistical model of sequencing errors, compositional information, and haplotype membership. We calculate the probabilities of different haplotypes conditional on the alignment. Because of the computational complexity of evaluating all haplotypes jointly, we first determine the phases of neighboring SNPs and then connect these phased pairs to construct haplotype segments. We also compute the accuracy, or confidence, of the reconstructed haplotypes. Finally, we discuss related issues such as alternative methods, parameter estimation, computational efficiency, and relaxation of the model's assumptions.
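The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of phasing neighboring SNPs and chaining the calls into haplotype segments. It rests on simplifying assumptions that are ours, not the paper's: biallelic sites coded 0/1, a single uniform error rate `eps` in place of per-base quality values, each fragment drawn from either chromosome with probability 1/2, a uniform prior over the two phases, and segment confidence taken as the product of pairwise posteriors (which treats the pairwise calls as independent). The function names `pair_log_likelihood`, `phase_posterior`, and `build_segments` and the threshold `min_conf` are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical representation: a fragment maps SNP index -> observed allele (0 or 1).
Fragment = Dict[int, int]


def pair_log_likelihood(frags: List[Fragment], i: int, j: int,
                        cis: bool, eps: float) -> float:
    """Log-likelihood of fragments spanning SNPs i and j under one phase.

    Haplotype A carries allele 0 at SNP i; under 'cis' it carries 0 at j,
    under 'trans' it carries 1 at j. Haplotype B is the complement of A.
    Assumption: each fragment comes from A or B with probability 1/2, and a
    uniform error rate eps flips an allele to the other one.
    """
    hap_a = (0, 0 if cis else 1)
    hap_b = (1 - hap_a[0], 1 - hap_a[1])
    total = 0.0
    for frag in frags:
        if i not in frag or j not in frag:
            continue  # only fragments spanning both sites carry phase information
        obs = (frag[i], frag[j])
        p_a = math.prod((1 - eps) if o == h else eps for o, h in zip(obs, hap_a))
        p_b = math.prod((1 - eps) if o == h else eps for o, h in zip(obs, hap_b))
        total += math.log(0.5 * p_a + 0.5 * p_b)
    return total


def phase_posterior(frags: List[Fragment], i: int, j: int,
                    eps: float = 0.01) -> float:
    """P(cis | fragments) under a uniform prior over the two phases."""
    l_cis = pair_log_likelihood(frags, i, j, True, eps)
    l_trans = pair_log_likelihood(frags, i, j, False, eps)
    m = max(l_cis, l_trans)  # subtract the max for numerical stability
    return math.exp(l_cis - m) / (math.exp(l_cis - m) + math.exp(l_trans - m))


def build_segments(frags: List[Fragment], snps: List[int],
                   eps: float = 0.01, min_conf: float = 0.95):
    """Chain neighboring-SNP phase calls into haplotype segments.

    Returns a list of (snp_indices, haplotype_A_alleles, confidence); in each
    segment haplotype B is the complement of haplotype A. A segment is broken
    wherever the best pairwise posterior falls below min_conf (including pairs
    that no fragment spans, whose posterior is exactly 0.5).
    """
    segments = []
    cur_snps, cur_hap, cur_conf = [snps[0]], [0], 1.0
    for prev, nxt in zip(snps, snps[1:]):
        p_cis = phase_posterior(frags, prev, nxt, eps)
        best = max(p_cis, 1 - p_cis)
        if best < min_conf:
            segments.append((cur_snps, cur_hap, cur_conf))
            cur_snps, cur_hap, cur_conf = [nxt], [0], 1.0
            continue
        # cis keeps the allele on haplotype A; trans flips it.
        allele = cur_hap[-1] if p_cis >= 0.5 else 1 - cur_hap[-1]
        cur_snps.append(nxt)
        cur_hap.append(allele)
        cur_conf *= best  # independence approximation
    segments.append((cur_snps, cur_hap, cur_conf))
    return segments


# Toy data (invented for illustration): SNPs 0-1 look cis, 1-2 look trans,
# no fragment spans 2-3, and a single fragment suggests 3-4 is trans.
frags = [
    {0: 0, 1: 0}, {0: 1, 1: 1}, {0: 0, 1: 0},
    {1: 0, 2: 1}, {1: 1, 2: 0}, {1: 0, 2: 1},
    {3: 0, 4: 1},
]
segs = build_segments(frags, [0, 1, 2, 3, 4])
for snp_ids, hap_a, conf in segs:
    print(snp_ids, hap_a, round(conf, 4))
```

On this toy input the gap between SNPs 2 and 3 splits the result into two segments, `[0, 1, 2]` with haplotype A alleles `[0, 0, 1]` and `[3, 4]` with `[0, 1]`. The second segment rests on a single fragment, so its confidence (about 0.98) is visibly lower than that of the first. Breaking at low-confidence pairs, rather than forcing a phase, mirrors the abstract's idea of reporting haplotype segments together with their confidence.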