HapColor: A graph coloring framework for polyploidy phasing

Polyploidy, the presence of more than two copies of each chromosome in the cells of an organism, is common in plants and animals and has important applications in genetics. Haplotype assembly is needed to understand the structure of each chromosome copy from Next Generation Sequencing (NGS) data. We propose HapColor, a fragment partitioning approach based on a new conflict graph model. We introduce a graph coloring algorithm followed by a color merging method to accurately group short DNA reads into any number of partitions, depending on the ploidy level of the organism from which the sequencing data are derived. We compare HapColor with HapTree (a recently introduced polyploid haplotyping algorithm), PGreedy (a polyploid haplotyping algorithm that we developed based on Levy's well-known greedy algorithm), and RFP (a baseline random fragment partitioning method). Our analysis of triploid, tetraploid, hexaploid, and decaploid datasets demonstrates that HapColor substantially improves haplotype assembly accuracy over the other algorithms. The improvement ranges from 25% to 90%, depending on the ploidy level.
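
The general idea of partitioning fragments through a conflict graph, a coloring step, and a color merging step can be illustrated with a minimal sketch. This is not HapColor's implementation. It assumes that each fragment is a mapping from SNP index to observed allele, that two fragments conflict when they disagree at any shared SNP site, that a simple largest-degree-first greedy heuristic is used for coloring, and that surplus colors are merged by repeatedly folding the smallest class into the next smallest. All function names (`build_conflict_graph`, `greedy_coloring`, `merge_colors`) and the toy data are hypothetical.

```python
from itertools import combinations


def build_conflict_graph(fragments):
    """Connect two fragments if they disagree at any shared SNP site.

    Assumption: fragments is a list of dicts {snp_index: allele}.
    """
    graph = {i: set() for i in range(len(fragments))}
    for i, j in combinations(range(len(fragments)), 2):
        shared = fragments[i].keys() & fragments[j].keys()
        if any(fragments[i][s] != fragments[j][s] for s in shared):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_coloring(graph):
    """Largest-degree-first greedy coloring (a generic heuristic)."""
    colors = {}
    for node in sorted(graph, key=lambda n: len(graph[n]), reverse=True):
        used = {colors[nb] for nb in graph[node] if nb in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def merge_colors(colors, ploidy):
    """Naive placeholder merge: fold the smallest color class into the
    next smallest until at most `ploidy` classes remain."""
    groups = {}
    for node, color in colors.items():
        groups.setdefault(color, set()).add(node)
    parts = sorted(groups.values(), key=len, reverse=True)
    while len(parts) > ploidy:
        smallest = parts.pop()
        parts[-1] |= smallest
        parts.sort(key=len, reverse=True)
    return parts


# Toy triploid example: six error-free reads over three SNP sites.
fragments = [
    {0: 0, 1: 0}, {1: 0, 2: 0},  # reads from one haplotype
    {0: 1, 1: 1}, {1: 1, 2: 0},  # reads from a second haplotype
    {0: 0, 1: 1}, {1: 1, 2: 1},  # reads from a third haplotype
]
graph = build_conflict_graph(fragments)
colors = greedy_coloring(graph)
parts = merge_colors(colors, ploidy=3)
print(sorted(sorted(p) for p in parts))
```

On this toy input the coloring already yields three classes, so the merge step leaves them unchanged. With noisy reads a coloring typically uses more colors than the ploidy level, which is the situation a merging step is meant to handle.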