An Efficient Algorithm for the Contig Ordering Problem under Algebraic Rearrangement Distance

Assembling a genome from short reads currently obtained by next-generation sequencing techniques often results in a collection of contigs, whose relative position and orientation along the genome being sequenced are unknown. Given two sets of contigs, the contig ordering problem is to order and orient the contigs in each set such that the genome rearrangement distance between the resulting sets of ordered and oriented contigs is minimized. In this article, we utilize the permutation groups in algebra to propose a near-linear time algorithm for solving the contig ordering problem under algebraic rearrangement distance, where the algebraic rearrangement distance between two sets of ordered and oriented contigs is the minimum weight of applicable rearrangement operations required to transform one set into the other.