An Improved Approximation Algorithm for Scaffold Filling to Maximize the Common Adjacencies

Scaffold filling is a new combinatorial optimization problem in genome sequencing. The one-sided scaffold filling problem can be stated as follows: given an incomplete genome $I$ and a complete (reference) genome $G$, fill the missing genes into $I$ such that the number of common (string) adjacencies between the resulting genome $I^{\prime}$ and $G$ is maximized. This problem is NP-complete for genomes with duplicated genes, and the best known approximation factor is 1.33, achieved by a greedy strategy. In this paper, we prove a better lower bound on the optimal solution and devise a new algorithm that combines maximum matching with a local improvement technique, improving the approximation factor to 1.25. For genomes with gene repetitions, this is the only known NP-complete problem that admits an approximation with a small constant factor (less than 1.5).
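
To make the objective concrete, the following Python sketch counts common adjacencies and solves tiny instances by exhaustive search. It is an illustration only, not the approximation algorithm of this paper. It assumes that a genome is a string with one character per gene, that an adjacency is an unordered pair of consecutive genes, and that common adjacencies are counted as a multiset intersection; the function names `adjacencies`, `common_adjacencies`, and `brute_force_fill` are hypothetical. Formulations that add end markers to each genome would change the counts slightly.

```python
from collections import Counter


def adjacencies(genome):
    """Multiset of unordered pairs of consecutive genes (assumed definition)."""
    return Counter(tuple(sorted(pair)) for pair in zip(genome, genome[1:]))


def common_adjacencies(a, b):
    """Size of the multiset intersection of the two adjacency multisets."""
    return sum((adjacencies(a) & adjacencies(b)).values())


def brute_force_fill(incomplete, reference):
    """Try every way of inserting the missing genes into `incomplete`.

    Exponential time: only meant to illustrate the objective on toy inputs.
    The relative order of the genes already in `incomplete` is preserved.
    """
    # Genes of the reference that are absent from the incomplete genome,
    # counted with multiplicity.
    missing = list((Counter(reference) - Counter(incomplete)).elements())

    # Insert the missing genes one at a time at every possible position.
    candidates = {tuple(incomplete)}
    for gene in missing:
        candidates = {
            c[:i] + (gene,) + c[i:]
            for c in candidates
            for i in range(len(c) + 1)
        }

    best = max(candidates, key=lambda c: common_adjacencies(c, reference))
    return "".join(best), common_adjacencies(best, reference)


if __name__ == "__main__":
    filled, score = brute_force_fill("ad", "abcd")
    print(filled, score)  # abcd 3
```

On the toy instance $I = ad$, $G = abcd$, the missing genes are $b$ and $c$, and the only filling that recovers all three adjacencies of $G$ is $abcd$. The exhaustive search grows exponentially with the number of missing genes, which is why approximation algorithms are of interest for this problem.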