Efficient Methods for Determining DNA Probe Orders

In STS-based mapping, it is necessary to obtain the correct order of probes in a DNA sequence from a given set of fragments or an equivalently a hybridization matrix A. It is well-known that the problem is formulated as the combinatorial problem of obtaining a permutation of A's columns so that the resulting matrix has a consecutive-one property. If the data (the hybridization matrix) is error free and includes enough information, then the above column order uniquely determines the correct order of the probes. Unfortunately this does not hold if the data include errors, and this has been a popular research target in computational biology. Even if there is no error, ambiguities in the probe order may still remain. This in fact happens because of the lack of some information regarding the data, but almost no further investigation has previously been made. In this paper, we define a measure of such imperfectness of the data as the minimum amount of the additional fragments that are needed to uniquely fix the probe order. Polynomial-time algorithms to compute such additional fragments of the minimum cost are presented. A computer simulation using genes of human chromosome 20 is also noted.