Pathway mapping with operon information: an integer-programming method

Biological pathway mapping is an important problem in the post-genomic era. We present a new algorithm for pathway mapping in microbes. The algorithm considers not only the sequence similarity between template and target genes, but also the operon structures in the target genome. We formulate the mapping problem as a graph-finding problem and solve it with an integer-programming (IP) method. The goal is to minimize a linear objective function subject to six constraints, such that the sequence similarity between template and target genes is maximized while the number of operons covered in the target genome is minimized. Compared with our previous minimal spanning tree (MST) algorithm, the IP method has the following advantages: (i) it is much faster and can therefore map larger pathways involving a much larger set of genes; (ii) it examines the individual genes within each operon and consequently avoids the many-to-one mapping mistakes that sometimes occur with the MST algorithm. We compiled a large pathway training set to optimize the parameters of the program and tested it by mapping 16 complex pathways from BioCyc onto the E. coli K12 genome; the results are very promising.
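To make the trade-off in the objective concrete, the following is a minimal toy sketch, not the formulation used in this work. It assumes a simplified objective (negative total similarity plus a penalty per operon covered) and a single one-to-one constraint in place of the six constraints mentioned above, and it uses exhaustive enumeration rather than an IP solver. The function name `map_pathway`, the `operon_penalty` weight, and all gene names, operon names, and scores are hypothetical.

```python
from itertools import product


def map_pathway(similarity, operon_of, operon_penalty=1.0):
    """Toy pathway mapping by exhaustive search (illustrative assumption only).

    similarity: dict mapping template gene -> {candidate target gene: score}
    operon_of:  dict mapping target gene -> operon identifier
    Returns (cost, assignment), where assignment maps each template gene
    to one target gene and cost is the minimized objective value.
    """
    templates = sorted(similarity)
    candidates = [sorted(similarity[t]) for t in templates]
    best = None
    for choice in product(*candidates):
        # One-to-one constraint: no target gene may serve two template genes.
        if len(set(choice)) != len(choice):
            continue
        total_similarity = sum(similarity[t][g] for t, g in zip(templates, choice))
        operons_covered = {operon_of[g] for g in choice}
        # Linear objective: reward similarity, penalize each operon used.
        cost = -total_similarity + operon_penalty * len(operons_covered)
        if best is None or cost < best[0]:
            best = (cost, dict(zip(templates, choice)))
    return best


# Hypothetical data: three template genes, five target genes, three operons.
similarity = {
    "A": {"g1": 0.90, "g4": 0.95},
    "B": {"g2": 0.80, "g5": 0.85},
    "C": {"g3": 0.70, "g1": 0.75},
}
operon_of = {"g1": "op1", "g2": "op1", "g3": "op1", "g4": "op2", "g5": "op3"}

with_operons = map_pathway(similarity, operon_of, operon_penalty=1.0)[1]
similarity_only = map_pathway(similarity, operon_of, operon_penalty=0.0)[1]
print(with_operons)
print(similarity_only)
```

With the operon penalty switched on, the toy search prefers the three slightly lower-scoring genes that share a single operon; with the penalty set to zero, it picks the highest-scoring hits, which are scattered across three operons. Enumeration is only feasible for tiny inputs, which is why a real instance would be handed to an IP solver.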