Biological pathway mapping is an important problem in the post-genomic era. We present a new algorithm for pathway mapping in microbes. The algorithm considers not only the sequence similarity between template and target genes but also the operon structure of the target genome. We formulate the mapping problem as a graph-finding problem and solve it by integer programming (IP). The goal is to minimize a linear objective function subject to six constraints, such that the sequence similarity between template and target genes is maximized while the number of operons covered in the target genome is minimized. Compared with our previous minimal spanning tree (MST) algorithm, the IP method has two advantages: (i) it is much faster and can therefore map larger pathways involving much larger sets of genes; (ii) it examines the individual genes within each operon and consequently avoids the many-to-one mapping mistakes that sometimes occur with the MST algorithm. We compiled a large pathway training set to optimize the parameters of the program and tested it by mapping 16 complex pathways from BioCyc onto the E. coli K12 genome; the results are very promising.
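The trade-off between sequence similarity and operon coverage can be illustrated on a toy instance. The sketch below is not the IP formulation itself: the six constraints are not stated here, so it assumes only a one-to-one mapping constraint and finds the optimum by exhaustive enumeration rather than with an IP solver. The gene names, operon labels, similarity scores, and the `operon_penalty` weight are all hypothetical.

```python
from itertools import permutations

# Hypothetical similarity scores in [0, 1]: SIM[template_gene][target_gene].
SIM = {
    "tA": {"g1": 0.8, "g2": 0.1, "g3": 0.1, "g4": 0.9, "g5": 0.1},
    "tB": {"g1": 0.1, "g2": 0.7, "g3": 0.1, "g4": 0.1, "g5": 0.9},
    "tC": {"g1": 0.1, "g2": 0.1, "g3": 0.9, "g4": 0.1, "g5": 0.1},
}

# Hypothetical operon membership of each target gene.
OPERON = {"g1": "op1", "g2": "op1", "g3": "op1", "g4": "op2", "g5": "op3"}


def cost(mapping, operon_penalty):
    """Linear objective: total dissimilarity plus a penalty per operon used."""
    dissimilarity = sum(1.0 - SIM[t][g] for t, g in mapping.items())
    operons_used = {OPERON[g] for g in mapping.values()}
    return dissimilarity + operon_penalty * len(operons_used)


def best_mapping(operon_penalty):
    """Enumerate every one-to-one mapping and return (cost, mapping) of the best.

    Drawing distinct targets via permutations enforces the assumed
    one-to-one constraint, so no two template genes share a target gene.
    """
    templates = sorted(SIM)
    targets = sorted(OPERON)
    best = None
    for chosen in permutations(targets, len(templates)):
        mapping = dict(zip(templates, chosen))
        c = cost(mapping, operon_penalty)
        if best is None or c < best[0]:
            best = (c, mapping)
    return best


if __name__ == "__main__":
    for penalty in (0.0, 0.5):
        total, mapping = best_mapping(penalty)
        print(penalty, mapping, sorted({OPERON[g] for g in mapping.values()}))
```

With a penalty of 0 the toy objective picks the best hit for each template gene, spreading the pathway over three operons; with a penalty of 0.5 it accepts slightly weaker hits so that all three genes fall in the single operon `op1`. Enumeration is practical only for very small instances, which is why a real formulation would be handed to an IP solver.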