Comparing Integer Linear Programming to SAT-Solving for Hard Problems in Computational and Systems Biology

It is useful to have general-purpose solution methods that apply to a wide range of problems, rather than relying on the development of clever, intricate algorithms for each specific problem. Integer Linear Programming (ILP) is the most widely used such general-purpose solution method, and it has been successful on a wide range of problems. However, there are some problems in computational biology where ILP has had only limited success. In this paper, we explore an alternative general-purpose solution method: SAT-solving, i.e., constructing a Boolean formula in conjunctive normal form (CNF) that encodes a problem instance, and using a SAT-solver to determine whether the formula is satisfiable. Of the three hard problems examined, we were very surprised to find that the SAT-solving approach was dramatically better than the ILP approach on two, and a little slower, but more robust, on the third. We also re-examined and confirmed an earlier result on a fourth problem, using current ILP and SAT-solvers. These results should encourage further efforts to exploit SAT-solving in computational biology.
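As a minimal illustration of what "encoding a problem instance as a CNF formula" means, the sketch below writes the toy constraint "exactly one of three Boolean variables is true" as a list of clauses and checks satisfiability by brute force. The constraint, the function name `is_satisfiable`, and the signed-integer literal convention are assumptions made for illustration; they are not taken from the paper, and a real SAT-solver would replace the exhaustive search.

```python
from itertools import product

def is_satisfiable(clauses, num_vars):
    """Return a satisfying assignment as a tuple of bools, or None.

    Literal k > 0 means "variable k is true"; k < 0 means "variable |k| is false".
    This is an exhaustive search for illustration only, not a real SAT-solver.
    """
    for bits in product([False, True], repeat=num_vars):
        # A CNF formula holds when every clause contains at least one true literal.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None

# Hypothetical toy instance: exactly one of x1, x2, x3 is true.
exactly_one = [
    [1, 2, 3],   # at least one variable is true
    [-1, -2],    # x1 and x2 are not both true
    [-1, -3],    # x1 and x3 are not both true
    [-2, -3],    # x2 and x3 are not both true
]

model = is_satisfiable(exactly_one, 3)

# Forcing every variable to be false contradicts the first clause.
unsat = is_satisfiable(exactly_one + [[-1], [-2], [-3]], 3)
```

Here `model` is an assignment with exactly one true variable, and `unsat` is `None` because the added unit clauses make the formula unsatisfiable.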
