The minimum distance superset problem: formulations and algorithms

The partial digest problem consists of retrieving the positions of a set of points on the real line from their unlabeled pairwise distances. This problem is critical for DNA sequencing, as well as for phase retrieval in X-ray crystallography. When some of the distances are missing, this problem generalizes to the “minimum distance superset problem”, which aims to find a set of points of minimum cardinality such that the multiset of their pairwise distances is a superset of the input. We introduce a quadratic integer programming formulation for the minimum distance superset problem with a pseudo-polynomial number of variables, as well as a polynomial-size integer programming formulation. We investigate three types of solution approaches based on an available integer programming solver: (1) solving a linearization of the pseudo-polynomial-size formulation, (2) solving the complete polynomial-size formulation, or (3) performing a binary search over the number of points and solving a simpler feasibility or optimization problem at each step. As illustrated by our computational experiments, the polynomial formulation with binary search leads to the most promising results, solving most instances with up to 25 distance values and 8 solution points to optimality.
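
To make the problem statement and the binary-search idea of approach (3) concrete, the following is a minimal sketch and not the formulation studied in the paper. It replaces the integer programming solver with brute-force enumeration, so it is only usable on tiny instances. It assumes integer distances and candidate positions restricted to {0, ..., max_pos}; the bound `max_pos` and all function names are hypothetical, introduced here for illustration. Fixing one point at 0 relies on translation invariance, and the binary search relies on feasibility being monotone in the number of points within that range.

```python
from collections import Counter
from itertools import combinations
from math import comb


def distance_multiset(points):
    """Multiset of pairwise distances of a set of points on the line."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def covers(points, distances):
    """True if the pairwise distances of `points` are a multiset superset of `distances`."""
    have = distance_multiset(points)
    need = Counter(distances)
    return all(have[d] >= count for d, count in need.items())


def feasible(distances, m, max_pos):
    """Brute-force feasibility check (stand-in for a solver call).

    Returns m points in {0, ..., max_pos}, with one point fixed at 0,
    whose distances cover `distances`, or None if no such set exists.
    """
    if m < 1:
        return None
    for rest in combinations(range(1, max_pos + 1), m - 1):
        points = (0,) + rest
        if covers(points, distances):
            return points
    return None


def min_distance_superset(distances, max_pos):
    """Binary search over the number of points m (assumed position bound max_pos)."""
    # Lower bound: m points yield m*(m-1)/2 distances, which must be >= len(distances).
    lo = 1
    while comb(lo, 2) < len(distances):
        lo += 1
    # Upper bound: use every available position.
    hi = max_pos + 1
    best = feasible(distances, hi, max_pos)
    if best is None:
        return None  # not solvable within the assumed position range
    # Invariant: `best` is a feasible solution with exactly `hi` points.
    while lo < hi:
        mid = (lo + hi) // 2
        points = feasible(distances, mid, max_pos)
        if points is not None:
            best, hi = points, mid
        else:
            lo = mid + 1
    return best


if __name__ == "__main__":
    print(min_distance_superset([1, 2, 3], max_pos=3))
    print(min_distance_superset([1, 1, 1], max_pos=3))
```

In this sketch each call to `feasible` plays the role of the "simpler feasibility problem" solved at one step of the binary search; in practice that call would be delegated to an integer programming solver rather than to enumeration.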
