Interval scheduling maximizing minimum coverage

In classical interval scheduling problems, a set of $n$ jobs, each characterized by its start and end time, needs to be executed by a set of machines under various constraints. In this paper we study a new variant in which the jobs need to be assigned to at most $k$ identical machines such that the minimum number of machines that are busy at the same time is maximized. This is relevant in the context of genome sequencing and haplotyping, specifically when a set of DNA reads aligned to a genome needs to be pruned so that no more than $k$ reads overlap, while maintaining as much read coverage as possible across the entire genome. We show that the problem can be solved in time $\min\left(O(n^2\log k / \log n),\, O(nk\log k)\right)$ by using max-flows. We also give an $O(n\log n)$-time approximation algorithm with approximation ratio $\rho = \frac{k}{\lfloor k/2 \rfloor}$.
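To make the objective concrete, the following minimal sketch evaluates a candidate subset of jobs (or reads): it sweeps over the interval endpoints and reports the minimum and maximum number of selected intervals covering any point of a given range. It is not the max-flow or approximation algorithm of the paper. The function name `min_max_coverage`, the half-open convention `[start, end)`, and the choice to measure coverage over a caller-supplied range `[lo, hi)` are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end); an assumption of this sketch


def min_max_coverage(selected: List[Interval], lo: int, hi: int) -> Tuple[int, int]:
    """Return (min, max) number of selected intervals covering a point of [lo, hi)."""
    events = []
    for start, end in selected:
        s, e = max(start, lo), min(end, hi)  # clip to the evaluated range
        if s < e:
            events.append((s, 1))   # interval opens
            events.append((e, -1))  # interval closes
    events.sort()

    depth = 0                        # coverage of the segment starting at pos
    best_min: Optional[int] = None
    best_max = 0
    pos = lo
    i = 0
    while i < len(events):
        x = events[i][0]
        if x > pos:
            # The segment [pos, x) is covered by exactly `depth` intervals.
            best_min = depth if best_min is None else min(best_min, depth)
            best_max = max(best_max, depth)
            pos = x
        # Apply every event located at position x before moving on.
        while i < len(events) and events[i][0] == x:
            depth += events[i][1]
            i += 1
    if pos < hi:
        # Trailing segment [pos, hi) after the last event.
        best_min = depth if best_min is None else min(best_min, depth)
        best_max = max(best_max, depth)
    return (best_min if best_min is not None else 0, best_max)


# Hypothetical toy instance with k = 2 machines.
jobs = [(0, 5), (3, 8), (4, 10)]
all_jobs = min_max_coverage(jobs, 0, 10)              # three jobs overlap on [4, 5)
pruned = min_max_coverage([(0, 5), (4, 10)], 0, 10)   # at most two overlap
```

A subset whose maximum overlap is at most $k$ can always be assigned to $k$ machines, since intervals can be coloured greedily in order of start time, so the second value of the pair serves as the feasibility check and the first value is the quantity the problem seeks to maximize.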
