Fixed-parameter algorithms for scaffold filling

In this paper we consider two combinatorial problems related to genome comparison. Both problems start from possibly incomplete genomes produced from sequencing data and aim to reconstruct the complete genomes by inserting a collection of missing genes. In the first problem, called One-sided scaffold filling, we are given an incomplete genome \(B\) and a complete genome \(A\), and we seek an insertion of the missing genes into \(B\) that maximizes the number of common adjacencies between the resulting genome \(B'\) and \(A\). In the second problem, called Two-sided scaffold filling, we are given two incomplete genomes \(A\) and \(B\), and we seek an insertion of missing genes into both genomes such that the resulting genomes \(A'\) and \(B'\) have the same multiset of genes and the number of common adjacencies between \(A'\) and \(B'\) is maximized. Both problems are known to be NP-hard, but their parameterized complexity, when parameterized by the number of common adjacencies of the resulting genomes, has remained open. We settle this open problem by presenting fixed-parameter algorithms for both One-sided scaffold filling and Two-sided scaffold filling.
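To make the objective concrete, the following minimal Python sketch counts common adjacencies between two genomes. It rests on assumptions not stated in the abstract: a genome is modeled as a list of gene symbols, an adjacency is an unordered pair of consecutive genes, genome endpoints are ignored, and common adjacencies are counted as a multiset intersection. The function names `adjacencies` and `common_adjacencies` are hypothetical, and this is not the fixed-parameter algorithm of the paper, only an illustration of the quantity being maximized.

```python
from collections import Counter


def adjacencies(genome):
    """Return the multiset of unordered adjacent gene pairs of a genome.

    Assumption: a genome is a sequence of gene symbols, and the pair
    (x, y) is treated as equal to (y, x).
    """
    return Counter(tuple(sorted(pair)) for pair in zip(genome, genome[1:]))


def common_adjacencies(a, b):
    """Count adjacencies shared by genomes a and b (multiset intersection)."""
    shared = adjacencies(a) & adjacencies(b)
    return sum(shared.values())


# One-sided illustration: A is complete, B is missing the gene 'c'.
A = ["a", "b", "c", "d"]
B = ["a", "b", "d"]

print(common_adjacencies(A, B))                     # 1 (only the pair a-b)
print(common_adjacencies(A, ["a", "b", "c", "d"]))  # 3 ('c' inserted between b and d)
print(common_adjacencies(A, ["a", "b", "d", "c"]))  # 2 ('c' appended at the end)
```

In this toy instance, the position at which the missing gene is inserted changes the number of common adjacencies, which is the quantity both problems seek to maximize.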
