Combinatorial Objects in Bio-Algorithmics: Related problems and complexities

The aim of this habilitation is to present my contributions to several areas of Bio-Algorithmics. Rather than giving an exhaustive account of my work, I have chosen to present results obtained with collaborators on a representative subset of the problems I have been involved in since 2005. For readability, the results are grouped according to the biological problems they address: i) RNA structure comparison, ii) genome comparison and iii) pattern matching in biological networks, together with their respective combinatorial objects: i) arc-annotated sequences, ii) permutations and sequences and iii) graphs. More precisely, the first part is devoted to the arc-annotated sequences used in RNA structure comparison. We focus on five problems that we investigated: LAPCS, APS, MAPCS, EDIT and ALIGN. In the second part, we consider the two main research areas related to comparative genomics in which we were involved: gene cluster detection and the computation of (dis)similarity measures, which rely on permutation and string representations. Finally, in the third part, we present some results on pattern matching in biological networks, obtained mainly during the PhD of Florian Sikora, which I co-supervised.
