Parameterized Algorithmics for Network Analysis: Clustering & Querying

Preface

This thesis summarizes some of my results on NP-hard graph problems that have applications in the areas of network clustering and querying. The research for obtaining these results was supported by the Deutsche Forschungsgemeinschaft (DFG) while I worked as a researcher in the DFG project "Parameterized Algorithmics for Bioinformatics" (PABI, NI 369/7).

I want to express my gratitude to Rolf Niedermeier for giving me the opportunity to work in his group and for his advice and support that eventually led to this thesis. Furthermore, I want to thank my colleagues and former colleagues for creating an enjoyable working atmosphere and for many inspiring and instructive discussions. Moreover, I owe sincere thanks to my coauthors for the pleasant and productive cooperation. In particular, I would like to thank René van Bevern for his implementation of the algorithms in Chapter 6. Finally, I want to thank the anonymous referees of several journals and scientific conferences for many pieces of advice that helped improve this work.

The results in this thesis are partially contained in journal and conference publications that were created in close collaboration with coauthors. Below, I will describe which publications contributed to which chapters, and I will also specify my contributions to these publications. Further work to which I have contributed but that is not part of this thesis is concerned with parameterized algorithmics for graph modification problems [van Bevern et al. … The latter collection of publications deals with extensions of the classical clustering notion, for example with hierarchical clusterings. In this work, the focus is on "classical" clusterings, that is, partitions of a set of objects.

Part II: Clustering.
Chapter 2 is based on parts of the publication "Alternative Parameterizations for Cluster Editing", which appeared in the proceedings of the 37th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM '11) [Komusiewicz and Uhlmann 2011]; a full version of this publication is in preparation. I proposed to study the parameter "local modification bound", participated in the development of the reduction from 3-SAT to CLUSTER EDITING, and observed the connections between CLUSTER DELETION and PARTITION INTO TRIANGLES. Furthermore, I developed the kernelization algorithms.

Chapter 3 is based on the two publications "A More Relaxed Model for Graph-Based Data Clustering: s-Plex Cluster Editing", which appeared in SIAM Journal on Discrete Mathematics [Guo et al. 2010b], and "…

[1]  Hans L. Bodlaender,et al.  Partition into Triangles on Bounded Degree Graphs , 2011, SOFSEM.

[2]  Christian Komusiewicz,et al.  A More Relaxed Model for Graph-Based Data Clustering: s-Plex Editing , 2009, AAIM.

[3]  Hans L. Bodlaender,et al.  On Linear Time Minor Tests with Depth-First Search , 1993, J. Algorithms.

[4]  Christian Komusiewicz,et al.  Deconstructing intractability - A multivariate complexity analysis of interval constrained coloring , 2011, J. Discrete Algorithms.

[5]  Rolf Niedermeier,et al.  Invitation to data reduction and problem kernelization , 2007, SIGA.

[6]  David L. Hicks,et al.  Notice of Violation of IEEE Publication Principles: Detecting Critical Regions in Covert Networks: A Case Study of 9/11 Terrorists Network , 2007, The Second International Conference on Availability, Reliability and Security (ARES'07).

[7]  Michael R. Fellows,et al.  Upper and lower bounds for finding connected motifs in vertex-colored graphs , 2011, J. Comput. Syst. Sci..

[8]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[9]  Jianer Chen,et al.  Finding Pathway Structures in Protein Interaction Networks , 2007, Algorithmica.

[10]  David Botstein,et al.  GO::TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes , 2004, Bioinform..

[11]  Anthony Wirth,et al.  Are approximation algorithms for consensus clustering worthwhile? , 2007, SDM.

[12]  Riccardo Dondi,et al.  Weak pattern matching in colored graphs: Minimizing the number of connected components , 2007, ICTCS.

[13]  Andreas Björklund,et al.  Fourier meets Möbius: fast subset convolution , 2006, STOC '07.

[14]  Sylvain Guillemot,et al.  Finding and Counting Vertex-Colored Subtrees , 2010, Algorithmica.

[15]  Christian Komusiewicz,et al.  Average parameterization and partial kernelization for computing medians , 2010, J. Comput. Syst. Sci..

[16]  Silvio Micali,et al.  An O(√|V| · |E|) algorithm for finding maximum matching in general graphs , 1980, 21st Annual Symposium on Foundations of Computer Science (sfcs 1980).

[17]  B. Lewis,et al.  Transmission network analysis in tuberculosis contact investigations. , 2007, The Journal of infectious diseases.

[18]  Marek Karpinski,et al.  Faster Algorithms for Feedback Arc Set Tournament, Kemeny Rank Aggregation and Betweenness Tournament , 2010, ISAAC.

[19]  Nicolas Bousquet,et al.  Multicut is FPT , 2010, STOC '11.

[20]  Christian Komusiewicz,et al.  Alternative Parameterizations for Cluster Editing , 2011, SOFSEM.

[21]  Robert E. Tarjan,et al.  Depth-First Search and Linear Graph Algorithms , 1972, SIAM J. Comput..

[22]  Sandra Sudarsky,et al.  Massive Quasi-Clique Detection , 2002, LATIN.

[23]  R. Sharan,et al.  Network-based prediction of protein function , 2007, Molecular systems biology.

[24]  Vladimir Filkov,et al.  Consensus Clustering Algorithms: Comparison and Refinement , 2008, ALENEX.

[25]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[26]  Yoshiko Wakabayashi The Complexity of Computing Medians of Relations , 1998 .

[27]  Guillaume Blin,et al.  GraMoFoNe: a Cytoscape Plugin for Querying Motifs without Topology in Protein-Protein Interactions Networks , 2010, BICoB.

[28]  Jiong Guo,et al.  A More Effective Linear Kernelization for Cluster Editing , 2007, ESCAPE.

[29]  Saket Saurabh,et al.  Incompressibility through Colors and IDs , 2009, ICALP.

[30]  Henning Fernau,et al.  Kernel(s) for problems with no kernel: On out-trees with many leaves , 2008, TALG.

[31]  Gerhard J. Woeginger,et al.  Exact Algorithms for NP-Hard Problems: A Survey , 2001, Combinatorial Optimization.

[32]  Rolf Niedermeier,et al.  Parameterized Complexity of Vertex Cover Variants , 2007, Theory of Computing Systems.

[33]  Rachael P. Huntley,et al.  The GOA database in 2009—an integrated Gene Ontology Annotation resource , 2008, Nucleic Acids Res..

[34]  Jill P. Mesirov,et al.  Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data , 2003, Machine Learning.

[35]  Jayme Luiz Szwarcfiter,et al.  Applying Modular Decomposition to Parameterized Cluster Editing Problems , 2008, Theory of Computing Systems.

[36]  Michael R. Fellows,et al.  Towards Fully Multivariate Algorithmics: Some New Results and Directions in Parameter Ecology , 2009, IWOCA.

[37]  Lance Fortnow,et al.  Infeasibility of instance compression and succinct PCPs for NP , 2007, J. Comput. Syst. Sci..

[38]  Carla E. Brodley,et al.  Solving cluster ensemble problems by bipartite graph partitioning , 2004, ICML.

[39]  Guillaume Blin,et al.  Querying Graphs in Protein-Protein Interactions Networks Using Feedback Vertex Set , 2010, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[40]  Christian Komusiewicz,et al.  Editing Graphs into Disjoint Unions of Dense Clusters , 2009, Algorithmica.

[41]  Riccardo Dondi,et al.  Complexity issues in vertex-colored graph pattern matching , 2011, J. Discrete Algorithms.

[42]  Stéphane Vialette,et al.  Bounded list injective homomorphism for comparative analysis of protein-protein interaction graphs , 2004, J. Discrete Algorithms.

[43]  Christian Komusiewicz,et al.  Exact Algorithms and Experiments for Hierarchical Tree Clustering , 2010, AAAI.

[44]  Christian Komusiewicz,et al.  Graph-based data clustering with overlaps , 2009, Discret. Optim..

[45]  David Eppstein,et al.  Listing All Maximal Cliques in Sparse Graphs in Near-optimal Time , 2010, Exact Complexity of NP-hard Problems.

[46]  Yoshiko Wakabayashi,et al.  A cutting plane algorithm for a clustering problem , 1989, Math. Program..

[47]  Christian Komusiewicz,et al.  Isolation concepts for efficiently enumerating dense subgraphs , 2009, Theor. Comput. Sci..

[48]  Daniel Lokshtanov,et al.  New Methods in Parameterized Algorithms and Complexity , 2009 .

[49]  Venkatesan Guruswami,et al.  Clustering with qualitative information , 2005, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings..

[50]  David Zuckerman,et al.  Linear Degree Extractors and the Inapproximability of MAX CLIQUE and CHROMATIC NUMBER , 2005, Electronic Colloquium on Computational Complexity, Report No. 100.

[51]  김삼묘,et al.  On Publishing the “Bioinformatics” Special Issue , 2000 .

[52]  Christian Komusiewicz,et al.  A More Relaxed Model for Graph-Based Data Clustering: s-Plex Cluster Editing , 2010, SIAM J. Discret. Math..

[53]  Roded Sharan,et al.  Torque: topology-free querying of protein interaction networks , 2009, Nucleic Acids Res..

[54]  Roded Sharan,et al.  Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks , 2006, J. Comput. Biol..

[55]  Christian Komusiewicz,et al.  Fixed-Parameter Algorithms for Cluster Vertex Deletion , 2010, Theory of Computing Systems.

[56]  S. Böcker,et al.  Comprehensive cluster analysis with Transitivity Clustering , 2011, Nature Protocols.

[57]  Christian Komusiewicz,et al.  Isolation concepts for clique enumeration: Comparison and computational experiments , 2009, Theor. Comput. Sci..

[58]  Christian Komusiewicz,et al.  Parameterized Algorithms and Hardness Results for Some Graph Motif Problems , 2008, CPM.

[59]  David Osumi-Sutherland,et al.  FlyBase: enhancing Drosophila Gene Ontology annotations , 2008, Nucleic Acids Res..

[60]  Jianer Chen,et al.  A 2k kernel for the cluster editing problem , 2012, J. Comput. Syst. Sci..

[61]  Pinar Heggernes,et al.  Generalized Graph Clustering: Recognizing (p, q)-Cluster Graphs , 2010, WG.

[62]  Michael R. Fellows,et al.  Efficient Parameterized Preprocessing for Cluster Editing , 2007, FCT.

[63]  Thomas Zichner,et al.  FASPAD: fast signaling pathway detection , 2007, Bioinform..

[64]  Richard M. Karp,et al.  A n^5/2 Algorithm for Maximum Matchings in Bipartite Graphs , 1971, SWAT.

[65]  Jörg Flum,et al.  Parameterized Complexity Theory , 2006, Texts in Theoretical Computer Science. An EATCS Series.

[66]  Sebastian Böcker,et al.  Going weighted: Parameterized algorithms for cluster editing , 2008, Theor. Comput. Sci..

[67]  Michael R. Fellows,et al.  Fixed-Parameter Tractability and Completeness II: On Completeness for W[1] , 1995, Theor. Comput. Sci..

[68]  Binhai Zhu,et al.  Weak Kernels , 2010, Electron. Colloquium Comput. Complex..

[69]  Roded Sharan,et al.  QPath: a method for querying pathways in a protein-protein interaction network , 2006, BMC Bioinformatics.

[71]  Tao Jiang,et al.  A maximum common substructure-based algorithm for searching and predicting drug-like compounds , 2008, ISMB.

[72]  A. Barabasi,et al.  Network biology: understanding the cell's functional organization , 2004, Nature Reviews Genetics.

[73]  Yong Zhang,et al.  Improved Algorithms for Bicluster Editing , 2008, TAMC.

[74]  Mam Riess Jones Color Coding , 1962, Human factors.

[75]  Leizhen Cai,et al.  Parameterized Complexity of Vertex Colouring , 2003, Discret. Appl. Math..

[76]  Cristina G. Fernandes,et al.  Motif Search in Graphs: Application to Metabolic Networks , 2006, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[77]  Sven Kosub,et al.  Local Density , 2004, Network Analysis.

[78]  Christian Komusiewicz,et al.  Measuring Indifference: Unit Interval Vertex Deletion , 2010, WG.

[79]  Fedor V. Fomin,et al.  Efficient Exact Algorithms on Planar Graphs: Exploiting Sphere Cut Branch Decompositions , 2005, ESA.

[80]  Dr. Zbigniew Michalewicz,et al.  How to Solve It: Modern Heuristics , 2004 .

[81]  Dániel Marx,et al.  Fixed-parameter tractability of multicut parameterized by the size of the cutset , 2010, STOC '11.

[82]  David P. Williamson,et al.  Deterministic pivoting algorithms for constrained ranking and clustering problems , 2007, SODA '07.

[83]  Christian Komusiewicz,et al.  Parameterized computational complexity of finding small-diameter subgraphs , 2012, Optim. Lett..

[84]  Dorothea Emig,et al.  Partitioning biological data with transitivity clustering , 2010, Nature Methods.

[85]  Peter Damaschke,et al.  Fixed-Parameter Enumerability of Cluster Editing and Related Problems , 2010, Theory of Computing Systems.

[86]  Hans L. Bodlaender,et al.  Partition Into Triangles on Bounded Degree Graphs , 2012, Theory of Computing Systems.

[87]  Michael R. Fellows,et al.  On problems without polynomial kernels , 2009, J. Comput. Syst. Sci..

[88]  David James Sherman,et al.  Family relationships: should consensus reign? - consensus clustering for protein families , 2007, Bioinform..

[89]  Igor Jurisica,et al.  Functional topology in a network of protein interactions , 2004, Bioinform..

[90]  Christian Komusiewicz,et al.  A Cubic-Vertex Kernel for Flip Consensus Tree , 2012, Algorithmica.

[91]  Russell Impagliazzo,et al.  Which problems have strongly exponential complexity? , 1998, Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280).

[92]  Rolf Niedermeier,et al.  Reflections on Multivariate Algorithmics and Problem Parameterization , 2010, STACS.

[93]  Guillaume Fertin,et al.  Finding occurrences of protein complexes in protein-protein interaction graphs , 2009, J. Discrete Algorithms.

[94]  Sebastian Böcker,et al.  Exact Algorithms for Cluster Editing: Evaluation and Experiments , 2008, Algorithmica.

[95]  Michael R. Fellows,et al.  Parameterized Complexity , 1998 .

[96]  Fedor V. Fomin,et al.  Efficient Exact Algorithms on Planar Graphs: Exploiting Sphere Cut Decompositions , 2010, Algorithmica.

[97]  Marina Meila,et al.  Comparing clusterings: an axiomatic view , 2005, ICML.

[98]  Michael R. Fellows,et al.  On the parameterized complexity of multiple-interval graph problems , 2009, Theor. Comput. Sci..

[99]  Anthony Wirth,et al.  Correlation Clustering , 2010, Encyclopedia of Machine Learning and Data Mining.

[100]  Christian Komusiewicz,et al.  Parameterized Algorithmics for Finding Connected Motifs in Biological Networks , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[101]  Rolf Niedermeier,et al.  Graph-Modeled Data Clustering: Exact Algorithms for Clique Generation , 2005, Theory of Computing Systems.

[102]  Berend Snel,et al.  Protein Complex Evolution Does Not Involve Extensive Network Rewiring , 2008, PLoS Comput. Biol..

[103]  Christian Komusiewicz,et al.  On Generating Triangle-Free Graphs , 2009, Electron. Notes Discret. Math..

[104]  Yijia Chen,et al.  Machine-based methods in parameterized complexity theory , 2005, Theor. Comput. Sci..

[105]  Michael R. Fellows,et al.  Fixed-Parameter Tractability and Completeness , 2022 .

[106]  Akira Tanaka,et al.  The worst-case time complexity for generating all maximal cliques and computational experiments , 2006, Theor. Comput. Sci..

[107]  Stephen B. Seidman,et al.  A graph-theoretic generalization of the clique concept , 1978 .

[108]  Jianer Chen,et al.  On Parameterized Intractability: Hardness and Completeness , 2008, Comput. J..

[109]  Nir Ailon,et al.  Aggregating inconsistent information: Ranking and clustering , 2008 .

[110]  Johannes Uhlmann Multivariate Algorithmics in Biological Data Analysis , 2011 .

[111]  Vijay V. Vazirani,et al.  Approximation Algorithms , 2001, Springer Berlin Heidelberg.

[112]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[113]  Steven Skiena,et al.  Integrating microarray data by consensus clustering , 2003, Proceedings. 15th IEEE International Conference on Tools with Artificial Intelligence.

[114]  T. Ideker,et al.  Modeling cellular machinery through biological network comparison , 2006, Nature Biotechnology.

[115]  Thomas Zichner,et al.  Algorithm Engineering for Color-Coding with Applications to Signaling Pathway Detection , 2008, Algorithmica.

[116]  Rolf Niedermeier,et al.  A general method to speed up fixed-parameter-tractable algorithms , 2000, Inf. Process. Lett..

[117]  John M. Lewis,et al.  The Node-Deletion Problem for Hereditary Properties is NP-Complete , 1980, J. Comput. Syst. Sci..

[118]  Mirko Krivánek,et al.  NP-hard problems in hierarchical-tree clustering , 1986, Acta Informatica.

[119]  Mark Gerstein,et al.  Predicting interactions in protein networks by completing defective cliques , 2006, Bioinform..

[120]  Michael R. Fellows,et al.  Clustering with partial information , 2008, Theor. Comput. Sci..

[121]  Dániel Marx,et al.  Clustering with local restrictions , 2011, Inf. Comput..

[122]  Paola Bonizzoni,et al.  On the Approximation of Correlation Clustering and Consensus Clustering , 2008, J. Comput. Syst. Sci..

[123]  Peter Damaschke,et al.  Even faster parameterized cluster deletion and cluster editing , 2011, Inf. Process. Lett..

[124]  Sven Rahmann,et al.  Exact and heuristic algorithms for weighted cluster editing. , 2007, Computational systems bioinformatics. Computational Systems Bioinformatics Conference.

[125]  Christian Komusiewicz,et al.  On making directed graphs transitive , 2009, J. Comput. Syst. Sci..

[126]  Christian Komusiewicz,et al.  On the parameterized complexity of consensus clustering , 2011, Theor. Comput. Sci..

[127]  Hans L. Bodlaender,et al.  Kernelization: New Upper and Lower Bound Techniques , 2009, IWPEC.

[128]  Ron Shamir,et al.  A clustering algorithm based on graph connectivity , 2000, Inf. Process. Lett..

[129]  Michael R. Fellows,et al.  The Lost Continent of Polynomial Time: Preprocessing and Kernelization , 2006, IWPEC.

[130]  Dieter van Melkebeek,et al.  Satisfiability allows no nontrivial sparsification unless the polynomial-time hierarchy collapses , 2010, STOC '10.

[131]  Roded Sharan,et al.  QNet: A Tool for Querying Protein Interaction Networks , 2007, RECOMB.

[132]  Hiroyuki Kurata,et al.  Diffusion Model Based Spectral Clustering for Protein-Protein Interaction Networks , 2010, PloS one.

[133]  C. W. Tate Solve it. , 2005, Nursing standard (Royal College of Nursing (Great Britain) : 1987).

[134]  P. Erdos,et al.  On chromatic number of graphs and set-systems , 1966 .

[135]  Jianer Chen,et al.  Cluster Editing: Kernelization Based on Edge Cuts , 2010, Algorithmica.

[136]  Rolf Niedermeier,et al.  A Structural View on Parameterizing Problems: Distance from Triviality , 2004, IWPEC.

[137]  Sven Rahmann,et al.  Large scale clustering of protein sequences with FORCE - A layout based heuristic for weighted cluster editing , 2007, BMC Bioinformatics.

[138]  Roded Sharan,et al.  Identification of protein complexes by comparative analysis of yeast and bacterial protein interaction data , 2004, J. Comput. Biol..

[139]  Robin Thomas,et al.  Call routing and the ratcatcher , 1994, Comb..

[140]  Rolf Niedermeier,et al.  Partial Kernelization for Rank Aggregation: Theory and Experiments , 2010, IPEC.

[141]  F. Harary The Maximum Connectivity of a Graph , 1962, Proceedings of the National Academy of Sciences of the United States of America.

[142]  Vladimir Batagelj,et al.  An O(m) Algorithm for Cores Decomposition of Networks , 2003, ArXiv.

[143]  Romeo Rizzi,et al.  Complexity issues in color-preserving graph embeddings , 2010, Theor. Comput. Sci..

[144]  Dimitrios M. Thilikos,et al.  Confronting intractability via parameters , 2011, Comput. Sci. Rev..

[145]  Roded Sharan,et al.  Topology-Free Querying of Protein Interaction Networks , 2009, RECOMB.

[146]  Leizhen Cai,et al.  Fixed-Parameter Tractability of Graph Modification Problems for Hereditary Properties , 1996, Inf. Process. Lett..

[147]  Sebastian Böcker,et al.  A golden ratio parameterized algorithm for Cluster Editing , 2011, J. Discrete Algorithms.

[148]  Roded Sharan,et al.  Cluster graph modification problems , 2002, Discret. Appl. Math..

[149]  R. Karp,et al.  Conserved pathways within bacteria and yeast as revealed by global protein network alignment , 2003, Proceedings of the National Academy of Sciences of the United States of America.