Bonsai: Growing Interesting Small Trees

Graphs are increasingly used to model loosely structured data such as biological networks, social networks, and entity-relationship data. Given this profusion of large-scale graph data, efficiently discovering the interesting substructures buried within it is essential. These substructures typically determine subsequent actions, such as visual analytics conducted by humans or the design of expensive biomedical experiments. In such settings, it is often desirable to constrain the size of the discovered results in order to directly control the associated costs. In this paper, we address the problem of finding cardinality-constrained connected subtrees in large node-weighted graphs that maximize the sum of the weights of the selected nodes. We provide an efficient constant-factor approximation algorithm for this strongly NP-hard problem. Our techniques apply in a wide variety of settings, for example in the differential analysis of graphs, a problem that arises frequently in bioinformatics but also has applications on the web.
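
To make the problem statement concrete, the following is a minimal brute-force sketch for toy inputs only. It is not the approximation algorithm of the paper. It assumes the cardinality constraint means "at most k nodes" (the abstract does not say whether the bound is exact), and the names `best_subtree_nodes`, `adj`, and `weight` are hypothetical. It relies on the fact that any connected node set has a spanning tree, so searching over connected node sets is equivalent to searching over subtrees when only node weights count.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Return True if `nodes` induces a connected subgraph of `adj`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def best_subtree_nodes(adj, weight, k):
    """Exhaustively find a connected node set of size <= k with maximum total weight.

    Runs in exponential time; intended only to illustrate the objective.
    """
    best, best_w = None, float("-inf")
    for size in range(1, k + 1):
        for cand in combinations(sorted(adj), size):
            if is_connected(cand, adj):
                w = sum(weight[v] for v in cand)
                if w > best_w:
                    best, best_w = set(cand), w
    return best, best_w


# Toy undirected graph as adjacency lists; weights may be negative,
# as they can be in differential analysis.
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b", "e"], "e": ["d"]}
weight = {"a": 2.0, "b": -1.0, "c": 3.0, "d": 0.5, "e": 4.0}

print(best_subtree_nodes(adj, weight, 3))
print(best_subtree_nodes(adj, weight, 4))
```

With k = 3 the best choice is the pair d, e, because every connected triple must pay for the negatively weighted node b. With k = 4 it becomes worthwhile to include b in order to reach c. This sensitivity to k is one reason the size bound cannot be handled by simply trimming a larger solution.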
