The Minimum Edit Arborescence Problem and Its Use in Compressing Graph Collections [Extended Version]

Inferring minimum spanning arborescences over a set of objects is a general problem that translates into numerous application-specific unsupervised learning tasks. We introduce a unified and generic structure called an edit arborescence, which relies on edit paths between the data in a collection, together with the Minimum Edit Arborescence Problem, which asks for an edit arborescence that minimizes the sum of the costs of its inner edit paths. With suitable cost functions, this generic framework can model a variety of problems. In particular, we show that by introducing encoding-size-preserving edit costs, it can be used as an efficient method for compressing collections of labeled graphs. Experiments on various graph datasets, with comparisons to standard compression tools, show the potential of our method.
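
To make the objective concrete, the following is a minimal sketch under simplifying assumptions, not the method evaluated in the paper. Strings with unit-cost Levenshtein distance stand in for labeled graphs and graph edit paths, a virtual empty object is assumed as the root, and the function names (`levenshtein`, `min_arborescence_cost`, `edit_arborescence_cost`) are hypothetical. The sketch computes only the total cost of a minimum spanning arborescence, using the classical Chu-Liu/Edmonds contraction scheme, and does not recover the arborescence itself.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost string edit distance (stand-in for the cost of an edit path)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def min_arborescence_cost(n, root, edges):
    """Chu-Liu/Edmonds: cost of a minimum spanning arborescence rooted at `root`.

    `edges` is a list of (source, target, weight) over nodes 0..n-1.
    Returns None if some node cannot be reached from the root.
    """
    total = 0
    while True:
        inf = float("inf")
        in_w = [inf] * n   # cheapest incoming edge weight per node
        pre = [-1] * n     # source of that cheapest incoming edge
        for u, v, w in edges:
            if u != v and w < in_w[v]:
                in_w[v], pre[v] = w, u
        if any(v != root and in_w[v] == inf for v in range(n)):
            return None
        in_w[root] = 0
        comp = [-1] * n    # contracted component id per node
        seen = [-1] * n    # marks the walk that last visited each node
        count = 0
        for v in range(n):
            total += in_w[v]
            x = v
            while x != root and seen[x] != v and comp[x] == -1:
                seen[x] = v
                x = pre[x]
            if x != root and comp[x] == -1:
                # The walk closed on itself: contract the cycle through x.
                y = pre[x]
                while y != x:
                    comp[y] = count
                    y = pre[y]
                comp[x] = count
                count += 1
        if count == 0:
            return total   # no cycle left: the chosen edges form an arborescence
        for v in range(n):
            if comp[v] == -1:
                comp[v] = count
                count += 1
        # Re-weight edges entering each node by the cost already paid for it.
        edges = [(comp[u], comp[v], w - in_w[v])
                 for u, v, w in edges if comp[u] != comp[v]]
        root, n = comp[root], count


def edit_arborescence_cost(collection):
    """Cost of a minimum edit arborescence rooted at a virtual empty string."""
    objects = [""] + list(collection)   # index 0 is the assumed empty root
    edges = [(u, v, levenshtein(objects[u], objects[v]))
             for u in range(len(objects))
             for v in range(1, len(objects)) if u != v]
    return min_arborescence_cost(len(objects), 0, edges)


collection = ["abc", "abd", "abde"]
print(sum(len(s) for s in collection))     # 10: each string built from scratch
print(edit_arborescence_cost(collection))  # 5: "" -> abc -> abd -> abde
```

In this toy setting, encoding each object as an edit path from its parent costs 5 edit operations instead of the 10 needed to build every string independently, which is the intuition behind using such a structure for compression when edit costs reflect encoding size.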
