Distributed Provenance Compression

Network provenance, which records the execution history of network events as metadata, is becoming increasingly important for network accountability and failure diagnosis. For example, network provenance can be used to trace the path that a message traversed in a network, or to reveal how a particular routing entry was derived and which parties were involved in its derivation. A challenge when storing the provenance of a live network is that the large number of arriving messages can incur substantial storage overhead. In this paper, we explore techniques to dynamically compress distributed provenance stored at scale. Logically, compression is achieved by grouping equivalent provenance trees and maintaining only one concrete copy for each equivalence class. To efficiently identify equivalent provenance, we (1) introduce distributed event-based linear programs (DELPs) to specify distributed network applications, and (2) statically analyze DELPs to allow quick detection of provenance equivalence at runtime. Our experimental results demonstrate that our approach significantly reduces storage and improves query latency compared with alternative approaches.
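The idea of keeping one concrete copy per equivalence class can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes, hypothetically, that a provenance tree can be flattened into a sequence of derivation steps, and that events agreeing on a chosen set of attributes (here, a packet's destination) have identical derivations apart from the event itself. In the paper, equivalence is established by static analysis of DELPs, which this sketch does not model. The names `CompressedProvenanceStore`, `key_attrs`, and the `route` tuples are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One derivation step: (rule name, state tuple the rule joined with).
Step = Tuple[str, Tuple]


@dataclass
class CompressedProvenanceStore:
    """Keeps one concrete derivation per equivalence class of events.

    Hypothetical assumption: events that agree on the attributes at
    positions `key_attrs` have derivations that differ only in the event
    itself, so the derivation steps can be shared across the class.
    """
    key_attrs: Tuple[int, ...]
    shared: Dict[Tuple, Tuple[Step, ...]] = field(default_factory=dict)
    events: List[Tuple[Tuple, Tuple]] = field(default_factory=list)

    def equivalence_key(self, event: Tuple) -> Tuple:
        # Project the event onto the attributes that define its class.
        return tuple(event[i] for i in self.key_attrs)

    def insert(self, event: Tuple, steps: List[Step]) -> None:
        key = self.equivalence_key(event)
        # Store the derivation only for the first event seen in its class.
        if key not in self.shared:
            self.shared[key] = tuple(steps)
        # Every event keeps just its own tuple plus a pointer to its class.
        self.events.append((event, key))

    def query(self, index: int) -> Tuple[Tuple, Tuple[Step, ...]]:
        """Rebuild the full provenance of the index-th stored event."""
        event, key = self.events[index]
        return event, self.shared[key]

    def stored_steps(self) -> int:
        # Number of derivation steps actually kept in the shared part.
        return sum(len(s) for s in self.shared.values())


# Usage: events are (src, dst, payload); the class is keyed on dst.
store = CompressedProvenanceStore(key_attrs=(1,))
path_to_n3 = [("fwd", ("route", "n1", "n3", "n2")),
              ("fwd", ("route", "n2", "n3", "n3"))]
path_to_n4 = [("fwd", ("route", "n1", "n4", "n4"))]

for payload in ("a", "b", "c"):
    store.insert(("n1", "n3", payload), path_to_n3)
store.insert(("n1", "n4", "d"), path_to_n4)

print(len(store.events), len(store.shared), store.stored_steps())  # prints: 4 2 3
```

With this layout, the shared part grows with the number of equivalence classes rather than with the number of events: four events keep three derivation steps instead of seven. The sketch simply trusts the equivalence assumption. If shared state, such as a route entry, changed between two events in the same class, their derivations would diverge, and a real system would have to account for that.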
