Clustering provenance: facilitating provenance exploration through data abstraction

As digital objects become increasingly important in people's lives, people may need to understand the provenance (the lineage and history) of an important digital object in order to understand how it was produced. This is particularly important for objects created from large, multi-source collections of personal data. Provenance data, the metadata describing provenance, is commonly represented as a labelled directed acyclic graph, so the challenge is to create effective interfaces onto such graphs that let people understand the provenance of key digital objects. This unsolved problem is especially challenging for novice and intermittent users and for complex provenance graphs. We tackle it by creating an interface based on a clustering approach, designed to let users view provenance graphs and to simplify complex graphs by combining several nodes. Our core contribution is the design of a prototype interface that supports clustering, together with its analytic evaluation in terms of desirable properties of visualisation interfaces.
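To illustrate what combining several nodes of a labelled directed graph can mean, here is a minimal sketch. It is not the prototype's implementation: the function name `collapse`, the edge representation as `(source, label, target)` triples, and the example node names are all assumptions made for illustration. The edge labels `wasGeneratedBy` and `used` are borrowed from the W3C PROV vocabulary.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (source, label, target); hypothetical representation


def collapse(edges: Iterable[Edge], members: Set[str], cluster_id: str) -> Set[Edge]:
    """Replace every node in `members` with a single cluster node.

    Edges that lie entirely inside the cluster are hidden; edges that cross
    the cluster boundary are redirected to the cluster node.
    Note: collapsing an arbitrary set of nodes can introduce a cycle, so a
    real tool would need to restrict or check which sets may be clustered.
    """
    result: Set[Edge] = set()
    for src, label, dst in edges:
        src_in, dst_in = src in members, dst in members
        if src_in and dst_in:
            continue  # internal edge: abstracted away
        new_src = cluster_id if src_in else src
        new_dst = cluster_id if dst_in else dst
        result.add((new_src, label, new_dst))
    return result


# Hypothetical provenance of a report: it was rendered from a draft,
# which was produced by an editing activity that used some notes.
edges = {
    ("report.pdf", "wasGeneratedBy", "render"),
    ("render", "used", "draft.tex"),
    ("draft.tex", "wasGeneratedBy", "edit"),
    ("edit", "used", "notes.txt"),
}

simplified = collapse(edges, {"render", "draft.tex", "edit"}, "cluster:writing")
for edge in sorted(simplified):
    print(edge)
```

In this sketch the four-edge chain shrinks to two edges around one cluster node, which is the kind of simplification a clustering interface would present to a novice user while keeping the detail available on demand.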
