Big Data analytics in static and streaming provenance
