CONFLuEnCE: Implementation and application design

Data streams have become pervasive, and data production rates are increasing exponentially, driven by advances in technology such as the proliferation of sensors, smartphones, and their applications. This creates an unprecedented opportunity to build real-time monitoring and analytics applications which, when used collaboratively and interactively, will provide insights into every aspect of our environment, in both the business and scientific domains. In our previous work, we identified the need for workflow management systems that can orchestrate the processing of multiple heterogeneous data streams while enabling their users to interact collaboratively with the workflows in real time. In this paper, we describe CONFLuEnCE (CONtinuous workFLow ExeCution Engine), an implementation of our continuous workflow model. CONFLuEnCE is built on top of Kepler, an existing workflow management system, by fusing stream semantics and stream processing methods as another computational domain. Furthermore, we report on our experiences in designing and implementing real-life business and scientific continuous workflow monitoring applications, which attest to the ease of use and applicability of our system.
