BabbleFlow: a translator for analytic data flow programs

A complex analytic data flow may perform multiple, interdependent tasks, each of which uses a different processing engine. Such a multi-engine flow, termed a hybrid flow, may comprise subflows written in more than one programming language. As the number and variety of these engines grow, developing and maintaining hybrid flows at the physical level becomes increasingly challenging. To address this problem, we present BabbleFlow, a system that enables flow design at the logical level and automatic translation to physical flows. BabbleFlow translates a hybrid flow expressed in one set of languages into a semantically equivalent hybrid flow expressed in the same or a different set of languages. To this end, it composes the multiple physical flows of a hybrid flow into a single logical representation expressed in a unified flow language called xLM. This unified representation enables graph transformations such as (de-)composition and optimization. Finally, BabbleFlow converts the (possibly transformed) xLM data flow graph into executable form by expressing it in one or more target programming languages.
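The compose, transform, and decompose steps described above can be illustrated with a toy graph model. This is only a minimal sketch under stated assumptions, not BabbleFlow's implementation or the xLM format: the `Flow` class, the `compose`, `decompose`, and `retarget` functions, and the engine labels `engine_a`, `engine_b`, and `engine_c` are all hypothetical names introduced for illustration. Real translation would also parse and generate code for each engine, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    """A toy engine-neutral flow graph (hypothetical, not the xLM format)."""
    ops: dict = field(default_factory=dict)   # operator name -> engine label
    edges: set = field(default_factory=set)   # (producer, consumer) pairs


def compose(*subflows, connectors=()):
    """Merge per-engine subflows into one logical graph, adding connector edges."""
    merged = Flow()
    for sub in subflows:
        merged.ops.update(sub.ops)
        merged.edges |= sub.edges
    merged.edges |= set(connectors)
    return merged


def retarget(flow, mapping):
    """Reassign operators to target engines; the graph shape is unchanged."""
    new_ops = {name: mapping.get(engine, engine) for name, engine in flow.ops.items()}
    return Flow(new_ops, set(flow.edges))


def decompose(flow):
    """Split a logical graph into one subflow per engine plus cross-engine edges."""
    parts, crossing = {}, set()
    for name, engine in flow.ops.items():
        parts.setdefault(engine, Flow()).ops[name] = engine
    for src, dst in flow.edges:
        if flow.ops[src] == flow.ops[dst]:
            parts[flow.ops[src]].edges.add((src, dst))
        else:
            crossing.add((src, dst))
    return parts, crossing


# Two physical subflows on different (hypothetical) engines.
first = Flow({"extract": "engine_a", "filter": "engine_a"}, {("extract", "filter")})
second = Flow({"aggregate": "engine_b"}, set())

# Compose into one logical graph, move engine_b work to engine_c, then split again.
logical = compose(first, second, connectors=[("filter", "aggregate")])
parts, crossing = decompose(retarget(logical, {"engine_b": "engine_c"}))
```

In this sketch the edges that cross engine boundaries are returned separately, since those are the points where a hybrid flow would need data to move between engines.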
