Optimizing, Planning and Executing Analytics Workflows over Multiple Engines

Big data analytics have become a necessity for businesses worldwide. The complexity of the tasks they execute is ever increasing due to the surge in data and task heterogeneity. Current analytics platforms, while successful in harnessing multiple aspects of this "data deluge", bind their efficacy to a single data and compute model and often depend on proprietary systems. However, no single execution engine is suitable for all types of computation and no single data store is suitable for all types of data. To this end, we present and demonstrate a platform that designs, optimizes, plans and executes complex analytics workflows over multiple engines. Our system enables users to create workflows of variable detail concerning the execution semantics, depending on their level of expertise and interest. The workflows are then analysed in order to determine missing execution semantics. Through the modelling of the cost and performance of the required tasks over the available platforms, the system is able to match each distinct workflow part to the most suitable execution and/or storage engine among the available ones, optimizing with respect to a user-defined policy.
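
To make the cost-based engine matching concrete, the following is a minimal sketch (not the paper's implementation) of how workflow operators could be assigned to engines by scoring modelled cost and performance estimates against a user-defined policy. All names (Engine estimates, Operator, select_engines, the sample figures) are illustrative assumptions introduced here, not part of the described system.

```python
# Illustrative sketch only: greedy, policy-driven matching of workflow
# operators to execution engines based on modelled cost/performance.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Estimate:
    """Modelled cost/performance of one operator on one engine (hypothetical)."""
    runtime_sec: float    # predicted execution time
    monetary_cost: float  # predicted cost, e.g. in billing units


@dataclass
class Operator:
    """One step of an analytics workflow, with per-engine estimates."""
    name: str
    estimates: Dict[str, Estimate]  # engine name -> modelled estimate


# A user-defined policy maps an estimate to a scalar score (lower is better).
Policy = Callable[[Estimate], float]

minimize_time: Policy = lambda e: e.runtime_sec
minimize_cost: Policy = lambda e: e.monetary_cost
balanced: Policy = lambda e: 0.5 * e.runtime_sec + 0.5 * e.monetary_cost


def select_engines(workflow: List[Operator], policy: Policy) -> Dict[str, str]:
    """For each operator, pick the available engine with the best policy score."""
    plan: Dict[str, str] = {}
    for op in workflow:
        plan[op.name] = min(op.estimates, key=lambda eng: policy(op.estimates[eng]))
    return plan


if __name__ == "__main__":
    # Toy two-operator workflow with made-up estimates over two engines.
    workflow = [
        Operator("filter_logs", {
            "spark": Estimate(runtime_sec=120, monetary_cost=0.8),
            "hadoop": Estimate(runtime_sec=300, monetary_cost=0.3),
        }),
        Operator("train_model", {
            "spark": Estimate(runtime_sec=600, monetary_cost=4.0),
            "hadoop": Estimate(runtime_sec=900, monetary_cost=2.5),
        }),
    ]
    print(select_engines(workflow, minimize_time))  # favours the faster engine per step
    print(select_engines(workflow, minimize_cost))  # favours the cheaper engine per step
```

Changing only the policy function changes the resulting plan, which is the essence of optimizing the same workflow differently for different user-defined objectives.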