Big data analytics tools are steadily becoming indispensable to businesses worldwide. The complexity of the tasks they execute keeps increasing with the surge in data and in task heterogeneity. Current analytics platforms, while successful in harnessing multiple aspects of this "data deluge", bind their efficacy to a single data and compute model and often depend on proprietary systems. However, no single execution engine is suitable for all types of computation, and no single data store is suitable for all types of data. To this end, we demonstrate IReS, the Intelligent Resource Scheduler for complex analytics workflows executed over multi-engine environments. Our system models the cost and performance of the required tasks over the available platforms. IReS then matches distinct workflow parts to the execution and/or storage engines, among the available ones, that best satisfy a user-defined optimization policy. During the demo, attendees will be able to execute workflows that match real use cases and to parametrize both the input datasets and the optimization policy. The underlying platform supports multiple compute and data engines, allowing the user to choose any subset of them. By inspecting the produced plan, following its execution, and examining the numerous cost and performance metrics collected and presented, the audience will experience first-hand how IReS takes advantage of heterogeneous runtimes and data stores and how effectively it models operator cost and performance for real and diverse workflows.
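To make the "model cost, then match workflow parts to engines under a policy" idea concrete, here is a minimal sketch. It assumes the cost models have already produced a per-(operator, engine) estimate and that the workflow is a simple linear chain. The `Estimate` type, the `plan_chain` function, the weighted-sum policy, the inter-engine transfer penalty, the engine labels and all numbers are hypothetical illustrations; this dynamic program is not claimed to be the planner IReS actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# NOTE: hypothetical sketch, not IReS's real planner or API.


@dataclass(frozen=True)
class Estimate:
    time_s: float  # predicted execution time in seconds (assumed output of a cost model)
    cost: float    # predicted resource/monetary cost, arbitrary units


def policy_score(est: Estimate, w_time: float, w_cost: float) -> float:
    """Hypothetical user-defined policy: weighted sum of time and cost."""
    return w_time * est.time_s + w_cost * est.cost


def plan_chain(
    operators: List[str],
    estimates: Dict[Tuple[str, str], Estimate],
    transfer_s: Dict[Tuple[str, str], float],
    engines: List[str],
    w_time: float = 1.0,
    w_cost: float = 0.0,
) -> Tuple[float, List[str]]:
    """Assign each operator of a linear workflow to one engine by dynamic programming.

    best[e] holds (score, assignment) of the cheapest plan for the operators
    seen so far whose last operator runs on engine e. Moving data between
    engines adds a time penalty from transfer_s; pairs not listed
    (e.g. same-engine hops) are assumed free.
    """
    best: Dict[str, Tuple[float, List[str]]] = {}
    for e in engines:
        if (operators[0], e) in estimates:
            best[e] = (policy_score(estimates[(operators[0], e)], w_time, w_cost), [e])

    for op in operators[1:]:
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for e in engines:
            if (op, e) not in estimates:
                continue  # no implementation of this operator on engine e
            step = policy_score(estimates[(op, e)], w_time, w_cost)
            candidates = [
                (score + w_time * transfer_s.get((prev, e), 0.0) + step, plan + [e])
                for prev, (score, plan) in best.items()
            ]
            if candidates:
                nxt[e] = min(candidates, key=lambda c: c[0])
        best = nxt

    if not best:
        raise ValueError("no feasible engine assignment")
    return min(best.values(), key=lambda c: c[0])


# Made-up estimates for a three-step workflow over three hypothetical engines.
ENGINES = ["mapreduce", "spark", "postgres"]
OPERATORS = ["load", "join", "aggregate"]
ESTIMATES = {
    ("load", "mapreduce"): Estimate(40, 1.0),
    ("load", "spark"): Estimate(30, 2.0),
    ("join", "spark"): Estimate(20, 2.0),
    ("join", "postgres"): Estimate(50, 0.5),
    ("aggregate", "spark"): Estimate(15, 1.0),
    ("aggregate", "postgres"): Estimate(5, 0.2),
}
TRANSFER_S = {
    ("mapreduce", "spark"): 10.0,
    ("mapreduce", "postgres"): 30.0,
    ("spark", "postgres"): 25.0,
    ("postgres", "spark"): 25.0,
}

if __name__ == "__main__":
    # Time-only policy: keep everything on one engine to avoid transfers.
    print(plan_chain(OPERATORS, ESTIMATES, TRANSFER_S, ENGINES))
    # -> (65.0, ['spark', 'spark', 'spark'])
    # Cost-only policy: the cheaper engines win even though they are slower.
    print(plan_chain(OPERATORS, ESTIMATES, TRANSFER_S, ENGINES, w_time=0.0, w_cost=1.0))
```

Changing only the policy weights changes the chosen plan. This mirrors, in a toy form, the idea that the same workflow can be mapped differently depending on the user's optimization goal. Because only the best plan ending on each engine is kept, the search is linear in the number of operators. Real workflows are DAGs, which this simplified chain version does not handle.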