SpeQuloS: a QoS service for hybrid and elastic computing infrastructures

The wide choice of available Distributed Computing Infrastructures (DCIs) allows users to select and combine their preferred architectures among Clusters, Grids, Clouds, Desktop Grids and more. In these hybrid DCIs, elasticity is emerging as a key property. In elastic infrastructures, the resources available to execute an application vary continuously, either because of application requirements or because of constraints on the infrastructure, such as node volatility. In the latter case, there is no guarantee that the computing resources will remain available during the entire execution of an application. In this paper, we show that Bag-of-Tasks (BoT) executions on these “Best-Effort” infrastructures suffer from a drop in the task completion rate at the end of the execution.

The SpeQuloS service presented in this paper improves the Quality of Service (QoS) of BoT applications executed on hybrid and elastic infrastructures. SpeQuloS monitors the execution of the BoT and dynamically supplies fast and reliable Cloud resources when the critical part of the BoT, its final phase where the completion rate drops, is executed. SpeQuloS offers several features to hybrid DCI users, such as estimating the completion time and the execution speedup. Performance evaluation shows that BoT executions can be accelerated by a factor of 2 while offloading less than 2.5% of the workload to the Cloud.

We report on several scenarios where SpeQuloS is deployed on hybrid infrastructures featuring a wide variety of infrastructure combinations. In the context of the European Desktop Grid Initiative (EDGI), SpeQuloS is operated to improve the QoS of Desktop Grids using resources from private Clouds. We present a use case where SpeQuloS uses both EC2 regular and spot instances to decrease the cost of computation while preserving a similar QoS level. Finally, in the last scenario, SpeQuloS is used to optimize the utilization of Grid'5000 resources.
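To make the monitor-and-provision idea more concrete, here is a minimal sketch of how a service might decide when a BoT has entered its critical final phase, extrapolate its completion time, and size a Cloud reinforcement. Everything in it is a hypothetical illustration rather than the SpeQuloS implementation: the `BoTProgress` record, the 90% completion threshold, the linear extrapolation with a `correction` factor, and the worker cap are all assumptions. The last two helpers only show how the two reported metrics, speedup and offloaded fraction of the workload, are computed as ratios.

```python
from dataclasses import dataclass


@dataclass
class BoTProgress:
    """Hypothetical monitoring snapshot of a Bag-of-Tasks execution."""
    total_tasks: int
    completed_tasks: int
    elapsed_s: float  # seconds since the BoT was submitted

    @property
    def completion_ratio(self) -> float:
        return self.completed_tasks / self.total_tasks


def in_tail(progress: BoTProgress, threshold: float = 0.9) -> bool:
    """Assumed trigger: the critical final phase starts once `threshold` of tasks are done."""
    return progress.completion_ratio >= threshold


def estimate_completion_s(progress: BoTProgress, correction: float = 1.0) -> float:
    """Linearly extrapolate the total makespan from the current completion ratio.

    `correction` is a hypothetical factor (for example fitted on past executions)
    meant to compensate for the slowdown observed at the end of the execution.
    """
    r = progress.completion_ratio
    if r == 0:
        raise ValueError("cannot extrapolate before any task has completed")
    return correction * progress.elapsed_s / r


def cloud_workers_to_start(progress: BoTProgress, max_workers: int,
                           threshold: float = 0.9) -> int:
    """Start at most one Cloud worker per remaining task, capped by `max_workers`."""
    if not in_tail(progress, threshold):
        return 0
    remaining = progress.total_tasks - progress.completed_tasks
    return min(remaining, max_workers)


def speedup(makespan_without_cloud_s: float, makespan_with_cloud_s: float) -> float:
    """Ratio of the makespan without Cloud support to the makespan with it."""
    return makespan_without_cloud_s / makespan_with_cloud_s


def offload_fraction(tasks_run_on_cloud: int, total_tasks: int) -> float:
    """Share of the workload executed on Cloud resources."""
    return tasks_run_on_cloud / total_tasks
```

For example, with `BoTProgress(total_tasks=1000, completed_tasks=920, elapsed_s=3600.0)`, the snapshot is past the assumed 90% threshold, the uncorrected extrapolation gives about 3913 s, and `cloud_workers_to_start(p, max_workers=50)` returns 50 because the cap is smaller than the 80 remaining tasks. A fixed completion threshold is only the simplest possible trigger; a real service could instead react to the observed drop in the completion rate.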
