Mitigating the Effects of Partial Resource Failures for Cloud Providers

Competition for users in a global market is fierce, forcing enterprises to provide better and faster services at lower prices. Users, meanwhile, choose to remain oblivious to the infrastructure behind a service; they demand only that it works. Cloud service failures, and inefficient management of such failures, can result in significant financial cost and loss of reputation for providers, and can drive key customers away. Yet failures can never be completely avoided. To mitigate their effects, we present a decision model that helps providers decide which jobs to keep running and which to cancel during partial resource failures, so as to minimize the loss of revenue and of key customers. An evaluation of the model and its extension shows that it can significantly improve revenue. Furthermore, the model can help reduce the number of cancelled jobs.
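
To make the kind of decision described above concrete, the following is a minimal sketch of a keep-or-cancel choice under reduced capacity. It is not the model from the paper: the greedy value-per-resource heuristic, the `Job` fields, the `key_weight` factor, and the sample numbers are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # All fields are hypothetical; the abstract does not specify a job model.
    name: str
    demand: int        # resource units the job needs (must be > 0)
    revenue: float     # revenue earned if the job completes
    penalty: float     # penalty paid if the job is cancelled
    key_customer: bool = False


def select_jobs(jobs, capacity, key_weight=2.0):
    """Greedily keep the jobs whose cancellation would cost most per resource unit.

    `key_weight` is an assumed factor that inflates the loss attributed to
    cancelling a key customer's job. Greedy selection is a heuristic and
    does not guarantee an optimal result.
    """
    def loss_density(job):
        loss = job.revenue + job.penalty
        if job.key_customer:
            loss *= key_weight
        return loss / job.demand

    keep, cancel = [], []
    remaining = capacity
    for job in sorted(jobs, key=loss_density, reverse=True):
        if job.demand <= remaining:
            keep.append(job)
            remaining -= job.demand
        else:
            cancel.append(job)
    return keep, cancel


# Example: a partial failure leaves 6 of the original 10 resource units.
jobs = [
    Job("A", demand=4, revenue=100.0, penalty=20.0),
    Job("B", demand=2, revenue=30.0, penalty=5.0, key_customer=True),
    Job("C", demand=3, revenue=45.0, penalty=0.0),
    Job("D", demand=1, revenue=10.0, penalty=2.0),
]
keep, cancel = select_jobs(jobs, capacity=6)
print("keep:", [j.name for j in keep])      # keep: ['B', 'A']
print("cancel:", [j.name for j in cancel])  # cancel: ['C', 'D']
```

In this sketch the key customer's job B is kept first even though its raw revenue is lower than that of job A, which illustrates how weighting customers can trade a little revenue for retention.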
