Real-Time Fault-Tolerance in Federated Cloud Environments

Dependability is a critical concern in provisioning services in Cloud Computing environments. This is particularly true of reliability, an attribute of dependability that poses a critical and challenging problem in a Cloud context [2]. Fault-tolerance is one means of attaining reliability, and is typically implemented using some form of diversity. Federated Cloud, an emerging Cloud paradigm that orchestrates multiple Clouds, can by its inherent design implement environmental diversity for Cloud applications with relative ease and minimal additional cost to the consumer. Real-Time Applications (RTAs) can benefit from deploying fault-tolerant schemes to fulfill deadlines in the presence of faults, as such schemes enable correct service to be provisioned even when a component of the application fails. However, this diversity can become an issue when designing dynamically scalable fault-tolerant RTAs in a federated Cloud environment while also fulfilling QoS demands. In particular, building fault-tolerant RTAs that exploit the diversity of Virtual Machine (VM) configurations and of the underlying Cloud infrastructure can impair the ability to fulfill deadlines while still allowing the application to dynamically provision VMs with minimal human interaction. This paper identifies a number of characteristics that affect the ability of an RTA to fulfill specified deadlines in a federated Cloud environment as a result of deploying environmentally diverse fault-tolerant schemes. Furthermore, we have designed and performed initial experiments using a real-world Cloud federation to substantiate the existence of this problem. Results demonstrate that deploying RTAs in a federated Cloud environment can potentially increase the rate of deadline violations.

[1] Yong Zhao, et al. Cloud Computing and Grid Computing 360-Degree Compared, 2008, GCE 2008.

[2] Bev Littlewood, et al. Predictably Dependable Computing Systems, 2012, ESPRIT Basic Research Series.

[3] Zibin Zheng, et al. FTCloud: A Component Ranking Framework for Fault-Tolerant Cloud Applications, 2010, 2010 IEEE 21st International Symposium on Software Reliability Engineering.

[4] Antonio Puliafito, et al. Three-Phase Cross-Cloud Federation Model: The Cloud SSO Authentication, 2010, 2010 Second International Conference on Advances in Future Internet.

[5] Rao Mikkilineni, et al. Next Generation Cloud Computing Architecture: Enabling Real-Time Dynamism for Shared Distributed Physical Infrastructure, 2010, 2010 19th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises.

[6] Jean-Claude Laprie, et al. Dependability — Its Attributes, Impairments and Means, 1995.

[7] Ripal Nathuji, et al. Exploiting Platform Heterogeneity for Power Efficient Data Centers, 2007, Fourth International Conference on Autonomic Computing (ICAC'07).

[8] Parameswaran Ramanathan, et al. Real-time computing: a new discipline of computer science and engineering, 1994, Proc. IEEE.

[9] Carl E. Landwehr, et al. Basic concepts and taxonomy of dependable and secure computing, 2004, IEEE Transactions on Dependable and Secure Computing.

[10] Rajkumar Buyya, et al. InterCloud: Utility-Oriented Federation of Cloud Computing Environments for Scaling of Application Services, 2010, ICA3PP.

[11] Jie Xu, et al. MoSeS: A Grid-Enabled Spatial Decision Support System, 2009.

[12] J.A. Stankovic, et al. Misconceptions about real-time computing: a serious problem for next-generation systems, 1988, Computer.

[13] Benny Rochwerger, et al. Reservoir - When One Cloud Is Not Enough, 2011, Computer.

[14] Yi-Min Wang, et al. Checkpointing and its applications, 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[15] Robbert van Renesse, et al. Toward a cloud computing research agenda, 2009, SIGA.

[16] Insup Lee, et al. An empirical analysis of scheduling techniques for real-time cloud-based data processing, 2011, 2011 IEEE International Conference on Service-Oriented Computing and Applications (SOCA).

[17] Paul Ammann, et al. Data Diversity: An Approach to Software Fault Tolerance, 1988, IEEE Trans. Computers.

[18] Hossein Deldari, et al. Job failure prediction in grid environment based on workload characteristics, 2009, 2009 14th International CSI Computer Conference.