Dependability as a cloud service - a modular approach

Failures of services on cloud platforms are only to be expected. The natural way to deal with such failures is the traditional measure of replication. However, replicating services on distributed cloud platforms poses several challenges that today's Java middleware systems do not meet well. Two stand out: state must be isolated in the application components so that migration and recovery are easy, and clients must be able to use different replicated service instances transparently. For example, Java Enterprise Edition (JEE) makes transparent replication of services difficult for these two reasons, and also because of the fine-grained nature of interactions between its components (the Enterprise JavaBeans). In this paper, we show the parts of the design of OSGi, a specification defining a dynamic component system in Java, that make it suitable for this task. We then propose two extensions to OSGi: one for exposing and exporting application component state, and one for transparent invocation of service instances. Together, these two extensions can enable easy replication and recovery from failures in cloud environments. We show through experiments that our prototype can migrate a failed service to a new machine quickly enough that a client experiences only a moderate increase in service invocation time during system recovery.
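The two ideas named above, exportable component state and transparent invocation, can be illustrated with a minimal plain-Java sketch. It is not the paper's prototype and does not use the OSGi API: the names `StateExportable`, `Counter`, `CounterImpl`, and `transparent` are hypothetical, a failure is simulated with an `IllegalStateException`, and the "new machine" is simply a fresh object in the same JVM. The client holds a `java.lang.reflect.Proxy` that checkpoints the instance state after each successful call and, on failure, restores that checkpoint into a replacement instance before retrying once.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class FailoverSketch {

    /** Hypothetical contract: a component exposes its state so it can be moved. */
    public interface StateExportable {
        Map<String, Object> exportState();
        void importState(Map<String, Object> state);
    }

    /** Example service interface as seen by the client. */
    public interface Counter {
        int increment();
    }

    /** Service instance whose entire state is one integer. */
    public static class CounterImpl implements Counter, StateExportable {
        private int value;
        private boolean crashed;

        /** Simulates a failure of this instance. */
        public void crash() { crashed = true; }

        @Override public int increment() {
            if (crashed) throw new IllegalStateException("instance failed");
            return ++value;
        }
        @Override public Map<String, Object> exportState() {
            Map<String, Object> state = new HashMap<>();
            state.put("value", value);
            return state;
        }
        @Override public void importState(Map<String, Object> state) {
            value = (Integer) state.get("value");
        }
    }

    /** Forwards calls to the current instance and swaps it out on failure. */
    static final class FailoverHandler implements InvocationHandler {
        private Object target;
        private final Supplier<?> replacementFactory;
        private Map<String, Object> checkpoint = new HashMap<>();

        FailoverHandler(Object target, Supplier<?> replacementFactory) {
            this.target = target;
            this.replacementFactory = replacementFactory;
            saveCheckpoint();
        }

        private void saveCheckpoint() {
            if (target instanceof StateExportable) {
                checkpoint = ((StateExportable) target).exportState();
            }
        }

        private Object call(Method method, Object[] args) throws Throwable {
            try {
                Object result = method.invoke(target, args);
                saveCheckpoint(); // remember state after every successful call
                return result;
            } catch (InvocationTargetException e) {
                throw e.getCause(); // unwrap the service's own exception
            }
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            try {
                return call(method, args);
            } catch (IllegalStateException failure) {
                // Assumed failure signal: build a replacement, restore state, retry once.
                Object replacement = replacementFactory.get();
                if (replacement instanceof StateExportable) {
                    ((StateExportable) replacement).importState(checkpoint);
                }
                target = replacement;
                return call(method, args);
            }
        }
    }

    /** Wraps an instance in a proxy so the client never sees the replacement. */
    public static <T> T transparent(Class<T> iface, T initial, Supplier<T> replacementFactory) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                new FailoverHandler(initial, replacementFactory));
        return iface.cast(proxy);
    }

    /** Runs two calls, crashes the primary, then calls again through the same proxy. */
    public static int[] demo() {
        CounterImpl primary = new CounterImpl();
        Counter client = transparent(Counter.class, primary, CounterImpl::new);
        int first = client.increment();
        int second = client.increment();
        primary.crash();
        int third = client.increment(); // served by the replacement instance
        return new int[] { first, second, third };
    }
}
```

In this sketch the client keeps a single reference for the whole run, so the count continues across the simulated failure. A real deployment would also need failure detection, state transfer over the network, and a policy for calls that were in flight when the instance failed, none of which is modeled here.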
