A cloud middleware for assuring performance and high availability of soft real-time applications

Applications are increasingly being deployed in the cloud because of the benefits of economies of scale, scalability, flexibility, and a utility-based pricing model. Although most cloud-based applications have hitherto been enterprise-style, there is an emerging need to host real-time streaming applications in the cloud, and these applications demand both high availability and low latency. Contemporary cloud computing research has seldom focused on solutions that provide both high availability and real-time assurance to these applications while also optimizing resource consumption in data centers, which is a key consideration for cloud providers. This paper makes three contributions to address this dual challenge. First, it describes an architecture for a fault-tolerant framework that automatically deploys replicas of virtual machines (VMs) in data centers in a way that optimizes resources while assuring availability and responsiveness. Second, it describes the design of a pluggable framework within the fault-tolerant architecture that allows different placement algorithms to be plugged in for VM replica deployment. Third, it illustrates the design of a framework that uses real-time publish/subscribe to disseminate the resource utilization information required by the replica selection and placement framework. Experimental results from a case study involving a specific replica placement algorithm are presented to evaluate the effectiveness of the architecture.
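
The pluggable placement idea in the second contribution can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Host`, `VMReplica`, `PlacementAlgorithm`, `FirstFit`, and `ReplicaManager` are hypothetical, and first-fit is only a stand-in heuristic, since the abstract does not say which placement algorithm the case study uses. The sketch assumes that a replica must not share a host with its primary and that resource needs are expressed as fractions of host capacity.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Host:
    """Hypothetical view of a physical host's spare capacity (fractions 0.0 to 1.0)."""
    name: str
    cpu_free: float
    mem_free: float


@dataclass
class VMReplica:
    """Hypothetical description of a backup VM that needs a host."""
    name: str
    cpu_need: float
    mem_need: float
    primary_host: str  # assumed constraint: replica must avoid its primary's host


class PlacementAlgorithm(ABC):
    """Plug-in point: any algorithm that maps replicas to hosts."""

    @abstractmethod
    def place(self, replicas: List[VMReplica],
              hosts: List[Host]) -> Dict[str, Optional[str]]:
        """Return replica name -> host name, or None if no host fits."""


class FirstFit(PlacementAlgorithm):
    """Stand-in heuristic: take the first host with enough spare CPU and memory."""

    def place(self, replicas, hosts):
        placement: Dict[str, Optional[str]] = {}
        for replica in replicas:
            placement[replica.name] = None
            for host in hosts:
                fits = (host.cpu_free >= replica.cpu_need
                        and host.mem_free >= replica.mem_need)
                if fits and host.name != replica.primary_host:
                    # Reserve the capacity (mutates the Host objects passed in).
                    host.cpu_free -= replica.cpu_need
                    host.mem_free -= replica.mem_need
                    placement[replica.name] = host.name
                    break
        return placement


class ReplicaManager:
    """Delegates placement to whichever algorithm was plugged in."""

    def __init__(self, algorithm: PlacementAlgorithm):
        self.algorithm = algorithm

    def deploy(self, replicas, hosts):
        return self.algorithm.place(replicas, hosts)


if __name__ == "__main__":
    hosts = [Host("h1", 0.5, 0.5), Host("h2", 0.5, 0.5)]
    replicas = [VMReplica("r1", 0.3, 0.3, primary_host="h1"),
                VMReplica("r2", 0.3, 0.3, primary_host="h2")]
    print(ReplicaManager(FirstFit()).deploy(replicas, hosts))
```

Swapping in a different heuristic would only require another `PlacementAlgorithm` subclass; `ReplicaManager` stays unchanged. In the architecture described above, the spare-capacity figures would come from the resource utilization information disseminated over publish/subscribe rather than from hard-coded values.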
