A Generic Architecture for Scalable and Highly Available Content Serving Applications in the Cloud

The cloud computing paradigm allows service providers to offer scalable and highly available applications to their end users. Content serving applications, in which a large number of connected users manage arbitrary amounts of data, are a typical case where this is required. In the Big Data era, where the amount of information produced and consumed grows exponentially, centralized legacy approaches are inefficient because they cannot scale adequately with the number of connected users or the dataset sizes. In these cases, an efficient cloudification of content serving applications is required in order to benefit from the cloud's offerings. In this work, we present a generic architecture that almost any content serving application can use to offer scalable and highly available data management operations to its users by employing cloud management techniques. We describe the architectural blocks of our approach and how they can be efficiently deployed in a cloud environment. We document our experiences with an actual deployment of a typical content serving application over ~okeanos, an OpenStack-compatible public cloud service. We describe the open source frameworks that we selected from a plethora of existing tools, justify our choices, and report our initial observations during their operation. We give a detailed overview of how we installed and configured these systems to achieve high availability and scalability in a public cloud setting. Finally, we document our initial performance evaluation, in which we showcase the system's ability to handle increasing workloads by elastically scaling its resources.
