Migrating a Digital Library to a Private Cloud

A private cloud deployment of an infrastructure as a service (IaaS) cluster is a cost-effective solution for many small and medium-sized digital libraries, and possibly for companies of similar scale. CiteSeerX is a working online digital library search engine, and its physical infrastructure is representative of the clusters behind a typical digital library in both size and functionality. CiteSeerX formerly ran on a cluster of eighteen loosely coupled physical machines. In this work we share the experiences and lessons learned from migrating CiteSeerX into a private cloud environment using virtualization. We also discuss alternative solutions, including a public cloud deployment using the Amazon EC2 and EBS services. We found that a private cloud built on virtualization is a better model for a digital library system like CiteSeerX. We also report the system status, activities, and proposed variations after the new system had been running for over half a year.
