Constructing a Flexible Internet-Scale Time-Sharing System Using Deterministic Checkpointing

Distributed systems and clustering have both become mainstream, now used throughout academia and industry. Interconnects have advanced across the board, and home Internet connections now offer speeds and latencies that were available only in data centers several years ago. Despite this, the scheduling and use of these systems remain firmly rooted in the era of batch processing. Batch processing provides a basic scheduler and a simple programming model for many types of computation, but it lacks the flexibility and efficient resource utilization of even the most rudimentary time-sharing systems. With a large number of use cases now constrained by these limitations, time-sharing concepts must once again come to the rescue. Experience with the distributed.net, Folding@home, and Storage@home systems demonstrates that the Internet has advanced enough to meet the more demanding requirements of time-sharing. This paper explores those requirements and the potential benefits of moving Internet-scale computations beyond batch processing to time-sharing, and lays out a method for the deterministic checkpointing needed to implement such a system.
