PonD: dynamic creation of HTC pool on demand using a decentralized resource discovery system

High Throughput Computing (HTC) platforms aggregate heterogeneous resources to provide vast amounts of computing power over a long period of time. Typical HTC systems, such as Condor and BOINC, rely on central managers for resource discovery and scheduling. While this approach simplifies deployment, it requires careful system configuration and management to ensure high availability and scalability. In this paper, we present a novel approach that integrates a self-organizing P2P overlay for scalable and timely discovery of resources with unmodified client/server job scheduling middleware in order to create HTC virtual resource Pools on Demand (PonD). This approach decouples resource discovery and scheduling from job execution and monitoring: a job submission dynamically generates an HTC platform from resources discovered through match-making in a large "sea" of resources in the P2P overlay, forming a "PonD" that can leverage unmodified HTC middleware for job execution and monitoring. Using first-order analytical models and large-scale simulation results, we show that the job scheduling time of our approach scales as O(log N), where N is the number of resources in a pool. To verify the practicality of PonD, we have implemented a prototype, called C-PonD, using Condor, a structured P2P overlay, and a PonD creation module. Experimental results with the prototype in two WAN environments (PlanetLab and the FutureGrid cloud computing testbed) demonstrate the utility of C-PonD as an HTC approach that does not rely on a central repository for maintaining all resource information. Though the prototype is based on Condor, the decoupled design of the system components (decentralized resource discovery, PonD creation, and job execution/monitoring) is generally applicable to other grid computing middleware systems.
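
To illustrate why logarithmic scaling is plausible for discovery over a structured overlay, the following minimal sketch counts the depth of a query-dissemination tree. It is not the analytical model or the overlay used in the paper. It assumes a hypothetical ring of N nodes in which every node has shortcut links to the nodes 1, 2, 4, ... positions ahead, and in which a node forwards a query to each shortcut inside its assigned range and delegates the following sub-range to it. The function names and the per-hop latency parameter are illustrative assumptions.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def broadcast_depth(span: int) -> int:
    """Depth of the dissemination tree rooted at a node responsible for
    `span` consecutive ring positions (itself included).

    Assumption: the node has shortcuts at offsets 1, 2, 4, ... and hands
    the positions [offset, min(2 * offset, span)) to the shortcut at `offset`.
    """
    if span <= 1:
        return 0  # only the node itself: nothing left to forward
    deepest_child = 0
    offset = 1
    while offset < span:
        child_span = min(2 * offset, span) - offset
        deepest_child = max(deepest_child, broadcast_depth(child_span))
        offset *= 2
    return 1 + deepest_child


def round_trip_estimate(n_nodes: int, hop_latency_s: float) -> float:
    """First-order estimate: the query travels down the tree and the
    matching results are aggregated back up, one hop latency per level."""
    return 2 * broadcast_depth(n_nodes) * hop_latency_s


if __name__ == "__main__":
    for n in (16, 1024, 65536, 1048576):
        print(n, broadcast_depth(n), round_trip_estimate(n, 0.1))
```

Under these assumptions the tree depth equals floor(log2 N), so multiplying the pool size by 1024 adds only ten levels to the query path. Real overlays add churn, uneven node placement, and heterogeneous link latencies, all of which this sketch ignores.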
