Rescheduling and checkpointing as strategies to run synchronous parallel programs on P2P desktop grids

Today, BSP (Bulk-Synchronous Parallel) is one of the most widely used models for writing tightly coupled parallel programs. BSP applications commonly run on clusters and, occasionally, on computational grids. In this context, we investigate the use of collaborative computing and idle resources to execute such applications, and we propose a model named BSPonP2P to answer the following question: how can we develop an efficient and viable model to run BSP applications on P2P desktop grids? We answer it by providing both process rescheduling and checkpointing to deal with resource heterogeneity and with dynamism at the application and infrastructure levels. A prototype evaluated on a subset of Grid5000 showed encouraging results for the use of collaboration and volatile resources in HPC.
