Horde: A parallel programming framework for clusters

Horde is a general-purpose programming framework for writing parallel applications on clusters. Horde models a computing task as a graph: each sub-task maps to a vertex, and each data channel maps to an edge. Programming with Horde is simple: the programmer writes sequential code for the vertices and adds edges to link them. Horde tolerates transient faults and provides support for writing code that tolerates permanent faults. Horde is portable and supports various cluster job managers. We evaluate Horde's communication efficiency through micro-benchmarks and demonstrate its ease of use by implementing a MapReduce engine. Tests on a small-scale cluster show that our implementation outperforms Hadoop.
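
The graph model described above can be illustrated with a minimal, single-process sketch. It is not Horde's API: `TaskGraph`, `add_vertex`, `add_edge`, `run`, and the word-count helpers are hypothetical names chosen for illustration, and edges here simply pass Python return values rather than data over cluster channels. The sketch only shows how sequential vertex code plus edges can express a MapReduce-style job.

```python
from collections import defaultdict, deque


class TaskGraph:
    """Hypothetical stand-in for a vertex/edge task graph (not Horde's API)."""

    def __init__(self):
        self.vertices = {}                      # name -> sequential function
        self.successors = defaultdict(list)     # name -> downstream vertices
        self.predecessors = defaultdict(list)   # name -> upstream vertices

    def add_vertex(self, name, func):
        self.vertices[name] = func

    def add_edge(self, src, dst):
        # An edge models a data channel from src's output to dst's input.
        self.successors[src].append(dst)
        self.predecessors[dst].append(src)

    def run(self, inputs):
        """Run every vertex once, in topological order.

        `inputs` maps each source vertex (no incoming edges) to its input.
        """
        indegree = {v: len(self.predecessors[v]) for v in self.vertices}
        ready = deque(v for v, d in indegree.items() if d == 0)
        results = {}
        while ready:
            name = ready.popleft()
            upstream = [results[p] for p in self.predecessors[name]]
            args = upstream if upstream else [inputs[name]]
            results[name] = self.vertices[name](*args)
            for nxt in self.successors[name]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(results) != len(self.vertices):
            raise ValueError("graph contains a cycle")
        return results


def map_words(text):
    """Sequential map vertex: count words in one chunk of text."""
    counts = defaultdict(int)
    for word in text.split():
        counts[word] += 1
    return dict(counts)


def reduce_counts(*partials):
    """Sequential reduce vertex: merge the partial counts from all maps."""
    total = defaultdict(int)
    for partial in partials:
        for word, count in partial.items():
            total[word] += count
    return dict(total)


def word_count(chunks):
    """Build one map vertex per chunk, all feeding a single reduce vertex.

    Assumes `chunks` contains at least one string.
    """
    graph = TaskGraph()
    graph.add_vertex("reduce", reduce_counts)
    inputs = {}
    for i, chunk in enumerate(chunks):
        name = f"map{i}"
        graph.add_vertex(name, map_words)
        graph.add_edge(name, "reduce")
        inputs[name] = chunk
    return graph.run(inputs)["reduce"]
```

In this sketch the vertices run one after another in a single process. A cluster framework would instead dispatch ready vertices to different nodes and move data along the edges, which is the part the sketch deliberately leaves out.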
