A hypergraph partitioning based approach for scheduling of tasks with batch-shared I/O

This paper proposes a novel hypergraph-partitioning-based strategy for scheduling multiple data analysis tasks with batch-shared I/O behavior. The strategy models the sharing of files among tasks as a hypergraph in order to minimize the I/O overhead of transferring the same files multiple times, and it employs a dynamic scheme for file transfers to reduce contention on the storage system. We experimentally evaluate the proposed approach using application emulators from two application domains: analysis of remotely sensed data and biomedical imaging.
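To make the hypergraph idea concrete, the following is a minimal sketch and not the paper's actual formulation, which the abstract does not spell out. It assumes tasks are vertices, each file is a net (hyperedge) connecting the tasks that read it, and a file must be transferred once to every compute node that hosts at least one of its tasks. Under that assumption, the standard connectivity-minus-one cut metric counts the redundant transfers. The function names, task names, and file sizes are all hypothetical.

```python
from collections import defaultdict


def build_hypergraph(task_files):
    """Return a dict mapping each file (net) to the set of tasks (vertices) reading it."""
    nets = defaultdict(set)
    for task, files in task_files.items():
        for f in files:
            nets[f].add(task)
    return dict(nets)


def transfer_volume(nets, file_sizes, assignment):
    """Total volume moved if each node fetches every file its tasks need exactly once."""
    total = 0
    for f, tasks in nets.items():
        nodes = {assignment[t] for t in tasks}  # nodes that need file f
        total += file_sizes[f] * len(nodes)
    return total


def redundant_volume(nets, file_sizes, assignment):
    """Connectivity-minus-one cost: volume of copies beyond the first per file."""
    total = 0
    for f, tasks in nets.items():
        nodes = {assignment[t] for t in tasks}
        total += file_sizes[f] * (len(nodes) - 1)
    return total


# Hypothetical batch: four tasks, four files of 10 units each.
task_files = {
    "t1": ["a", "b"],
    "t2": ["b", "c"],
    "t3": ["c", "d"],
    "t4": ["a", "d"],
}
file_sizes = {"a": 10, "b": 10, "c": 10, "d": 10}
nets = build_hypergraph(task_files)

# Two candidate two-node schedules (task -> node id).
grouped = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
interleaved = {"t1": 0, "t2": 1, "t3": 0, "t4": 1}

print(redundant_volume(nets, file_sizes, grouped))      # files a and c are cut
print(redundant_volume(nets, file_sizes, interleaved))  # every file is cut
```

In this toy batch the grouped schedule keeps two of the four files local to a single node, so it moves less data than the interleaved one. A hypergraph partitioner searches for such low-cut assignments while also balancing load across nodes, which this sketch does not attempt.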
