Design and implementation of task scheduling strategies for massive remote sensing data processing across multiple data centers

Data-intensive remote sensing data processing applications are increasingly widespread as computer and network technologies evolve. In particular, bag-of-tasks (BoT) applications with large numbers of shared input files and directed acyclic graph (DAG) applications with data dependencies bring new challenges in widely distributed computing environments. In this article, a strategy called partitioning group based on hypergraph (PGH) is introduced to model file sharing. With the PGH algorithm, BoT applications are partitioned into several groups to minimize data transfer time. We also adopt a second scheduling policy, the optimized task tree (OTT) strategy, to handle DAG workflows of massive remote sensing data processing with data dependencies. The scheduling queue of DAG tasks is updated as task priorities change. Using the GridSim simulation environment, we designed Gridlets within the scheduler to test the performance of PGH and OTT. Copyright © 2013 John Wiley & Sons, Ltd.
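The abstract does not give the details of PGH, so the following is only a minimal sketch of the general idea of grouping tasks that share input files. It assumes a hypergraph view in which tasks are vertices and each shared file is a hyperedge, and it assumes that a group fetches each file it needs exactly once. The function names `transfer_cost` and `greedy_partition`, the balance cap, and the greedy heuristic are all illustrative assumptions, not the authors' algorithm; a real hypergraph partitioner would replace the greedy step.

```python
def transfer_cost(groups, task_files, file_size):
    """Total data moved if every group fetches each file it needs once.

    Assumption (not from the paper): one transfer per (group, file) pair.
    """
    cost = 0
    for group in groups:
        needed = set()
        for task in group:
            needed |= task_files[task]
        cost += sum(file_size[f] for f in needed)
    return cost


def greedy_partition(task_files, file_size, k):
    """Greedy stand-in for a hypergraph partitioner (hypothetical heuristic).

    Each task goes to the non-full group where it adds the least new data.
    """
    cap = -(-len(task_files) // k)  # ceil(n / k): keeps group sizes balanced
    groups = [[] for _ in range(k)]
    group_files = [set() for _ in range(k)]

    # Place tasks with the largest total input first; break ties by name.
    order = sorted(
        task_files,
        key=lambda t: (-sum(file_size[f] for f in task_files[t]), t),
    )
    for task in order:
        best_key, best_group = None, None
        for g in range(k):
            if len(groups[g]) >= cap:
                continue  # group is full
            new_files = task_files[task] - group_files[g]
            extra = sum(file_size[f] for f in new_files)
            key = (extra, len(groups[g]), g)
            if best_key is None or key < best_key:
                best_key, best_group = key, g
        groups[best_group].append(task)
        group_files[best_group] |= task_files[task]
    return groups


# Toy input: four tasks, four files (sizes in arbitrary units).
task_files = {
    "t1": {"a", "b"},
    "t2": {"a", "b"},
    "t3": {"c"},
    "t4": {"c", "d"},
}
file_size = {"a": 10, "b": 10, "c": 5, "d": 5}

groups = greedy_partition(task_files, file_size, k=2)
naive = [["t1", "t3"], ["t2", "t4"]]
print(groups, transfer_cost(groups, task_files, file_size))
print(naive, transfer_cost(naive, task_files, file_size))
```

On this toy input, co-locating the tasks that share files `a` and `b` lowers the modeled transfer volume compared with the naive split, which is the effect that grouping by shared files aims for.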
