A Heuristic Scheduling Approach to Hybrid Makespan Problem for Data Intensive Computing

Data intensive computing (DIC) offers an attractive option for businesses to execute applications remotely and load computing resources from the cloud in a streaming way. A key challenge in such an environment is to increase the utilization of the cloud cluster for high-throughput processing. One way of achieving this goal is to optimize the execution of computing jobs on the cluster. We observe that the order in which these jobs are executed can have a significant impact on their overall completion time (makespan). Our goal is to design a job scheduler that minimizes the makespan. In this study, a new formalization is introduced that represents each job as a pair of durations, one for the disk-processing stage and one for the network-transmission stage. Because processing is streamed, the two stages of a job are executed in an overlapping manner, which may lead to both one-stage and two-stage scheduling situations. A novel heuristic scheduling strategy is proposed for this hybrid scheduling problem, and the performance of the method is confirmed by experimental evaluation.
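To illustrate why job order affects the makespan, the following minimal sketch models each job as a pair (disk time, network time) and computes the completion time of a given order. It assumes the classical two-machine flow shop, in which a job's network stage starts only after its disk stage has finished; this is a simplification and not the hybrid overlapping model of this study. The ordering function is Johnson's rule, a well-known baseline for that classical problem, and is not the heuristic proposed here. The names `Job`, `makespan`, `johnson_order`, and the sample durations are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (disk_time, network_time) for one job.
Job = Tuple[float, float]


def makespan(order: Sequence[Job]) -> float:
    """Completion time of the last job in a classical two-stage flow shop.

    Assumption: one disk and one network resource, each handling one job
    at a time, and a job's network stage starts only after its own disk
    stage has finished.
    """
    disk_done = 0.0  # time at which the disk finishes the current job
    net_done = 0.0   # time at which the network finishes the current job
    for disk, net in order:
        disk_done += disk
        # The network stage waits for both the disk stage of this job
        # and the network stage of the previous job.
        net_done = max(net_done, disk_done) + net
    return net_done


def johnson_order(jobs: Sequence[Job]) -> List[Job]:
    """Johnson's rule for the classical two-machine flow shop.

    Jobs whose disk time does not exceed their network time go first,
    sorted by increasing disk time; the rest go last, sorted by
    decreasing network time.
    """
    front = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    back = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: j[1], reverse=True)
    return front + back


# Hypothetical sample durations, in arbitrary time units.
jobs: List[Job] = [(3, 6), (5, 2), (1, 2), (6, 6), (7, 5)]

print(makespan(jobs))                 # submission order
print(makespan(johnson_order(jobs)))  # reordered by Johnson's rule
```

With these sample durations, the submission order finishes at 27 time units and the reordered schedule at 24, so the same set of jobs completes earlier purely because of the order in which they run.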
