Map-Reduce based tipping point scheduler for parallel image processing

Abstract: Big Data image processing is in high demand because of its proven success in business information systems, medical science, and social media. However, the computation involved is becoming increasingly complex, which leads to more complicated resource management and longer task execution times. Researchers have combined CPU and GPU based computing to cut execution time, but scaling this combination across compute nodes remains a challenge because of its high communication cost. The Map-Reduce framework has emerged as a viable alternative, since its workflow can be optimized by changing the underlying job scheduling mechanism. This paper presents a comparative study of job scheduling algorithms that can be deployed for various Big Data image processing applications, and proposes a tipping point scheduling algorithm to optimize the workflow for job execution on multiple nodes. The proposed scheduler is evaluated by implementing a parallel image segmentation algorithm that detects lung tumors on an image dataset of up to 3 GB. In terms of task execution time and throughput, the proposed tipping point scheduler performs best, followed by the Map-Reduce based Fair scheduler: it is 1.14 times better than the Map-Reduce based Fair scheduler and 1.33 times better than the Map-Reduce based FIFO scheduler. Compared with a single node, the proposed tipping point scheduler attains a speedup of 4.5X on a multi-node architecture.
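The ratios quoted in the abstract can be read as follows. This is a minimal sketch, assuming the conventional definitions of speedup (baseline time divided by improved time) and throughput (data processed per unit time); the abstract does not state its exact formulas. The timing values are hypothetical, chosen only so that the ratios match the reported 1.14, 1.33, and 4.5, and are not measurements from the paper. The sketch also illustrates why, for a fixed dataset size, the execution-time ratio and the throughput ratio coincide.

```python
def speedup(t_baseline: float, t_improved: float) -> float:
    """Ratio of baseline to improved execution time (> 1 means faster)."""
    if t_baseline <= 0 or t_improved <= 0:
        raise ValueError("execution times must be positive")
    return t_baseline / t_improved


def throughput(data_size_gb: float, exec_time_s: float) -> float:
    """Data processed per second, in GB/s."""
    if exec_time_s <= 0:
        raise ValueError("execution time must be positive")
    return data_size_gb / exec_time_s


# Dataset size taken from the abstract (up to 3 GB).
DATASET_GB = 3.0

# HYPOTHETICAL timings in seconds, picked only to reproduce the reported ratios.
t_tipping = 1000.0      # proposed tipping point scheduler, multi-node
t_fair = 1140.0         # Fair scheduler, multi-node
t_fifo = 1330.0         # FIFO scheduler, multi-node
t_single_node = 4500.0  # tipping point scheduler, single node

print(f"vs Fair:        {speedup(t_fair, t_tipping):.2f}x")
print(f"vs FIFO:        {speedup(t_fifo, t_tipping):.2f}x")
print(f"vs single node: {speedup(t_single_node, t_tipping):.1f}x")

# With the dataset size fixed, the throughput ratio equals the time ratio.
throughput_ratio = throughput(DATASET_GB, t_tipping) / throughput(DATASET_GB, t_fifo)
print(f"throughput ratio vs FIFO: {throughput_ratio:.2f}x")
```

Because the dataset size cancels in the throughput ratio, a single figure such as 1.33 can describe the improvement in both execution time and throughput.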
