A Proposed Architecture for Parallel HPC-based Resource Management System for Big Data Applications

Article history: Received: 02 October, 2018; Accepted: 11 January, 2019; Online: 20 January, 2019

Big data is at the forefront of present and future research activities. The data that needs to be processed is growing dramatically in volume, velocity and variety. In response, many big data technologies have emerged to tackle the challenges of collecting, processing and storing such large-scale datasets. High-performance computing (HPC) is a technology used to perform computations as fast as possible. This is achieved by integrating heterogeneous hardware and crafting software and algorithms that exploit the parallelism HPC provides. The performance capabilities afforded by HPC have made it an attractive environment for supporting scientific workflows and big data computing, which has led to a convergence of the HPC and big data fields. However, big data applications usually do not fully exploit the performance available in HPC clusters. This is because such applications are written in high-level programming languages that, unlike dedicated parallel programming models, do not provide support for exploiting parallelism. The objective of this research paper is to enhance the performance of big data applications on HPC clusters without compromising the power consumption of the HPC system. This can be achieved by building a parallel HPC-based Resource Management System that exploits the capabilities of HPC resources efficiently.
