Big Data and HPC Convergence: The Cutting Edge and Outlook

Data volumes have grown on a massive scale over the last couple of decades. As the volume of data increases, so do the challenges associated with big data. The issues raised by this avalanche of data are immense and span a variety of challenges that need careful consideration. The use of High Performance Data Analytics (HPDA) is increasing at a brisk pace in many industries, which has expanded the HPC market into these new territories. HPC and big data systems differ not only at the technical level but also in their ecosystems. Workloads are diverse enough, and performance sensitivity is high enough, that solutions which are globally optimal but locally highly sub-optimal cannot address all the issues related to the convergence of big data and HPC. As we head towards exascale systems, the necessary integration of big data and HPC is a hot research topic, but it is still at a very early stage. The two systems have different architectures, and their integration brings many challenges. The main aim of this paper is to identify the driving forces, challenges, and current and future trends associated with the integration of HPC and big data. We also propose an architecture for big data and HPC convergence using design patterns.
