Exploiting Computational Resources in Distributed Heterogeneous Platforms

We have been witnessing continuous growth in both heterogeneous computational platforms (e.g., Cell blades, or the joint use of traditional CPUs and GPUs) and multi-core processor architectures, and it remains an open question how applications can fully and efficiently exploit this computational potential. In this paper we introduce a run-time environment and programming framework that supports the implementation of scalable and efficient parallel applications in such heterogeneous, distributed environments. We assess these issues through well-known kernels and real applications, with both regular and irregular behavior, that are not only relevant but also demanding in terms of computation and I/O. Moreover, the irregularity of these and many other applications poses a challenge to the design and implementation of efficient parallel algorithms. Our experimental environment includes dual-core and octa-core machines augmented with GPUs, and we evaluate the performance of our framework for standalone and distributed executions. The evaluation in a distributed environment showed near-linear scale-ups for two data mining applications, while using the CPU and GPU together improved application performance by around 25% compared to the GPU-only versions.
