Paper information - Transparent Application Acceleration by Intelligent Scheduling of Shared Library Calls on Heterogeneous Systems

Transparent application acceleration in heterogeneous systems can be achieved by automatically intercepting shared library calls and efficiently orchestrating their execution across all processing devices. To fully exploit the available computing power, the intercepted calls must be replaced with faster accelerator-based implementations, and intelligent scheduling algorithms must be incorporated. In contrast to previous approaches, the proposed framework not only transparently intercepts and redirects the library calls, but also incorporates state-of-the-art scheduling algorithms for both divisible and indivisible applications. Compared with highly optimized implementations for multi-core CPUs (e.g., MKL and FFTW), the experimental results demonstrate that, by applying appropriate lightweight scheduling and load-balancing mechanisms, speedups as high as 7.86x (matrix multiplication) and 4.6x (FFT) can be achieved.
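To make the idea of load balancing a divisible call more concrete, the following is a minimal sketch and not the scheduling algorithm of the paper. It assumes each device has a single, constant throughput (rows per second), which is a simplification; the function names `partition_rows` and `estimated_makespan`, the device labels, and the throughput values are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def partition_rows(total_rows: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Split a divisible workload (e.g. rows of a matrix product) among
    devices in proportion to an assumed constant throughput per device."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if not throughput or any(t <= 0 for t in throughput.values()):
        raise ValueError("every device needs a positive throughput")

    total_speed = sum(throughput.values())
    # Start from the floor of each device's ideal proportional share.
    shares = {dev: int(total_rows * t / total_speed) for dev, t in throughput.items()}

    # Flooring can leave a few rows unassigned; give them to the fastest devices.
    leftover = total_rows - sum(shares.values())
    fastest_first = sorted(throughput, key=throughput.get, reverse=True)
    for dev in fastest_first[:leftover]:
        shares[dev] += 1
    return shares


def estimated_makespan(shares: Dict[str, int], throughput: Dict[str, float]) -> float:
    """Estimated finish time if all devices run concurrently: the slowest one decides."""
    return max(shares[dev] / throughput[dev] for dev in shares)


if __name__ == "__main__":
    # Hypothetical throughputs in rows per second (not measured values).
    speeds = {"cpu": 1.0, "gpu": 3.0}
    plan = partition_rows(1000, speeds)
    print(plan, estimated_makespan(plan, speeds))
```

With a proportional split, all devices are expected to finish at roughly the same time, which is the goal of load balancing a divisible workload. An indivisible call cannot be split this way and would instead be assigned as a whole to a single device.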
