Redesigning OP2 Compiler to Use HPX Runtime Asynchronous Techniques
[1] Hartmut Kaiser, et al. HPX: A Task Based Programming Model in a Global Address Space, 2014, PGAS.
[2] Dietmar Fey, et al. Higher-level parallelization for local and distributed asynchronous task-based programming, 2015, ESPM '15.
[3] G. R. Mudalige, et al. OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures, 2012, 2012 Innovative Parallel Computing (InPar).
[4] Paul H. J. Kelly, et al. Mesh independent loop fusion for unstructured mesh applications, 2012, CF '12.
[5] Jeanine Cook, et al. Using Intrinsic Performance Counters to Assess Efficiency in Task-Based Parallel Applications, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[6] Dietmar Fey, et al. Using HPX and LibGeoDecomp for scaling HPC applications on heterogeneous supercomputers, 2013, ScalA '13.
[7] Lorna Smith. Mixed Mode MPI/OpenMP Programming, 2000.
[8] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[9] Nancy M. Amato, et al. Run-time methods for parallelizing partially parallel loops, 1995, ICS '95.
[10] Chirag Dekate, et al. Extreme scale parallel NBody algorithm with event driven constraint based execution model, 2011.
[11] Paul H. J. Kelly, et al. Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures, 2012, Comput. J.
[12] Dietmar Fey, et al. High Performance Computing, 2016, Lecture Notes in Computer Science.
[13] Carlo Bertolli, et al. Designing OP2 for GPU architectures, 2013, J. Parallel Distributed Comput.
[14] Paul H. J. Kelly, et al. Performance analysis of the OP2 framework on many-core architectures, 2011, PERV.
[15] M. Frans Kaashoek, et al. Software prefetching and caching for translation lookaside buffers, 1994, OSDI '94.
[16] J. Ramanujam, et al. Using HPX and OP2 for Improving Parallel Scaling Performance of Unstructured Grid Applications, 2016, 2016 45th International Conference on Parallel Processing Workshops (ICPPW).
[17] Nancy M. Amato, et al. A scalable method for run-time loop parallelization, 1995, International Journal of Parallel Programming.
[18] Michael J. Flynn, et al. Hardware and software cache prefetching techniques for MPEG benchmarks, 2000, IEEE Trans. Circuits Syst. Video Technol.
[19] Thomas Heller, et al. Application of the ParalleX execution model to stencil-based problems, 2012, Computer Science - Research and Development.
[20] Lawrence Rauchwerger, et al. Implementation Issues of Loop-Level Speculative Run-Time Parallelization, 1999, CC.
[21] Brad Calder, et al. Pointer cache assisted prefetching, 2002, MICRO.
[22] Jeanine Cook, et al. The Performance Implication of Task Size for Applications on the HPX Runtime System, 2015, 2015 IEEE International Conference on Cluster Computing.
[23] Donald Yeung, et al. The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems, 2004, J. Instr. Level Parallelism.
[24] Ken Kennedy, et al. Software prefetching, 1991, ASPLOS IV.
[25] D. Ghate, et al. Using Automatic Differentiation for Adjoint CFD Code Development, 2005.
[26] Jaejin Lee, et al. Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems, 2009, IEEE Transactions on Parallel and Distributed Systems.
[27] Carl Hewitt, et al. The incremental garbage collection of processes, 1977, Artificial Intelligence and Programming Languages.
[28] Martin Burtscher, et al. Efficient emulation of hardware prefetchers via event-driven helper threading, 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[29] Paul H. J. Kelly, et al. Design and Performance of the OP2 Library for Unstructured Mesh Applications, 2011, Euro-Par Workshops.