Using Arm Scalable Vector Extension to Optimize OPEN MPI
George Bosilca | Pavel Shamis | Shinji Sumimoto | Jack Dongarra | Qinglei Cao | Dong Zhong | Kenichi Miura
[1] Mitsuhisa Sato, et al. Preliminary Performance Evaluation of Application Kernels Using ARM SVE with Multiple Vector Lengths, 2017, 2017 IEEE International Conference on Cluster Computing (CLUSTER).
[2] Kenneth A. Ross, et al. Rethinking SIMD Vectorization for In-Memory Databases, 2015, SIGMOD Conference.
[3] Jack Dongarra, et al. GPU-Aware Non-contiguous Data Movement In Open MPI, 2016, HPDC.
[4] Armin Kobilica, et al. Simulation of ARM and x86 microprocessors using in-order and out-of-order CPU models with Gem5 simulator, 2018, 2018 5th International Conference on Electrical and Electronic Engineering (ICEEE).
[5] George Bosilca, et al. ADAPT: an event-based adaptive collective communication framework, 2018, HPDC.
[6] Dhabaleswar K. Panda, et al. Zero-Copy MPI Derived Datatype Communication over InfiniBand, 2004, PVM/MPI.
[7] Jesper Larsson Träff. Transparent Neutral Element Elimination in MPI Reduction Operations, 2010, EuroMPI.
[8] George Bosilca, et al. Runtime level failure detection and propagation in HPC systems, 2019, EuroMPI.
[9] Gudula Rünger, et al. MPI Reduction Operations for Sparse Floating-point Data, 2008, PVM/MPI.
[10] Magnus Jahre, et al. Scalability analysis of AVX-512 extensions, 2019, The Journal of Supercomputing.
[11] Mateo Valero, et al. Using Arm’s scalable vector extension on stencil codes, 2019, The Journal of Supercomputing.
[12] Ryan E. Grant, et al. Fuzzy Matching: Hardware Accelerated MPI Communication Middleware, 2019, 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[13] Message Passing Interface Forum. MPI: A message-passing interface standard, 1994.
[14] Ali Sezgin, et al. Modelling the ARMv8 architecture, operationally: concurrency and ISA, 2016, POPL.
[15] Gilad Shainer, et al. Using InfiniBand Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All, 2016, EuroMPI.
[16] Yen-Chen Liu, et al. Knights Landing: Second-Generation Intel Xeon Phi Product, 2016, IEEE Micro.
[17] Jack Dongarra, et al. ScaLAPACK user's guide, 1997.
[18] Mateo Valero, et al. Stencil codes on a vector length agnostic architecture, 2018, PACT.
[19] Dhabaleswar K. Panda, et al. CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters, 2016, 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid).
[20] Bashir M. Al-Hashimi, et al. Advanced SIMD: Extending the reach of contemporary SIMD architectures, 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[21] David A. Padua, et al. An Evaluation of Vectorizing Compilers, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.