Distributed Join Algorithms on Multi-CPU Clusters with GPUDirect RDMA
Hong Chen | Feng Zhang | Cuiping Li | Chengxin Guo
[1] Gustavo Alonso, et al. Distributed Join Algorithms on Thousands of Cores, 2017, Proc. VLDB Endow.
[2] Dhabaleswar K. Panda, et al. Mizan-RMA: Accelerating Mizan Graph Processing Framework with MPI RMA, 2016, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC).
[3] Kenneth A. Ross, et al. Rethinking SIMD Vectorization for In-Memory Databases, 2015, SIGMOD Conference.
[4] Daniel Peter Playne, et al. A New Algorithm for Parallel Connected-Component Labelling on GPUs, 2018, IEEE Transactions on Parallel and Distributed Systems.
[5] Weifeng Liu, et al. Fast segmented sort on GPUs, 2017, ICS.
[6] Chonggang Wang, et al. GPU-Accelerated High-Throughput Online Stream Data Processing, 2018, IEEE Transactions on Big Data.
[7] Tong Liu, et al. The development of Mellanox/NVIDIA GPUDirect over InfiniBand—a new model for GPU to GPU communications, 2011, Computer Science - Research and Development.
[8] Wu-chun Feng, et al. Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[9] Henri Casanova, et al. Analysis-driven Engineering of Comparison-based Sorting Algorithms on GPUs, 2018, ICS.
[10] Sadaf R. Alam, et al. Evaluation of Inter- and Intra-node Data Transfer Efficiencies between GPU Devices and their Impact on Scalable Applications, 2013, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.
[11] Yi Lu, et al. AdaptDB: Adaptive Partitioning for Distributed Joins, 2017, Proc. VLDB Endow.
[12] Jack J. Dongarra, et al. Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters, 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[13] Shinpei Kato, et al. Relational Joins on GPUs: A Closer Look, 2017, IEEE Transactions on Parallel and Distributed Systems.
[14] Gustavo Alonso, et al. Rack-Scale In-Memory Join Processing using RDMA, 2015, SIGMOD Conference.
[15] Dhabaleswar K. Panda, et al. Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters, 2014, 2014 21st International Conference on High Performance Computing (HiPC).
[16] Sayantan Sur, et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters, 2011, Computer Science - Research and Development.
[17] Dhabaleswar K. Panda, et al. Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures, 2018, EuroMPI.
[18] Pavan Balaji, et al. MT-MPI: multithreaded MPI for many-core environments, 2014, ICS '14.
[19] Sreeram Potluri, et al. GPUDirect Async: Exploring GPU synchronous communication techniques for InfiniBand clusters, 2018, J. Parallel Distributed Comput.
[20] Andrea Clematis, et al. An MPI-CUDA library for image processing on HPC architectures, 2015, J. Comput. Appl. Math.
[21] Dhabaleswar K. Panda, et al. GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation, 2014, IEEE Transactions on Parallel and Distributed Systems.
[22] John M. Levesque, et al. An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication, 2016, Int. J. High Perform. Comput. Appl.
[23] Johannes Langguth, et al. GPU-based Acceleration of Detailed Tissue-Scale Cardiac Simulations, 2018, GPGPU@PPoPP.
[24] Awais Ahmad, et al. Real-Time Big Data Stream Processing Using GPU with Spark Over Hadoop Ecosystem, 2018, International Journal of Parallel Programming.
[25] Dhabaleswar K. Panda, et al. Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs, 2013, 2013 42nd International Conference on Parallel Processing.
[26] Federico Silla, et al. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters, 2010, 2010 International Conference on High Performance Computing & Simulation.
[27] Bingsheng He, et al. Relational query coprocessing on graphics processors, 2009, TODS.
[28] Eric P. Xing, et al. GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server, 2016, EuroSys.
[29] Gustavo Alonso, et al. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware, 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[30] Yuan Yuan, et al. The Yin and Yang of Processing Data Warehousing Queries on GPU Devices, 2013, Proc. VLDB Endow.
[31] Hari Sundar, et al. Utilizing GPU Parallelism to Improve Fast Spherical Harmonic Transforms, 2018, 2018 IEEE High Performance extreme Computing Conference (HPEC).
[32] Xu Liu, et al. Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect, 2019, IEEE Transactions on Parallel and Distributed Systems.
[33] Xiaoyong Du, et al. An adaptive breadth-first search algorithm on integrated architectures, 2018, The Journal of Supercomputing.
[34] Wu-chun Feng, et al. MPI-ACC: Accelerator-Aware MPI for Scientific Applications, 2016, IEEE Transactions on Parallel and Distributed Systems.
[35] Hao Wang, et al. SEP-graph: finding shortest execution paths for graph processing under a hybrid framework on GPU, 2019, PPoPP.
[36] Odysseas I. Pentakalos. An Introduction to the InfiniBand Architecture, 2002, Int. CMG Conference.
[37] Roberto Palmieri, et al. Understanding RDMA Behavior in NUMA Systems, 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[38] Anastasia Ailamaki, et al. Hardware-conscious Query Processing in GPU-accelerated Analytical Engines, 2019, CIDR.
[39] Dhabaleswar K. Panda, et al. Designing MPI Library with On-Demand Paging (ODP) of InfiniBand: Challenges and Benefits, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[40] Wenguang Chen, et al. Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures, 2017, IEEE Transactions on Parallel and Distributed Systems.
[41] Dhabaleswar K. Panda, et al. HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters, 2014, 2014 43rd International Conference on Parallel Processing.
[42] Dhabaleswar K. Panda, et al. Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast, 2019, IEEE Transactions on Parallel and Distributed Systems.
[43] Bingsheng He, et al. Relational joins on graphics processors, 2008, SIGMOD Conference.
[44] Yi-Cheng Tu, et al. Fast Equi-Join Algorithms on GPUs: Design and Implementation, 2017, SSDBM.
[45] Hao Li, et al. Join algorithms on GPUs: A revisit after seven years, 2015, 2015 IEEE International Conference on Big Data (Big Data).
[46] Hao Wang, et al. Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs, 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[47] David A. Bader, et al. GPU merge path: a GPU merging algorithm, 2012, ICS '12.