RegTT: Accelerating Tree Traversals on GPUs by Exploiting Regularities
[1] Jingling Xue,et al. Loop Tiling for Parallelism, 2000, Kluwer International Series in Engineering and Computer Science.
[2] Peter N. Yianilos,et al. Data structures and algorithms for nearest neighbor search in general metric spaces , 1993, SODA '93.
[3] Milind Kulkarni,et al. Automatically enhancing locality for tree traversals with traversal splicing , 2012, OOPSLA '12.
[4] E. Mansson,et al. Deep Coherent Ray Tracing , 2007, 2007 IEEE Symposium on Interactive Ray Tracing.
[5] Kun Zhou,et al. Real-time KD-tree construction on graphics hardware , 2008, SIGGRAPH 2008.
[6] Jon Louis Bentley,et al. Multidimensional binary search trees used for associative searching , 1975, CACM.
[7] Keshav Pingali,et al. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm, 2011.
[8] Piet Hut,et al. A hierarchical O(N log N) force-calculation algorithm , 1986, Nature.
[9] Andrew S. Grimshaw,et al. Scalable GPU graph traversal , 2012, PPoPP '12.
[10] Tim Foley,et al. KD-tree acceleration structures for a GPU raytracer , 2005, HWWS '05.
[11] Michael Goldfarb,et al. General transformations for GPU execution of tree traversals , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[12] Xipeng Shen,et al. On-the-fly elimination of dynamic irregularities for GPU computing , 2011, ASPLOS XVI.
[13] Rudolf Eigenmann,et al. OpenMPC: Extended OpenMP Programming and Tuning for GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Hans-Peter Seidel,et al. Stackless KD‐Tree Traversal for High Performance GPU Ray Tracing , 2007, Comput. Graph. Forum.
[15] Xiangke Liao,et al. An Efficient GPU Implementation of Inclusion-Based Pointer Analysis , 2016, IEEE Transactions on Parallel and Distributed Systems.
[16] Jingling Xue,et al. Automatic Parallelization of Tiled Loop Nests with Enhanced Fine-Grained Parallelism on GPUs , 2012, 2012 41st International Conference on Parallel Processing.
[17] Milind Kulkarni,et al. Enhancing locality for recursive traversals of recursive structures , 2011, OOPSLA '11.
[18] Hui Wu,et al. Parallelizing SOR for GPGPUs using alternate loop tiling, 2012, Parallel Comput.
[19] Michael Goldfarb,et al. Automatic vectorization of tree traversals , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[20] James R. Larus,et al. SIMD parallelization of applications that traverse irregular data structures , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[21] Jingling Xue,et al. Model-Driven Tile Size Selection for DOACROSS Loops on GPUs , 2011, Euro-Par.
[22] Yang Yang,et al. A Highly Parallel Reuse Distance Analysis Algorithm on GPUs , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[23] Sriram Krishnamoorthy,et al. Efficient execution of recursive programs on commodity vector hardware , 2015, PLDI.
[24] Hyesoon Kim,et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[25] Z. Meral Özsoyoglu,et al. Distance-based indexing for high-dimensional metric spaces , 1997, SIGMOD '97.