MAPS
Amnon Barak | Tal Ben-Nun | Ely Levy | Eri Rubin
[1] Bjarne Stroustrup, et al. The C++ Programming Language, 3rd Edition, Pearson Education, 2007, 2015.
[2] Rudolf Eigenmann, et al. OpenMPC: Extended OpenMP Programming and Tuning for GPUs, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Pedro V. Sander, et al. Fast triangle reordering for vertex locality and reduced overdraw, 2007, SIGGRAPH 2007.
[4] Ade Miller, et al. C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++, 2012.
[5] John D. Owens, et al. Glift: Generic, efficient, random-access GPU data structures, 2006, TOGS.
[6] Bjarne Stroustrup, et al. C++ Programming Language, 1986, IEEE Softw.
[7] Xipeng Shen, et al. Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping, 2010, ICS '10.
[8] Keshav Pingali, et al. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm, 2011.
[9] Brucek Khailany, et al. CudaDMA: Optimizing GPU memory bandwidth via warp specialization, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[10] Xavier Provot, et al. Deformation Constraints in a Mass-Spring Model to Describe Rigid Cloth Behavior, 1995.
[11] Nancy M. Amato, et al. STAPL: An Adaptive, Generic Parallel C++ Library, 2001, LCPC.
[12] Tarek S. Abdelrahman, et al. hiCUDA: High-Level GPGPU Programming, 2011, IEEE Transactions on Parallel and Distributed Systems.
[13] Timo Aila, et al. Understanding the efficiency of ray traversal on GPUs, 2009, High Performance Graphics.
[14] Kevin Skadron, et al. Dymaxion: Optimizing memory access patterns for heterogeneous systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[15] Jeff A. Stuart, et al. A study of Persistent Threads style GPU programming for GPGPU workloads, 2012, 2012 Innovative Parallel Computing (InPar).
[16] Jianbin Fang, et al. Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels, 2014, 2014 43rd International Conference on Parallel Processing.
[17] Hugues Hoppe, et al. Optimization of mesh locality for transparent vertex caching, 1999, SIGGRAPH.
[18] Samuel Williams, et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.
[19] Nathan Bell, et al. Thrust: A Productivity-Oriented Library for CUDA, 2012.
[20] Kunle Olukotun, et al. Accelerating CUDA graph algorithms at maximum warp, 2011, PPoPP '11.
[21] Michael Garland, et al. Sparse matrix computations on manycore GPU’s, 2008, 2008 45th ACM/IEEE Design Automation Conference.
[22] David R. Musser, et al. STL tutorial and reference guide, second edition: C++ programming with the standard template library, 2001.
[23] Sandeep Koranne, et al. Boost C++ Libraries, 2011.