Inner array inlining for structure of arrays layout
[1] Aart J. C. Bik, et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.
[2] Viera K. Proulx. Traffic simulation: a case study for teaching object oriented design, 1998, SIGCSE '98.
[3] Holger Homann, et al. SoAx: A generic C++ Structure of Arrays for handling particles in HPC codes, 2017, Comput. Phys. Commun.
[4] Michael Schreckenberg, et al. A cellular automaton traffic flow model for online simulation of traffic, 2001, Parallel Comput.
[5] P. J. Narayanan, et al. Accelerating Large Graph Algorithms on the GPU Using CUDA, 2007, HiPC.
[6] Kai Nagel, et al. Multi-agent traffic simulation with CUDA, 2009, 2009 International Conference on High Performance Computing & Simulation.
[7] Michael Schreckenberg, et al. A cellular automaton model for freeway traffic, 1992.
[8] Kunle Olukotun, et al. Accelerating CUDA graph algorithms at maximum warp, 2011, PPoPP '11.
[9] Nuno Faria, et al. Impact of Data Structure Layout on Performance, 2013, 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.
[10] Thomas Fahringer, et al. Automatic Data Layout Optimizations for GPUs, 2015, Euro-Par.
[11] Mark J. Harris. CUDA: performance tips and tricks, 2007, SIGGRAPH '07.
[12] M. Schreckenberg, et al. Microscopic Simulation of Urban Traffic Based on Cellular Automata, 1997.
[13] Andrew S. Grimshaw, et al. High-Performance and Scalable GPU Graph Traversal, 2015, ACM Trans. Parallel Comput.
[14] Robert Strzodka, et al. Abstraction for AoS and SoA layout in C, 2011.
[15] Kai Nagel, et al. Using common graphics hardware for multi-agent traffic simulation with CUDA, 2009, SimuTools.
[16] Geoff Boeing, et al. OSMnx: New Methods for Acquiring, Constructing, Analyzing, and Visualizing Complex Street Networks, 2016, Comput. Environ. Urban Syst.
[17] Bart De Moor, et al. Transportation Planning and Traffic Flow Models, 2005.
[18] Andrew A. Chien, et al. An automatic object inlining optimization and its evaluation, 2000, PLDI '00.
[19] Michael Goesele, et al. Auto-Tuning Complex Array Layouts for GPUs, 2014, EGPGV@EuroVis.
[20] Hidehiko Masuhara, et al. Ikra-Cpp: A C++/CUDA DSL for Object-Oriented Programming with Structure-of-Arrays Layout, 2018, WPMVP@PPoPP.
[21] Kevin Skadron, et al. Scalable parallel programming, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[22] Martin D. F. Wong, et al. An effective GPU implementation of breadth-first search, 2010, Design Automation Conference.
[23] Jure Leskovec, et al. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters, 2008, Internet Math.
[24] M. Pharr, et al. ispc: A SPMD compiler for high-performance CPU programming, 2012, 2012 Innovative Parallel Computing (InPar).
[25] Hanspeter Mössenböck, et al. Automatic array inlining in Java virtual machines, 2008, CGO '08.
[26] Kai Nagel, et al. Using common graphics hardware for multi-agent traffic simulation with CUDA, 2009, SimuTools 2009.
[27] Tarek S. Abdelrahman, et al. Launch-Time Optimization of OpenCL GPU Kernels, 2017, GPGPU@PPoPP.