Reasoning About Foreign Function Interfaces Without Modelling the Foreign Language