[1] S. Alexander, et al. N-Body Simulations of Late Stage Planetary Formation with a Simple Fragmentation Model, 1998.
[2] Benjamin C. Pierce, et al. Types and programming languages: the next generation, 2003, 18th Annual IEEE Symposium of Logic in Computer Science, 2003. Proceedings.
[3] Jure Leskovec, et al. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters, 2008, Internet Math.
[4] Atsushi Ohori, et al. An efficient non-moving garbage collector for functional languages, 2011, ICFP.
[5] J. Schank, et al. Biota: an object-oriented tool for modeling complex ecological systems, 1994.
[6] Julian Cummings, et al. Comparison of C++ and Fortran 90 for object-oriented scientific programming, 1997.
[7] Joshua M. Epstein, et al. Growing Artificial Societies: Social Science from the Bottom Up, 1996.
[8] Dirk Grunwald, et al. Improving the cache locality of memory allocation, 1993, PLDI '93.
[9] Andreas Polze, et al. A Performance Evaluation of Dynamic Parallelism for Fine-Grained, Irregular Workloads, 2016, Int. J. Netw. Comput.
[10] Ken Friis Larsen, et al. Design and GPGPU performance of Futhark's redomap construct, 2016, ARRAY@PLDI.
[11] Lukas Stadler, et al. Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation, 2017, VEE.
[12] Marina Papatriantafilou, et al. Lock-free Concurrent Data Structures, 2013, ArXiv.
[13] Mark Harman, et al. An Empirical Investigation of the Influence of a Type of Side Effects on Program Comprehension, 2003, IEEE Trans. Software Eng.
[14] M. Schreckenberg, et al. Microscopic Simulation of Urban Traffic Based on Cellular Automata, 1997.
[15] Tor M. Aamodt, et al. MIMD synchronization on SIMT architectures, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] Mohamed Wahib, et al. Scalable Kernel Fusion for Memory-Bound GPU Applications, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[17] Dragan A. Savić, et al. An investigation of the efficient implementation of cellular automata on multi-core CPU and GPU hardware, 2015, J. Parallel Distributed Comput.
[18] Michael Goesele, et al. MATOG: Array Layout Auto-Tuning for CUDA, 2017, TACO.
[19] Peng Tu, et al. Writing scalable SIMD programs with ISPC, 2014, WPMVP '14.
[20] Mingyu Chen, et al. Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning, 2017, PPoPP.
[21] Michael Philippsen, et al. Object Support for OpenMP-style Programming of GPU Clusters in Java, 2013, 2013 27th International Conference on Advanced Information Networking and Applications Workshops.
[22] Michael Goesele, et al. Auto-Tuning Complex Array Layouts for GPUs, 2014, EGPGV@EuroVis.
[23] Mitsuhisa Sato, et al. A Source-to-Source OpenACC Compiler for CUDA, 2013, Euro-Par Workshops.
[24] Trevor Alexander Brown, et al. Reclaiming Memory for Lock-Free Data Structures: There has to be a Better Way, 2015, PODC.
[25] Maged M. Michael. Scalable lock-free dynamic memory allocation, 2004, PLDI '04.
[26] Ulf Assarsson, et al. Efficient stream compaction on wide SIMD many-core architectures, 2009, High Performance Graphics.
[27] Yannis Manolopoulos, et al. Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attributes, 2003, ADBIS.
[28] Laxmikant V. Kalé, et al. CHARM++: a portable concurrent object oriented system based on C++, 1993, OOPSLA '93.
[29] Chuck Lever, et al. Malloc() Performance in a Multithreaded Linux Environment, 2000, USENIX Annual Technical Conference, FREENIX Track.
[30] John D. Owens, et al. A Dynamic Hash Table for the GPU, 2017, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[31] Jeongnim Kim, et al. Optimization and Parallelization of B-Spline Based Orbital Evaluations in QMC on Multi/Many-Core Shared Memory Processors, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[32] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[33] Michael Schreckenberg, et al. A cellular automaton model for freeway traffic, 1992.
[34] Xiaogang Ruan, et al. Applications of Cellular Automata in Complex System Study, 2005.
[35] Hanspeter Mössenböck, et al. Automatic array inlining in java virtual machines, 2008, CGO '08.
[36] Efficient Neighbor Searching for Agent-Based Simulation on GPU, 2014, 2014 IEEE/ACM 18th International Symposium on Distributed Simulation and Real Time Applications.
[37] Satish Narayanasamy, et al. Efficiently enforcing strong memory ordering in GPUs, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[38] Ganesh Gopalakrishnan, et al. GPU Concurrency: Weak Behaviours and Programming Assumptions, 2015, ASPLOS.
[39] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[40] Debra S Elston. A Primer for Agent-Based Simulation and Modeling in Transportation Applications, 2013.
[41] J. M. Baveco, et al. Objects for Simulation: Smalltalk and Ecology, 1994, Simul.
[42] Duane Merrill, et al. Single-pass Parallel Prefix Scan with Decoupled Lookback, 2016.
[43] Michael Schreckenberg, et al. A cellular automaton traffic flow model for online simulation of traffic, 2001, Parallel Comput.
[44] P. J. Narayanan, et al. Accelerating Large Graph Algorithms on the GPU Using CUDA, 2007, HiPC.
[45] Lieven Eeckhout, et al. Object-Relative Addressing: Compressed Pointers in 64-Bit Java Virtual Machines, 2007, ECOOP.
[46] Bernard Lang, et al. Incremental incrementally compacting garbage collection, 1987, PLDI.
[47] Martin D. F. Wong, et al. An effective GPU implementation of breadth-first search, 2010, Design Automation Conference.
[48] Hideya Iwasaki, et al. A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming, 2009, APLAS.
[49] R. Sivanandan, et al. Development of Microscopic Simulation Model for Heterogeneous Traffic Using Object Oriented Approach, 2008.
[50] Christian Wimmer, et al. One VM to rule them all, 2013, Onward!
[51] J. G. Ferreira, et al. ECOWIN — an object-oriented ecological model for aquatic ecosystems, 1995.
[52] Nathaniel Nystrom, et al. Firepile: run-time compilation for GPUs in scala, 2011, GPCE '11.
[53] Gerhard Wellein, et al. Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips, 2014, WPMVP '14.
[54] James O. Coplien, et al. Curiously recurring template patterns, 1995.
[55] Masao Kuwahara, et al. A development of a traffic simulator for urban road networks: AVENUE, 1994, Proceedings of VNIS'94 - 1994 Vehicle Navigation and Information Systems Conference.
[56] Rajkishore Barik, et al. Efficient Mapping of Irregular C++ Applications to Integrated GPUs, 2014, CGO '14.
[57] Henk Corporaal, et al. Fine-Grained Synchronizations and Dataflow Programming on GPUs, 2015, ICS.
[58] Mark Lee, et al. Vectorized production path tracing, 2017, High Performance Graphics.
[59] Sophia Drossopoulou, et al. You can have it all: abstraction and good cache performance, 2017, Onward!
[60] Andrew S. Grimshaw, et al. High-Performance and Scalable GPU Graph Traversal, 2015, ACM Trans. Parallel Comput.
[61] Sudip K. Seal, et al. Efficient simulation of agent-based models on multi-GPU and multi-core clusters, 2010, SimuTools.
[62] Lionel Lacassagne, et al. Batched Cholesky factorization for tiny matrices, 2016, 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP).
[63] Jeannette M. Wing, et al. A behavioral notion of subtyping, 1994, TOPL.
[64] Vincent B. C. Tan, et al. Adaptive floating node method for modelling cohesive fracture of composite materials, 2018.
[65] Erez Petrank, et al. The Compressor: concurrent, incremental, and parallel compaction, 2006, PLDI '06.
[66] Stephen Jones, et al. XMalloc: A Scalable Lock-free Dynamic Memory Allocator for Many-core Machines, 2010, 2010 10th IEEE International Conference on Computer and Information Technology.
[67] James C. King, et al. Symbolic execution and program testing, 1976, CACM.
[68] Michael Franz, et al. Accelerating Dynamically-Typed Languages on Heterogeneous Platforms Using Guards Optimization, 2018, ECOOP.
[69] Stanley B. Lippman. C++ gems, 1996.
[70] Kunle Olukotun, et al. Accelerating CUDA graph algorithms at maximum warp, 2011, PPoPP '11.
[71] D. Quinlan, et al. ROSE: Compiler Support for Object-Oriented Frameworks, 1999, Parallel Process. Lett.
[72] Leslie G. Valiant, et al. A bridging model for parallel computation, 1990, CACM.
[73] Roshan M. D'Souza, et al. A Framework for Megascale Agent Based Model Simulations on Graphics Processing Units, 2008, J. Artif. Soc. Soc. Simul.
[74] John D. Owens, et al. Gunrock: a high-performance graph processing library on the GPU, 2015, PPoPP.
[75] Kei Davis, et al. Parallel Object-Oriented Scientific Computing Today, 2003, ECOOP Workshops.
[76] Martín Abadi, et al. Dynamic typing in a statically-typed language, 1989, POPL '89.
[77] M. Snir, et al. Ghost Cell Pattern, 2010, ParaPLoP '10.
[78] Xinxin Mei, et al. Benchmarking the Memory Hierarchy of Modern GPUs, 2014, NPC.
[79] David R. Kaeli, et al. Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures, 2011, IEEE Transactions on Parallel and Distributed Systems.
[80] Joshua M. Epstein, et al. Growing Artificial Societies: Social Science from the Bottom Up, 1996.
[81] Robert Strzodka, et al. Abstraction for AoS and SoA layout in C, 2011.
[82] Firas Hamze, et al. A Performance Comparison of CUDA and OpenCL, 2010, ArXiv.
[83] Radek Stibora. Building of SBVH on Graphical Hardware, 2016.
[84] Vernon Rego, et al. Efficient Algorithms for Stream Compaction on GPUs, 2017, Int. J. Netw. Comput.
[85] 簡聰富, et al. A Study of the Architecture of Object-Oriented Software (Object-Oriented Software Construction), 1989.
[86] Vlastimil Havran, et al. Register Efficient Dynamic Memory Allocator for GPUs, 2015, Comput. Graph. Forum.
[87] Shigeru Chiba, et al. A metaobject protocol for C++, 1995, OOPSLA.
[88] Stephen John Turner, et al. Supporting efficient execution of continuous space agent-based simulation on GPU, 2016, Concurr. Comput. Pract. Exp.
[89] Glenn Krasner, et al. Smalltalk-80: bits of history, words of advice, 1983.
[90] Simon D. Hammond, et al. Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs, 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[91] Mark Moir, et al. SNZI: scalable NonZero indicators, 2007, PODC '07.
[92] Fan Yao, et al. XBFS: eXploring Runtime Optimizations for Breadth-First Search on GPUs, 2019, HPDC.
[93] R. D'Souza. Sugarscape on Steroids: Simulating Over a Million Agents, 2007.
[94] Kunle Olukotun, et al. A domain-specific approach to heterogeneous parallelism, 2011, PPoPP '11.
[95] Stephen John Turner, et al. Cloning Agent-based Simulation on GPU, 2015, SIGSIM-PADS.
[96] Daniel H. H. Ingalls. A Simple Technique for Handling Multiple Polymorphism, 1986, OOPSLA.
[97] Maged M. Michael. Safe memory reclamation for dynamic lock-free objects using atomic reads and writes, 2002, PODC '02.
[98] William R. Cook, et al. Mixin-based inheritance, 1990, OOPSLA/ECOOP '90.
[99] M. Steinberger, et al. ScatterAlloc: Massively parallel dynamic memory allocation for the GPU, 2012, 2012 Innovative Parallel Computing (InPar).
[100] Kathryn S. McKinley, et al. Hoard: a scalable memory allocator for multithreaded applications, 2000, SIGP.
[101] Intel Corporation. Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, 2006.
[102] Holger Homann, et al. SoAx: A generic C++ Structure of Arrays for handling particles in HPC codes, 2017, Comput. Phys. Commun.
[103] Robert Hirschfeld, et al. Columnar objects: improving the performance of analytical applications, 2015, Onward!
[104] Michael Goesele, et al. Fast dynamic memory allocator for massively parallel architectures, 2013, GPGPU@ASPLOS.
[105] Carlchristian Eckert, et al. Enhancements of the massively parallel memory allocator ScatterAlloc and its adaption to the general interface mallocMC, 2014.
[106] Viera K. Proulx. Traffic simulation: a case study for teaching object oriented design, 1998, SIGCSE '98.
[107] Naoya Maruyama, et al. Optimizing Stencil Computations for NVIDIA Kepler GPUs, 2014.
[108] Rj Allan, et al. Survey of Agent Based Modelling and Simulation Tools, 2009.
[109] Jianbin Fang, et al. A Comprehensive Performance Comparison of CUDA and OpenCL, 2011, 2011 International Conference on Parallel Processing.
[110] Paul D. Gilbert. Creating Stand-Alone Smalltalk Applications, 1988.
[111] Peter Wegner, et al. Concepts and paradigms of object-oriented programming, 1990, OOPS.
[112] Alastair F. Donaldson, et al. Exposing errors related to weak memory in GPU applications, 2016, PLDI.
[113] Trevor Brown, et al. Techniques for Constructing Efficient Lock-free Data Structures, 2017, ArXiv.
[114] Fatos Xhafa, et al. Programming multi-core and many-core computing systems, 2014.
[115] Dirk Riehle, et al. Value object, 2006, PLoP '06.
[116] Benjamin Keinert, et al. Real-time local displacement using dynamic GPU memory management, 2013, HPG '13.
[117] William Silvert, et al. Object-oriented ecosystem modelling, 1993.
[118] Michel Steuwer, et al. A Composable Array Function Interface for Heterogeneous Computing in Java, 2014, ARRAY@PLDI.
[119] Thomas Fahringer, et al. Automatic Data Layout Optimizations for GPUs, 2015, Euro-Par.
[120] Ludek Matyska, et al. Optimizing CUDA code by kernel fusion: application on BLAS, 2013, The Journal of Supercomputing.
[121] Andreas Moshovos, et al. Demystifying GPU microarchitecture through microbenchmarking, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[122] Maged M. Michael. Hazard pointers: safe memory reclamation for lock-free objects, 2004, IEEE Transactions on Parallel and Distributed Systems.
[123] Vivek Sarkar, et al. Compiling and Optimizing Java 8 Programs for GPU Execution, 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[124] James R. Larus, et al. Cache-conscious structure definition, 1999, PLDI '99.
[125] Jingyue Wu, et al. gpucc: An open-source GPGPU compiler, 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[126] Urs Hölzle, et al. Eliminating Virtual Function Calls in C++ Programs, 1996, ECOOP.
[127] Erez Petrank, et al. An efficient parallel heap compaction algorithm, 2004, OOPSLA.
[128] Kunle Olukotun, et al. Building-Blocks for Performance Oriented DSLs, 2011, DSL.
[129] M. Pharr, et al. ispc: A SPMD compiler for high-performance CPU programming, 2012, 2012 Innovative Parallel Computing (InPar).
[130] Chen Ding, et al. Array regrouping and structure splitting using whole-program reference affinity, 2004, PLDI '04.
[131] Sudhakar Yalamanchili, et al. Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[132] Bart De Moor, et al. Transportation Planning and Traffic Flow Models, 2005.
[133] Kenta Oono, et al. Chainer: a Next-Generation Open Source Framework for Deep Learning, 2015.
[134] Ryan Newton, et al. Region-based memory management for GPU programming languages: enabling rich data structures on a spartan host, 2014, OOPSLA.
[135] Andrew A. Chien, et al. An automatic object inlining optimization and its evaluation, 2000, PLDI '00.
[136] Marco Maggioni, et al. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking, 2018, ArXiv.
[137] Vivek Sarkar, et al. Compiler-Driven Data Layout Transformation for Heterogeneous Platforms, 2013, Euro-Par Workshops.
[138] Sang-Hee Lee, et al. Effects of wind and tree density on forest fire patterns in a mixed-tree species forest, 2017.
[139] Sebastian Hack, et al. Sierra: a SIMD extension for C++, 2014, WPMVP '14.
[140] Keshav Pingali, et al. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm, 2011.
[141] Iisakki Kosonen. HUTSIM: Simulation Tool for Traffic Signal Control Planning, 1996.
[142] Paul R. Wilson, et al. The memory fragmentation problem: solved?, 1998, ISMM '98.
[143] Piet Hut, et al. A hierarchical O(N log N) force-calculation algorithm, 1986, Nature.
[144] M. Mernik, et al. When and how to develop domain-specific languages, 2005, CSUR.
[145] Matthias Felleisen. Functional Objects, 2004, ECOOP.
[146] James Abel, et al. Applications Tuning for Streaming SIMD Extensions, 1999.
[147] Yoav Ossia, et al. Mostly concurrent compaction for mark-sweep GC, 2004, ISMM '04.
[148] Elliott W. Montroll, et al. Nonlinear Population Dynamics (Book Reviews: On the Volterra and Other Nonlinear Models of Interacting Populations), 1971.
[149] John D. Owens, et al. A Work-Efficient Step-Efficient Prefix Sum Algorithm, 2006.
[150] Debasis Das. A Survey on Cellular Automata and Its Applications, 2011.
[151] Sophia Drossopoulou, et al. Extending SHAPES for SIMD Architectures: An approach to native support for Struct of Arrays in languages, 2018, ICOOOLPS@ECOOP.
[152] Kenli Li, et al. Parallel Implementation of MAFFT on CUDA-Enabled Graphics Hardware, 2015, IEEE/ACM Transactions on Computational Biology and Bioinformatics.
[153] Ulrich Rüde, et al. Expression Templates Revisited: A Performance Analysis of Current Methodologies, 2011, SIAM J. Sci. Comput.
[154] David F. Bacon, et al. Compiling a high-level language for GPUs: (via language support for architectures and compilers), 2012, PLDI.
[155] Ralph Johnson, et al. Design Patterns: Elements of Reusable Object-Oriented Software, 2019.
[156] Joseph Kehoe. The Specification of Sugarscape, 2015, ArXiv.
[157] Michael Philippsen, et al. Parallel memory defragmentation on a GPU, 2012, MSPC '12.
[158] Paul W. Rendell, et al. Game of Life Universal Turing Machine, 2016.
[159] Ching-Lung Su, et al. Overview and comparison of OpenCL and CUDA technology for GPGPU, 2012, 2012 IEEE Asia Pacific Conference on Circuits and Systems.
[160] Marc Snir, et al. Transformation for class immutability, 2011, 2011 33rd International Conference on Software Engineering (ICSE).
[161] Xiaoming Li, et al. CUDA Memory Optimizations for Large Data-Structures in the Gravit Simulator, 2009, 2009 International Conference on Parallel Processing Workshops.
[162] Vasily Volkov, et al. Understanding Latency Hiding on GPUs, 2016.
[163] Bjarne Stroustrup. Foundations of C++, 2012, ESOP.
[164] Stefan Hanenberg, et al. How do API documentation and static typing affect API usability?, 2014, ICSE.
[165] Ana Lucia Varbanescu, et al. KMA: A Dynamic Memory Manager for OpenCL, 2014, GPGPU@ASPLOS.
[166] Laxmi N. Bhuyan, et al. Efficient warp execution in presence of divergence with collaborative context collection, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[167] Keir Fraser, et al. Practical lock-freedom, 2003.
[168] Amos O. Olagunju, et al. The Benefits of Object-oriented Methodology for Software Development, 2015.
[169] Timothy G. Rogers, et al. Characterizing the Runtime Effects of Object-Oriented Workloads on GPUs, 2018, 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[170] Massimiliano Fatica, et al. Implementing the Himeno benchmark with CUDA on GPU clusters, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[171] Ingo Wald, et al. Extending a C-like language for portable SIMD programming, 2012, PPoPP '12.
[172] Robert Strzodka. Data layout optimization for multi-valued containers in OpenCL, 2012, J. Parallel Distributed Comput.
[173] Michael Garland, et al. Throughput-oriented GPU memory allocation, 2019, PPoPP.
[174] Jeff Bonwick, et al. The Slab Allocator: An Object-Caching Kernel Memory Allocator, 1994, USENIX Summer.
[175] Stefania Bandini, et al. Agent Based Modeling and Simulation: An Informatics Perspective, 2009, J. Artif. Soc. Soc. Simul.