Cimple: instruction and memory level parallelism: a DSL for uncovering ILP and MLP
Vladimir Kiriansky | Haoran Xu | Martin C. Rinard | Saman P. Amarasinghe
[1] Edward T. Grochowski et al. Larrabee: A many-core x86 architecture for visual computing, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[2] Anoop Gupta et al. Design and evaluation of a compiler algorithm for prefetching, 1992, ASPLOS V.
[3] Christian Queinnec et al. A dynamic extent control operator for partial continuations, 1991, POPL '91.
[4] Craig Freedman et al. Hekaton: SQL server's memory-optimized OLTP engine, 2013, SIGMOD '13.
[5] Eric Rotenberg et al. Control-Flow Decoupling, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[6] David Gregg et al. Optimizing indirect branch prediction accuracy in virtual machine interpreters, 2003, PLDI '03.
[7] Hui Ding et al. TAO: Facebook's Distributed Data Store for the Social Graph, 2013, USENIX Annual Technical Conference.
[8] C. Martin. 2015, 2015, Les 25 ans de l’OMC: Une rétrospective en photos.
[9] Donald Yeung et al. Multi-chain prefetching: effective exploitation of inter-chain memory parallelism for pointer-chasing codes, 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[10] Kenneth A. Ross et al. Making B+-trees cache conscious in main memory, 2000, SIGMOD '00.
[11] Norman May et al. Interleaving with Coroutines: A Practical Approach for Robust Index Joins, 2017, Proc. VLDB Endow.
[12] Jeff Chamberlain et al. Ivy Bridge Server: A Converged Design, 2015, IEEE Micro.
[13] Easwaran Raman et al. Speculative Decoupled Software Pipelining, 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[14] Jim Hunter et al. Exploiting Coroutines to Attack the "Killer Nanoseconds", 2018, Proc. VLDB Endow.
[15] André Seznec et al. Branch prediction and the performance of interpreters: Don't trust folklore, 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[16] Frédo Durand et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI 2013.
[17] Pradeep Dubey et al. Architecting to achieve a billion requests per second throughput on a single key-value store server platform, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[18] Allan Porterfield et al. The Tera computer system, 1990, ICS '90.
[19] Minxuan Zhang et al. Advanced Computer Architecture, 2016, Communications in Computer and Information Science.
[20] Matteo Frigo et al. The implementation of the Cilk-5 multithreaded language, 1998, PLDI.
[21] Stefanos Kaxiras et al. Clairvoyance: Look-ahead compile-time scheduling, 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[22] Herb Sutter et al. Task Region, N3832, 2014.
[23] David A. Patterson et al. Attack of the killer microseconds, 2017, Commun. ACM.
[24] Easwaran Raman et al. Parallel-stage decoupled software pipelining, 2008, CGO '08.
[25] Michael Stonebraker et al. E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing, 2014, Proc. VLDB Endow.
[26] James R. Larus et al. Cache-conscious structure layout, 1999, PLDI '99.
[27] Gustavo Alonso et al. Main-Memory Hash Joins on Modern Processor Architectures, 2015, IEEE Transactions on Knowledge and Data Engineering.
[28] Guilherme Ottoni et al. Automatic thread extraction with decoupled software pipelining, 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[29] Anastasia Ailamaki et al. Improving hash join performance through prefetching, 2004, Proceedings. 20th International Conference on Data Engineering.
[30] A. Azzouz. 2011, 2020, City.
[31] Todd C. Mowry et al. Improving index performance through prefetching, 2001, SIGMOD '01.
[32] Pradeep Dubey et al. PALM: Parallel Architecture-Friendly Latch-Free Modifications to B+ Trees on Many-Core Processors, 2011, Proc. VLDB Endow.
[33] Charles E. Leiserson et al. The Cilk++ concurrency platform, 2009, 2009 46th ACM/IEEE Design Automation Conference.
[34] Josep Torrellas et al. Scalable Cache Miss Handling for High Memory-Level Parallelism, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[35] Bradley C. Kuszmaul et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[36] Pat Morin et al. Array Layouts for Comparison-Based Searching, 2015, ACM J. Exp. Algorithmics.
[37] Viktor Leis et al. The adaptive radix tree: ARTful indexing for main-memory databases, 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[38] Richard W. Vuduc et al. When Prefetching Works, When It Doesn’t, and Why, 2012, TACO.
[39] Junjie Wu et al. Advanced Computer Architecture, 2014, Communications in Computer and Information Science.
[40] Ulf Leser et al. Cache-Sensitive Skip List: Efficient Range Queries on Modern CPUs, 2016, ADMS/IMDM@VLDB.
[41] Robert Hieb et al. Representing control in the presence of first-class continuations, 1990, PLDI '90.
[42] Martin Grund et al. Impala: A Modern, Open-Source SQL Engine for Hadoop, 2015, CIDR.
[43] David A. Patterson et al. Reducing PageRank Communication via Propagation Blocking, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[44] David Kroft et al. Lockup-free instruction fetch/prefetch cache organization, 1998, ISCA '81.
[45] Saman P. Amarasinghe et al. Exploiting superword level parallelism with multimedia instruction sets, 2000, PLDI '00.
[46] Martin C. Rinard et al. Cimple: Instruction and Memory Level Parallelism, 2018, ArXiv.
[47] Viktor Leis et al. Processing in the Hybrid OLTP & OLAP Main-Memory Database System HyPer, 2013, IEEE Data Eng. Bull.
[48] Pradeep Dubey et al. Fast Updates on Read-Optimized Databases Using Multi-Core CPUs, 2011, Proc. VLDB Endow.
[49] Dan S. Wallach et al. Denial of Service via Algorithmic Complexity Attacks, 2003, USENIX Security Symposium.
[50] Babak Falsafi et al. Asynchronous Memory Access Chaining, 2015, Proc. VLDB Endow.
[51] Timothy G. Armstrong et al. LinkBench: a database benchmark based on the Facebook social graph, 2013, SIGMOD '13.
[52] Weiyun Huang et al. Real-Time Analytical Processing with SQL Server, 2015, Proc. VLDB Endow.
[53] Todd C. Mowry et al. Relaxed Operator Fusion for In-Memory Databases: Making Compilation, Vectorization, and Prefetching Work Together At Last, 2017, Proc. VLDB Endow.
[54] Xin Chen et al. F1: the fault-tolerant distributed RDBMS supporting google's ad business, 2012, SIGMOD Conference.
[55] Yunming Zhang et al. Optimizing indirect memory references with milk, 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).
[56] R. Kent Dybvig et al. Representing control in the presence of one-shot continuations, 1996, PLDI '96.
[57] Thomas Neumann et al. Efficiently Compiling Efficient Query Plans for Modern Hardware, 2011, Proc. VLDB Endow.
[58] Allen Newell et al. An introduction to information processing language V, 1960, Commun. ACM.
[59] Balaram Sinharoy et al. IBM POWER7 multicore server processor, 2011.
[60] Sudipta Sengupta et al. Indexing on modern hardware: hekaton and beyond, 2014, SIGMOD Conference.
[61] Allen Newell et al. The logic theory machine-A complex information processing system, 1956, IRE Trans. Inf. Theory.
[62] David A. Patterson et al. Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server, 2015, 2015 IEEE International Symposium on Workload Characterization.
[63] Margaret Martonosi et al. Informing Memory Operations: Providing Memory Performance Feedback in Modern Processors, 1996, ISCA.
[64] André Seznec et al. Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs, 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).