Data-driven spatial locality