Paper information - A unifying abstraction for data structure splicing

A unifying abstraction for data structure splicing

Data structure splicing (DSS) refers to reorganizing data structures by merging or splitting them, reordering fields, inlining pointers, etc. DSS has been used, with demonstrated benefits, to improve spatial locality. When data fields that are accessed together are also collocated in the address space, the utilization of hardware caches improves and cache misses decline. A number of approaches to DSS have been proposed, but each addressed only one or two splicing optimizations (e.g., only splitting or only field reordering) and used an underlying abstraction that could not be extended to include others. Our work proposes a single abstraction, called the Data Structure Access Graph (D-SAG), that (a) covers all data-splicing optimizations proposed previously and (b) unlocks new ones. Having a common abstraction has two benefits: (1) It enables us to build a single tool that hosts all DSS optimizations under one roof, eliminating the need to adopt multiple tools. (2) It avoids conflicts, e.g., where one tool suggests splitting a data structure in a way that would conflict with another tool's suggestion to reorder fields. Based on the D-SAG abstraction, we build a toolchain that uses static and dynamic analysis to recommend DSS optimizations to developers. Using this tool, we identify ten benchmarks from the SPEC CPU2017 and PARSEC suites that are amenable to DSS, as well as a workload on RocksDB that stresses its memory table. Restructuring data structures following the tool's suggestions improves performance by an average of 11% (geomean) and reduces cache misses by an average of 28% (geomean) for seven of these workloads.
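
To make the spatial-locality argument concrete, the sketch below illustrates one splicing optimization named in the abstract, structure splitting. It is not taken from the paper or its toolchain: the `Particle` record, its hot/cold field division, the 64-byte cache-line size, and the `lines_touched` helper are all hypothetical, and the model assumes the record array starts on a cache-line boundary. It only counts how many cache lines a scan over the hot fields would touch before and after the split.

```python
import ctypes

CACHE_LINE = 64  # bytes; assumed cache-line size, not stated in the source


class Particle(ctypes.Structure):
    # Hypothetical original layout: hot fields (x, y) share a record with cold ones.
    _fields_ = [
        ("x", ctypes.c_double),          # hot: read on every scan
        ("y", ctypes.c_double),          # hot: read on every scan
        ("name", ctypes.c_char * 40),    # cold: rarely read
        ("created", ctypes.c_int64),     # cold: rarely read
    ]


class ParticleHot(ctypes.Structure):
    # After splitting: only the fields that are accessed together.
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]


class ParticleCold(ctypes.Structure):
    # After splitting: the rarely accessed fields, stored in a separate array.
    _fields_ = [("name", ctypes.c_char * 40), ("created", ctypes.c_int64)]


def lines_touched(record_size, hot_bytes, count):
    """Count distinct cache lines touched when reading the first `hot_bytes`
    of each of `count` contiguous records (array assumed line-aligned)."""
    lines = set()
    for i in range(count):
        start = i * record_size
        end = start + hot_bytes - 1
        lines.update(range(start // CACHE_LINE, end // CACHE_LINE + 1))
    return len(lines)


hot_bytes = ctypes.sizeof(ParticleHot)
before = lines_touched(ctypes.sizeof(Particle), hot_bytes, 1000)
after = lines_touched(ctypes.sizeof(ParticleHot), hot_bytes, 1000)
print(before, after)
```

In this toy model the split layout packs four hot records into each cache line, so the scan touches a quarter as many lines. Real gains depend on the actual access pattern, which is what the profiling-based toolchain described above is meant to measure.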
