Characterizing Active Data Sharing in Threaded Applications Using Shared Footprint

Footprint is one of the most intuitive metrics for measuring the locality and memory behavior of programs. Much of the footprint research has been done on workloads consisting of independent sequential applications. For threaded applications, however, the traditional footprint metric is inadequate because it does not directly measure data sharing. This paper proposes two new metrics, shared footprint and sharing ratio, to capture the amount of active data sharing in a threaded execution. It also presents an empirical characterization of data sharing behavior using these metrics on the PARSEC benchmark suite. Based on the initial results, the paper discusses possible uses of the new metrics in program tuning and optimization for threaded applications on multicore processors.
