Using Differential Execution Analysis to Identify Thread Interference
Amina Guermouche | Elisabeth Brunet | Mohamed Said Mosli Bouksiaa | Francois Trahay | Alexis Lescouet | Gauthier Voron | Gaël Thomas | Remi Dulong
[1] B. Jacob,et al. CMP$im: A Pin-Based On-The-Fly Multi-Core Cache Simulator, 2008.
[2] Thomas F. Wenisch,et al. Statistical Analysis of Latency Through Semantic Profiling , 2017, EuroSys.
[3] Erik R. Altman,et al. Performance analysis of idle programs , 2010, OOPSLA.
[4] Greg Bronevetsky,et al. Active Measurement of the Impact of Network Switch Utilization on Application Performance , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[5] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[6] Julia L. Lawall,et al. Remote Core Locking: Migrating Critical-Section Execution to Improve the Performance of Multithreaded Applications , 2012, USENIX Annual Technical Conference.
[7] Alexandra Fedorova,et al. Deconstructing the overhead in parallel applications , 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[8] Mary Lou Soffa,et al. Contention aware execution: online contention detection and response , 2010, CGO '10.
[9] Dutch T. Meyer,et al. Whose cache line is it anyway?: operating system support for live detection and repair of false sharing , 2013, EuroSys '13.
[10] Katherine E. Isaacs,et al. There goes the neighborhood: Performance degradation due to nearby jobs , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[11] Bo Wu,et al. ScaAnalyzer: a tool to identify memory scalability bottlenecks in parallel programs , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Emery D. Berger,et al. SHERIFF: precise detection and automatic mitigation of false sharing , 2011, OOPSLA '11.
[13] Manuel Selva,et al. NumaMMA: NUMA MeMory Analyzer , 2018, ICPP.
[14] Nathan R. Tallent,et al. Analyzing lock contention in multithreaded applications , 2010, PPoPP '10.
[15] Wolfgang Karl,et al. CacheIn: A Toolset for Comprehensive Cache Inspection , 2005, International Conference on Computational Science.
[16] Min Zhou,et al. Experiences and lessons learned with a portable interface to hardware performance counters , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[17] Nikolai Joukov,et al. Operating system profiling via latency analysis , 2006, OSDI '06.
[18] Daniel Hagimont,et al. Application-specific quantum for multi-core platform scheduler , 2016, EuroSys.
[19] Nathan Froyd,et al. Scalability analysis of SPMD codes using expectations , 2007, ICS '07.
[20] Guojing Cong,et al. A framework for automated performance bottleneck detection , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[21] Tao Li,et al. Optimizing virtual machine consolidation performance on NUMA server architecture for cloud workloads , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[22] Emery D. Berger,et al. Coz: finding code that counts with causal profiling , 2015, USENIX Annual Technical Conference.
[23] Michael A. Frumkin,et al. Benchmarking Memory Performance with the Data Cube Operator , 2004, PDCS.
[24] S. Eranian. Perfmon2: a flexible performance monitoring interface for Linux, 2010.
[25] Christoforos E. Kozyrakis,et al. Evaluating MapReduce for Multi-core and Multiprocessor Systems , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[26] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[27] Jack J. Dongarra,et al. EZTrace: A Generic Framework for Performance Analysis , 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[28] François Trahay,et al. Runtime Function Instrumentation with EZTrace , 2012, Euro-Par Workshops.
[29] Weng-Fai Wong,et al. Dynamic cache contention detection in multi-threaded applications , 2011, VEE '11.
[30] Michael L. Scott,et al. False sharing and its effect on shared memory performance, 1993.
[31] Vivien Quéma,et al. MemProf: A Memory Profiler for NUMA Multicore Systems , 2012, USENIX Annual Technical Conference.
[32] Jose Renau,et al. Analysis of PARSEC workload scalability , 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[33] Dongmei Zhang,et al. Comprehending performance from real-world execution traces: a device-driver case , 2014, ASPLOS.
[34] Shan Lu,et al. Statistical debugging for real-world performance problems , 2014, OOPSLA.
[35] Julia L. Lawall,et al. Continuously measuring critical section pressure with the free-lunch profiler , 2014, OOPSLA.
[36] Stijn Eyerman,et al. Speedup stacks: Identifying scaling bottlenecks in multi-threaded applications , 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
[37] Thomas Rauber,et al. Trace-based Automatic Padding for Locality Improvement with Correlative Data Visualization Interface , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[38] Yu Luo,et al. Non-Intrusive Performance Profiling for Entire Software Stacks Based on the Flow Reconstruction Principle , 2016, OSDI.
[39] Stijn Eyerman,et al. Criticality stacks: identifying critical threads in parallel programs using synchronization behavior , 2013, ISCA.
[40] Robert Tappan Morris,et al. Locating cache performance bottlenecks using data profiling , 2010, EuroSys '10.
[41] François Trahay,et al. Selecting Points of Interest in Traces Using Patterns of Events , 2015, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.
[42] Xi Chen,et al. Cache contention and application performance prediction for multi-core systems , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[43] Yuriy Brun,et al. Mining precise performance-aware behavioral models from existing instrumentation , 2014, ICSE Companion.
[44] Chen Tian,et al. PREDATOR: predictive false sharing detection , 2014, PPoPP '14.
[45] M. ScholarWorks,et al. Cheetah: Detecting False Sharing Efficiently and Effectively, 2019.
[46] Josef Weidendorfer,et al. Assessing cache false sharing effects by dynamic binary instrumentation , 2009, WBIA '09.
[47] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[48] Julia L. Lawall,et al. Fast and Portable Locking for Multicore Architectures , 2016, ACM Trans. Comput. Syst.
[49] Brad Fitzpatrick,et al. Distributed caching with memcached, 2004.
[50] Vivien Quéma,et al. Traffic management: a holistic approach to memory placement on NUMA systems , 2013, ASPLOS '13.
[51] Yanbin Liu,et al. Detection of false sharing using machine learning , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).