Predicting Cross-Core Performance Interference on Multicore Processors with Regression Analysis
[1] Josep Torrellas,et al. Speculative synchronization: applying thread-level speculation to explicitly parallel applications, 2002, ASPLOS X.
[2] Zhao Zhang,et al. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[3] Kevin Skadron,et al. Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[4] Björn Lisper,et al. Data caches in multitasking hard real-time systems , 2003, RTSS 2003. 24th IEEE Real-Time Systems Symposium, 2003.
[5] Björn Lisper,et al. Data cache locking for tight timing calculations , 2007, TECS.
[6] Jianjun Li,et al. Providing fairness on shared-memory multiprocessors via process scheduling , 2012, SIGMETRICS '12.
[7] Mary Lou Soffa,et al. Contention aware execution: online contention detection and response , 2010, CGO '10.
[8] Chen Ding,et al. Defensive loop tiling for shared cache , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[9] Yang Yang,et al. Automatic Library Generation for BLAS3 on GPUs , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[10] Yale N. Patt,et al. Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs , 2008, ASPLOS.
[11] Angela C. Sodan,et al. Predicting cache needs and cache sensitivity for applications in cloud computing on CMP servers with configurable caches , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[12] Onur Mutlu,et al. A Case for MLP-Aware Cache Replacement , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[13] Xiaobing Feng,et al. An empirical model for predicting cross-core performance interference on multicore processors , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[14] Lingjia Tang,et al. Compiling for niceness: mitigating contention for QoS in warehouse scale computers , 2012, CGO '12.
[15] Francisco J. Cazorla,et al. Multicore Resource Management , 2008, IEEE Micro.
[16] Yan Solihin,et al. Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.
[17] Jingling Xue,et al. Query-directed adaptive heap cloning for optimizing compilers , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[18] David A. Wood,et al. IPC Considered Harmful for Multiprocessor Workloads , 2006, IEEE Micro.
[19] S. Kim,et al. Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[20] Lingjia Tang,et al. Contentiousness vs. sensitivity: improving contention aware runtime systems on multicore architectures , 2011, EXADAPT '11.
[21] Alexandra Fedorova,et al. Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS XV.
[22] Hongtao Yu,et al. Level by level: making flow- and context-sensitive pointer analysis scalable for millions of lines of code , 2010, CGO '10.
[23] Lingjia Tang,et al. The impact of memory subsystem resource sharing on datacenter applications , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[24] Xiao Zhang,et al. Towards practical page coloring-based multicore cache management , 2009, EuroSys '09.
[25] Christian Bienia,et al. Benchmarking modern multiprocessors, 2011.
[26] Nathan Clark,et al. Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications , 2010, ISCA.
[27] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[28] David Eklov,et al. Bandwidth Bandit: Quantitative characterization of memory contention , 2012, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[29] Gabriel H. Loh,et al. PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches , 2009, ISCA '09.
[30] Lingjia Tang,et al. Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.
[31] Mary Lou Soffa,et al. DraMon: Predicting memory bandwidth usage of multi-threaded programs with high accuracy and low overhead , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[32] Jingling Xue,et al. On-demand dynamic summary-based points-to analysis , 2012, CGO '12.
[33] Tong Li,et al. Using OS Observations to Improve Performance in Multicore Systems , 2008, IEEE Micro.
[34] Xipeng Shen,et al. Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors , 2010, HiPEAC.
[35] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[36] Yuxiong He,et al. The Cilkview scalability analyzer , 2010, SPAA '10.
[37] Alexei Alexandrov. Parallelization Made Easier with Intel Performance-Tuning Utility, 2007.
[38] Lingjia Tang,et al. Directly characterizing cross core interference through contention synthesis , 2011, HiPEAC.
[39] Maged M. Michael. Hazard pointers: safe memory reclamation for lock-free objects , 2004, IEEE Transactions on Parallel and Distributed Systems.
[40] Mahmut T. Kandemir,et al. A case for integrated processor-cache partitioning in chip multiprocessors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[41] Lingjia Tang,et al. SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[42] Irving L. Traiger,et al. Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..
[43] Dongrui Fan,et al. Extendable pattern-oriented optimization directives , 2012, International Symposium on Code Generation and Optimization (CGO 2011).
[44] Aamer Jaleel,et al. Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[45] Xipeng Shen,et al. Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? , 2010, PPoPP '10.
[46] Mikko H. Lipasti,et al. Redeeming IPC as a performance metric for multithreaded programs , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[47] Lingjia Tang,et al. Protean Code: Achieving Near-Free Online Code Transformations for Warehouse Scale Computers , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[48] Lingjia Tang,et al. Whare-map: heterogeneity in "homogeneous" warehouse-scale computers , 2013, ISCA.
[49] I. Jolliffe. Principal Components in Regression Analysis, 1986.
[50] Jie Chen,et al. Analysis and approximation of optimal co-scheduling on Chip Multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[51] Luiz André Barroso,et al. The Case for Energy-Proportional Computing , 2007, Computer.
[52] Matthias S. Müller,et al. Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[53] Pen-Chung Yew,et al. On mitigating memory bandwidth contention through bandwidth-aware scheduling , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[54] Donald Eugene Farrar,et al. Multicollinearity in Regression Analysis; the Problem Revisited, 2011.
[55] Michael D. Smith,et al. Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).