Analysis and approximation of optimal co-scheduling on CMP
[1] Peter J. Denning,et al. Thrashing: its causes and prevention , 1968, AFIPS Fall Joint Computing Conference.
[2] Xipeng Shen,et al. Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors , 2010, HiPEAC.
[3] Margo I. Seltzer,et al. Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design , 2005, USENIX Annual Technical Conference, General Track.
[4] Jack Edmonds,et al. Maximum matching and a polyhedron with 0,1-vertices , 1965 .
[5] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[6] Chen Yang,et al. A cost-driven compilation framework for speculative parallelization of sequential programs , 2004, PLDI '04.
[7] Rajiv Gupta,et al. Copy or Discard execution model for speculative parallelization on multicores , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[8] G. Edward Suh,et al. A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.
[9] Dean M. Tullsen,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000 .
[10] Xipeng Shen,et al. A study on optimally co-scheduling jobs of different lengths on chip multiprocessors , 2009, CF '09.
[11] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[12] Chen Ding,et al. Miss Rate Prediction Across Program Inputs and Cache Configurations , 2007, IEEE Transactions on Computers.
[13] Tong Li,et al. Efficient and scalable multiprocessor fair scheduling using distributed weighted round-robin , 2009, PPoPP '09.
[14] J. N. Amaral,et al. Benchmark Design for Robust Profile-Directed Optimization , 2007 .
[15] Alexandra Fedorova,et al. Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS XV.
[16] Xipeng Shen,et al. Exploration of the Influence of Program Inputs on CMP Co-scheduling , 2008, Euro-Par.
[17] Chen Ding,et al. Locality phase prediction , 2004, ASPLOS XI.
[18] Irving L. Traiger,et al. Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..
[19] Chen Ding,et al. Array regrouping and structure splitting using whole-program reference affinity , 2004, PLDI '04.
[20] Feng Mao,et al. Cross-Input Learning and Discriminative Prediction in Evolvable Virtual Machines , 2009, 2009 International Symposium on Code Generation and Optimization.
[21] G. Edward Suh,et al. Dynamic Partitioning of Shared Cache Memory , 2004, The Journal of Supercomputing.
[22] Kristof Beyls,et al. Reuse Distance as a Metric for Cache Behavior. , 2001 .
[23] Yan Solihin,et al. Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.
[24] Srihari Makineni,et al. Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[25] Chen Ding,et al. Locality approximation using time , 2007, POPL '07.
[26] John M. Mellor-Crummey,et al. Cross-architecture performance predictions for scientific applications using parameterized models , 2004, SIGMETRICS '04/Performance '04.
[27] Harold S. Stone,et al. Footprints in the cache , 1986, SIGMETRICS '86/PERFORMANCE '86.
[28] Dean M. Tullsen,et al. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor , 2002, SIGMETRICS '02.
[29] Dean M. Tullsen,et al. Initial observations of the simultaneous multithreading Pentium 4 processor , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[30] Xipeng Shen,et al. Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors? , 2010, CC.
[31] Dean M. Tullsen,et al. Compiler Techniques for Reducing Data Cache Miss Rate on a Multithreaded Architecture , 2008, HiPEAC.
[32] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[33] Dean M. Tullsen,et al. Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[34] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[35] G. Edward Suh,et al. Analytical cache models with applications to cache partitioning , 2001, ICS '01.
[36] Xipeng Shen,et al. Scalable Implementation of Efficient Locality Approximation , 2008, LCPC.
[37] William J. Cook,et al. Computing Minimum-Weight Perfect Matchings , 1999, INFORMS J. Comput..
[38] Sandhya Dwarkadas,et al. Compatible phase co-scheduling on a CMP of multi-threaded processors , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[39] Won-Taek Lim,et al. Architectural support for operating system-driven CMP cache management , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[40] Chen Ding,et al. Software behavior oriented parallelization , 2007, PLDI '07.
[41] Sanjay Mehrotra,et al. On the Implementation of a Primal-Dual Interior Point Method , 1992, SIAM J. Optim..
[42] Michael D. Smith,et al. Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[43] David A. Padua,et al. Compile-Time Based Performance Prediction , 1999, LCPC.
[44] Yan Solihin,et al. Scal-Tool: Pinpointing and Quantifying Scalability Bottlenecks in DSM Multiprocessors , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[45] Jie Chen,et al. Analysis and approximation of optimal co-scheduling on Chip Multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[46] S. Kim,et al. Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..
[47] Erik Hagersten,et al. A statistical multiprocessor cache model , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[48] Alex Settle,et al. Architectural Support for Enhanced SMT Job Scheduling , 2004, IEEE PACT.
[49] Yin Zhang,et al. Solving large-scale linear programs by interior-point methods under the Matlab Environment , 1998 .
[50] Rudolf Eigenmann,et al. Speculative thread decomposition through empirical optimization , 2007, PPoPP.
[51] Sivarama P. Dandamudi. Parallel and Cluster Systems , 2003 .
[52] Susan J. Eggers,et al. Thread-Sensitive Scheduling for SMT Processors , 2000 .
[53] Gurindar S. Sohi,et al. Task selection for a multiscalar processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[54] Zhao Zhang,et al. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[55] Ian Pratt,et al. Hyper-Threading Aware Process Scheduling Heuristics , 2005, USENIX Annual Technical Conference, General Track.
[56] Steve Carr,et al. Feedback-directed memory disambiguation through store distance analysis , 2006, ICS '06.
[57] Yutao Zhong,et al. Predicting whole-program locality through reuse distance analysis , 2003, PLDI '03.
[58] Alan Jay Smith,et al. On the effectiveness of set associative page mapping and its application to main memory management , 1976, ICSE '76.
[59] Michael Stumm,et al. Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors , 2007, EuroSys '07.
[60] George Ho,et al. PAPI: A Portable Interface to Hardware Performance Counters , 1999 .
[61] Xipeng Shen,et al. Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? , 2010, PPoPP '10.