CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures
Quan Chen | Minyi Guo | Zhiyi Huang
[1] Tianzhou Chen,et al. Less reused filter: improving L2 cache performance via filtering less reused lines , 2009, ICS '09.
[2] Frédéric Wagner,et al. Hierarchical Work-Stealing , 2010, Euro-Par.
[3] Swann Perarnau,et al. Controlling cache utilization of HPC applications , 2011, ICS '11.
[4] Nir Shavit,et al. Non-blocking steal-half work queues , 2002, PODC '02.
[5] David Chase,et al. Dynamic circular work-stealing deque , 2005, SPAA '05.
[6] James Reinders,et al. Intel® threading building blocks , 2008 .
[7] Stephen L. Olivier,et al. Scheduling task parallelism on multi-socket multicore systems , 2011, ROSS '11.
[8] Robert D. Blumofe,et al. Executing multithreaded programs efficiently , 1995 .
[9] Doug Lea,et al. A Java fork/join framework , 2000, JAVA '00.
[10] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[11] Jens Palsberg,et al. Featherweight X10: a core calculus for async-finish parallelism , 2010, PPoPP '10.
[12] Michael Stumm,et al. Online performance analysis by statistical sampling of microprocessor performance counters , 2005, ICS '05.
[13] Charles E. Leiserson,et al. The Cilk++ concurrency platform , 2009, 2009 46th ACM/IEEE Design Automation Conference.
[14] M. Berger,et al. Adaptive mesh refinement for hyperbolic partial differential equations , 1982 .
[15] Guy E. Blelloch,et al. Provably good multicore cache performance for divide-and-conquer algorithms , 2008, SODA '08.
[16] Matteo Frigo,et al. Cache-oblivious algorithms , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).
[17] Hans-Peter Seidel,et al. Cache oblivious parallelograms in iterative stencil computations , 2010, ICS '10.
[19] Guy E. Blelloch,et al. The data locality of work stealing , 2000, SPAA.
[20] Michael Stumm,et al. RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations , 2009, ASPLOS.
[21] Guy E. Blelloch,et al. Low depth cache-oblivious algorithms , 2010, SPAA '10.
[22] Guy E. Blelloch,et al. Scheduling threads for constructive cache sharing on CMPs , 2007, SPAA '07.
[23] Chia-Lin Yang,et al. Cache-aware task scheduling on multi-core architecture , 2010, Proceedings of 2010 International Symposium on VLSI Design, Automation and Test.
[24] Tao Yang,et al. A Comparison of Clustering Heuristics for Scheduling Directed Acyclic Graphs on Multiprocessors , 1992, J. Parallel Distributed Comput..
[25] Wenguang Chen,et al. Maotai: View-Oriented Parallel Programming on CMT Processors , 2008, 2008 37th International Conference on Parallel Processing.
[26] Quan Chen,et al. CAB: Cache Aware Bi-tier Task-Stealing in Multi-socket Multi-core Architecture , 2011, 2011 International Conference on Parallel Processing.
[27] Quan Chen,et al. WATS: Workload-Aware Task Scheduling in Asymmetric Multi-core Architectures , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[28] Richard Cole,et al. Analysis of Randomized Work Stealing with False Sharing , 2011, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[29] Alejandro Duran,et al. The Design of OpenMP Tasks , 2009, IEEE Transactions on Parallel and Distributed Systems.
[30] Yi Guo,et al. SLAW: A scalable locality-aware adaptive work-stealing scheduler , 2010, IPDPS.
[31] Lei Wang,et al. An adaptive task creation strategy for work-stealing scheduling , 2010, CGO '10.
[32] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[33] Yi Guo,et al. Work-first and help-first scheduling policies for async-finish task parallelism , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[34] Maged M. Michael,et al. Idempotent work stealing , 2009, PPoPP '09.
[35] Guy E. Blelloch,et al. Scheduling irregular parallel computations on hierarchical caches , 2011, SPAA '11.
[36] Mark Moir,et al. A dynamic-sized nonblocking work stealing deque , 2006, Distributed Computing.
[37] David R. Butenhof. Programming with POSIX threads , 1993 .
[38] Xiaoning Ding,et al. ULCC: a user-level facility for optimizing shared cache performance on multicores , 2011, PPoPP '11.