Automatic Locality Exploitation in the Codelet Model
Long Zheng | Minyi Guo | Guang R. Gao | Joshua Suetterlein | Chen Chen | Yao Wu