Software challenges in extreme scale systems
[1] Carl Wunsch,et al. Practical global oceanic state estimation , 2007 .
[2] Monica S. Lam,et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..
[3] Yi Guo,et al. Work-first and help-first scheduling policies for async-finish task parallelism , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[4] Franz Franchetti,et al. Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform , 2006, SC.
[5] Vivek Sarkar,et al. Multi-core Implementations of the Concurrent Collections Programming Model , 2008 .
[6] Shujia Zhou,et al. Application controlled parallel asynchronous IO , 2006, SC.
[7] Michael Metcalf,et al. Fortran 90 Explained , 1990 .
[8] Nathan R. Tallent,et al. Effective performance measurement and analysis of multithreaded applications , 2009, PPoPP '09.
[9] Tong Li,et al. Efficient operating system scheduling for performance-asymmetric multi-core architectures , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[10] Charles A. Zukowski,et al. CMOS transistor sizing for minimization of energy-delay product , 1996, Proceedings of the Sixth Great Lakes Symposium on VLSI.
[11] Keshav Pingali,et al. Compiler research: the next 50 years , 2009, CACM.
[12] Rajeev Thakur,et al. Formal verification of practical MPI programs , 2009, PPoPP '09.
[13] T. Inglett,et al. Designing a Highly-Scalable Operating System: The Blue Gene/L Story , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[14] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[15] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPoPP '95.
[16] John Glauert,et al. SISAL: streams and iteration in a single assignment language. Language reference manual, Version 1.2. Revision 1 , 1985 .
[17] Samuel Lang,et al. GIGA+: scalable directories for shared file systems , 2007, PDSW '07.
[18] Carlos Maltzahn,et al. Ceph: a scalable, high-performance distributed file system , 2006, OSDI '06.
[19] Robert W. Numrich,et al. Co-array Fortran for parallel programming , 1998, FORF.
[20] Pat Hanrahan,et al. Brook for GPUs: stream computing on graphics hardware , 2004, SIGGRAPH 2004.
[21] John Shalf,et al. Cactus Framework: Black Holes to Gamma Ray Bursts , 2007, ArXiv.
[22] Jonathan Walpole,et al. Introducing technology into the Linux kernel: a case study , 2008, OPSR.
[23] Leonid Oliker,et al. Analyzing Ultra-Scale Application Communication Requirements for a Reconfigurable Hybrid Interconnect , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[24] Ian T. Foster,et al. Distant I/O: one-sided access to secondary storage on remote processors , 1998, Proceedings. The Seventh International Symposium on High Performance Distributed Computing (Cat. No.98TB100244).
[25] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[26] Jack B. Dennis,et al. Data Flow Supercomputers , 1980, Computer.
[27] F. Petrini,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[28] John A. Gunnels,et al. Petascale computing with accelerators , 2009, PPoPP '09.
[29] Robert H. Halstead,et al. MULTILISP: a language for concurrent symbolic computation , 1985, TOPL.
[30] James R. Larus,et al. Transactional Memory , 2006, Transactional Memory.
[31] Tao Yang,et al. The Panasas ActiveScale Storage Cluster - Delivering Scalable High Bandwidth Storage , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[32] E.A. Lee,et al. Synchronous data flow , 1987, Proceedings of the IEEE.
[33] Vivek Sarkar,et al. Partitioning and Scheduling Parallel Programs for Multiprocessing , 1989 .
[34] Vivek Sarkar,et al. Hierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement , 2009, LCPC.
[35] V. Sarkar,et al. Automatic partitioning of a program dependence graph into parallel tasks , 1991, IBM J. Res. Dev..
[36] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[37] Vivek Sarkar,et al. Phasers: a unified deadlock-free construct for collective and point-to-point synchronization , 2008, ICS '08.
[38] Guy E. Blelloch,et al. A provable time and space efficient implementation of NESL , 1996, ICFP '96.
[39] Anwar Ghuloum. Ct: channelling NeSL and SISAL in C++ , 2007, CUFP '07.
[40] Vivek Sarkar,et al. Phaser accumulators: A new reduction construct for dynamic parallelism , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[41] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[42] Ronald Minnich,et al. Right-weight kernels: an off-the-shelf alternative to custom light-weight kernels , 2006, OPSR.
[43] Jason Duell,et al. Productivity and performance using partitioned global address space languages , 2007, PASCO '07.
[44] Matteo Frigo,et al. The implementation of the Cilk-5 multithreaded language , 1998, PLDI.
[45] E. Birney,et al. Velvet: algorithms for de novo short read assembly using de Bruijn graphs , 2008, Genome research.
[46] Ani Thakar. Lessons Learned from the SDSS Catalog Archive Server , 2008, Computing in Science & Engineering.
[47] Nathan R. Tallent,et al. Binary analysis for measurement and attribution of program performance , 2009, PLDI '09.
[48] Bryan Veal,et al. Performance scalability of a multi-core web server , 2007, ANCS '07.
[49] Yu Ma,et al. Empowering distributed workflow with the data capacitor: maximizing lustre performance across the wide area network , 2007, SOCP '07.
[50] Seth Copen Goldstein,et al. Retrospective: active messages: a mechanism for integrating computation and communication , 1998, ISCA '98.
[51] Robert H. Halstead,et al. Lazy task creation: a technique for increasing the granularity of parallel programs , 1990, LISP and Functional Programming.
[52] Kenneth E. Iverson,et al. A programming language , 1962, AIEE-IRE '62 (Spring).
[53] Henry Hoffmann,et al. A stream compiler for communication-exposed architectures , 2002, ASPLOS X.
[54] Vivek Sarkar,et al. Automatic selection of high-order transformations in the IBM XL FORTRAN compilers , 1997, IBM J. Res. Dev..
[55] Vivek Sarkar,et al. Chunking parallel loops in the presence of synchronization , 2009, ICS.