The Landscape of Parallel Computing Research: A View from Berkeley
Samuel Williams | John Shalf | Katherine Yelick | Kurt Keutzer | David A. Patterson | Joseph James Gebis | Krste Asanović | William Plishker | Parry Husbands | Rastislav Bodík | Bryan Catanzaro
[1] J. Tukey, et al. An algorithm for the machine calculation of complex Fourier series, 1965.
[2] Allen Newell, et al. The PMS and ISP descriptive systems for computer structures, 1970, AFIPS '70 (Spring).
[3] J. Shaoul. Human Error, 1973, Nature.
[4] Carl Hewitt, et al. A Universal Modular ACTOR Formalism for Artificial Intelligence, 1973, IJCAI.
[5] Richard O. Duda, et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.
[6] J. Monaghan, et al. Shock simulation by the particle method SPH, 1983.
[7] Barry H. Kantowitz, et al. Human Factors: Understanding People-System Relationships, 1983.
[8] Piet Hut, et al. A hierarchical O(N log N) force-calculation algorithm, 1986, Nature.
[9] H. Massalin. Superoptimizer: a look at the smallest program, 1987, ASPLOS.
[10] Allan Porterfield, et al. The Tera computer system, 1990, ICS '90.
[11] E. Myers, et al. Basic local alignment search tool, 1990, Journal of Molecular Biology.
[12] Cherri M. Pancake, et al. Do parallel languages respond to the needs of scientific programmers?, 1990, Computer.
[13] Anoop Gupta, et al. SPLASH: Stanford parallel applications for shared-memory, 1992, CARN.
[14] Anantha P. Chandrakasan, et al. Low-power CMOS digital design, 1992.
[15] Maurice Herlihy, et al. Transactional Memory: Architectural Support For Lock-free Data Structures, 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[16] W. Daniel Hillis, et al. The CM-5 Connection Machine: a scalable supercomputer, 1993, CACM.
[17] Guy L. Steele, et al. The High Performance Fortran Handbook, 1993.
[18] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.
[19] Sally A. McKee, et al. Hitting the memory wall: implications of the obvious, 1995, CARN.
[20] David A. Patterson, et al. Computer Architecture - A Quantitative Approach, 5th Edition, 1996.
[21] Steven L. Scott, et al. Synchronization and communication in the T3E multiprocessor, 1996, ASPLOS VII.
[22] Jack Dongarra, et al. MPI: The Complete Reference, 1996.
[23] James Demmel, et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, 1995, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[24] Mark Horowitz, et al. Energy dissipation in general purpose microprocessors, 1996, IEEE J. Solid State Circuits.
[25] Katherine Yelick, et al. A Case for Intelligent RAM: IRAM, 1997.
[26] John Wawrzynek, et al. Garp: a MIPS processor with a reconfigurable coprocessor, 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186).
[27] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[28] Christoforos E. Kozyrakis, et al. A case for intelligent RAM, 1997, IEEE Micro.
[29] Randy Goebel, et al. Computational intelligence - a logical approach, 1998.
[30] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[31] Kurt Keutzer, et al. Getting to the bottom of deep submicron, 1998, ICCAD '98.
[32] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[33] James Demmel, et al. A Supernodal Approach to Sparse Partial Pivoting, 1999, SIAM J. Matrix Anal. Appl.
[34] Shekhar Y. Borkar, et al. Design challenges of technology scaling, 1999, IEEE Micro.
[35] Erwin A. de Kock, et al. YAPI: application modeling for signal processing systems, 2000, Proceedings 37th Design Automation Conference.
[36] Sathish S. Vadhiyar, et al. Automatically Tuned Collective Communications, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[37] P.P. Gelsinger, et al. Microprocessors for the new millennium: Challenges, opportunities, and new frontiers, 2001, 2001 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC (Cat. No.01CH37177).
[38] Albert Wang, et al. Hardware/software instruction set configurability for system-on-chip processors, 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[39] Jeffrey S. Vetter, et al. Statistical scalability analysis of communication operations in distributed applications, 2001, PPoPP '01.
[40] W. Dally, et al. Route packets, not wires: on-chip interconnection networks, 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[41] Dennis Sylvester, et al. Impact of small process geometries on microarchitectures in systems on a chip, 2001.
[42] James Demmel, et al. Design, implementation and testing of extended and mixed precision BLAS, 2000, TOMS.
[43] Henry Hoffmann, et al. A stream compiler for communication-exposed architectures, 2002, ASPLOS X.
[44] J. Demmel, et al. An updated set of basic linear algebra subprograms (BLAS), 2002, TOMS.
[45] Michael Gschwind, et al. Optimizing pipelines for power and performance, 2002, MICRO.
[46] John Shalf, et al. The Cactus Framework and Toolkit: Design and Applications, 2002, VECPAR.
[47] Norman P. Jouppi, et al. The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays, 2002, ISCA.
[48] James R. Goodman, et al. Transactional lock-free execution of lock-based programs, 2002, ASPLOS X.
[49] Gilbert Wolrich, et al. The next generation of Intel IXP network processors, 2002.
[50] Dustin Boswell, et al. Introduction to Support Vector Machines, 2002.
[51] Jeffrey S. Vetter, et al. An Empirical Performance Evaluation of Scalable Scientific Applications, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[52] James Arthur Kohl, et al. A Component Architecture for High-Performance Computing, 2003.
[53] Ahmed Seffah. Learning the ropes: human-centered design skills and patterns for software engineers' education, 2003, INTR.
[54] Jeffrey S. Vetter, et al. Communication characteristics of large-scale scientific applications for contemporary cluster architectures, 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[55] Allan Hartstein, et al. Optimum Power/Performance Pipeline Depth, 2003, MICRO.
[56] Thomas R. Puzak, et al. Optimum power/performance pipeline depth, 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.
[57] Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction, 2003, MICRO.
[58] Min Xu, et al. A "flight data recorder" for enabling full-system multiprocessor deterministic replay, 2003, ISCA '03.
[59] Kurt Keutzer, et al. NP-Click: a productive software development approach for network processors, 2004, IEEE Micro.
[60] Matthias Gries, et al. Methods for evaluating and covering the design space during early design development, 2004, Integr.
[61] Krste Asanovic, et al. Power-optimal pipelining in deep submicron technology, 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).
[62] Kunle Olukotun, et al. Transactional memory coherence and consistency, 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[63] Steve Leibson, et al. Engineering the complex SOC: fast, flexible design with configurable processors, 2004.
[64] James D. Arthur, et al. What we should teach, but don't: proposal for cross pollinated HCI-SE curriculum, 2004, 34th Annual Frontiers in Education, 2004. FIE 2004.
[65] Bradford L. Chamberlain, et al. The cascade high productivity language, 2004, Ninth International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2004. Proceedings.
[66] Richard W. Vuduc, et al. Sparsity: Optimization Framework for Sparse Matrix Kernels, 2004, Int. J. High Perform. Comput. Appl.
[67] David A. Patterson, et al. Latency lags bandwidth, 2004, CACM.
[68] Laxmikant V. Kalé, et al. Performance and modularity benefits of message-driven execution, 2004, J. Parallel Distributed Comput.
[69] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[70] K. Keutzer, et al. Automated Task Allocation for Network Processors, 2004.
[71] Jim Gray, et al. A Minute with Nsort on a 32P NEC Windows Itanium2 Server, 2004.
[72] Chris Rowen, et al. Engineering the Complex SOC, 2004.
[73] K. Olukotun, et al. Transactional Memory Coherence and Consistency (TCC), 2004.
[74] Vivek Sarkar, et al. X10: an object-oriented approach to non-uniform cluster computing, 2005, OOPSLA '05.
[75] Jeffrey M. Arnold, et al. S5: the architecture and development flow of a software configurable processor, 2005, Proceedings. 2005 IEEE International Conference on Field-Programmable Technology, 2005.
[76] Joel S. Emer, et al. The soft error problem: an architectural perspective, 2005, 11th International Symposium on High-Performance Computer Architecture.
[77] Shekhar Y. Borkar, et al. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation, 2005, IEEE Micro.
[78] J. Shalf, et al. Understanding ultra-scale application communication requirements, 2005, IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005.
[79] G. Vahala, et al. 3D Entropic Lattice Boltzmann Simulations of 3D Navier-Stokes Turbulence, 2005.
[80] Steven J. Deitz, et al. High-level programming language abstractions for advanced and dynamic parallel computations, 2005.
[81] P. K. Dubey, et al. Recognition, Mining and Synthesis Moves Computers to the Era of Tera, 2005.
[82] David A. Patterson, et al. Latency Lags Bandwidth, 2005, ICCD.
[83] Jeffrey C. Carver, et al. Parallel Programmer Productivity: A Case Study of Novice Parallel Programmers, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[84] Leonid Oliker, et al. Analyzing Ultra-Scale Application Communication Requirements for a Reconfigurable Hybrid Interconnect, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[85] Kunle Olukotun, et al. ATLAS: A Scalable Emulator for Transactional Parallel Systems, 2005.
[86] Armando Solar-Lezama, et al. Programming by sketching for bit-streaming programs, 2005, PLDI '05.
[87] Steven G. Johnson, et al. The Design and Implementation of FFTW3, 2005, Proceedings of the IEEE.
[88] Rodric M. Rabbah, et al. A Productive Programming Environment for Stream Computing, 2005.
[89] David A. Bader, et al. Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2, 2006, 2006 International Conference on Parallel Processing (ICPP'06).
[90] David A. Patterson, et al. RAMP: research accelerator for multiple processors - a community vision for a shared experimental parallel HW/SW platform, 2006, ISPASS.
[91] Babak Falsafi, et al. ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development, 2006.
[92] Li-Shiuan Peh, et al. A Statistical Traffic Model for On-Chip Interconnection Networks, 2006, 14th IEEE International Symposium on Modeling, Analysis, and Simulation.
[93] William J. Dally, et al. Multi-Core for HPC: breakthrough or breakdown?, 2006, SC.
[94] Mendel Rosenblum. Impact of virtualization on computer architecture and operating systems, 2006, ASPLOS XII.
[95] Kurt Keutzer, et al. Building ASIPs: The Mescal Methodology, 2006.
[96] Jonathan Rose, et al. Measuring the Gap Between FPGAs and ASICs, 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[97] C. H. Flood, et al. The Fortress Language Specification, 2007.
[98] Wi N Dows. FLIGHT DATA RECORDER FOR, 2007.
[99] Vivek Sarkar, et al. An Experiment in Measuring the Productivity of Three Parallel Programming Languages, 2007.
[100] Christoforos E. Kozyrakis, et al. RAMP: Research Accelerator for Multiple Processors, 2007, IEEE Micro.
[101] Edward A. Lee, et al. The Parallel Computing Laboratory at U.C. Berkeley: A Research Agenda Based on the Berkeley View, 2008.
[102] R. V. D. Wijngaart. NAS Parallel Benchmarks Version 2.4, 2022.