Scalable Computing: Practice and Experience
[1] André Rigland Brodtkorb,et al. The Graphics Processor as a Mathematical Coprocessor in MATLAB , 2008, 2008 International Conference on Complex, Intelligent and Software Intensive Systems.
[2] Luca Benini,et al. MPARM: Exploring the Multi-Processor SoC Design Space with SystemC , 2005, J. VLSI Signal Process..
[3] D. Marpe,et al. Video coding with H.264/AVC: tools, performance, and complexity , 2004, IEEE Circuits and Systems Magazine.
[4] Dean M. Tullsen,et al. Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[5] Jack Dongarra,et al. Some issues in dense linear algebra for multicore and special purpose architectures , 2008 .
[6] Naraig Manjikian. Multiprocessor enhancements of the SimpleScalar tool set , 2001, CARN.
[7] Pat Hanrahan,et al. Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..
[8] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[9] Jan Christian Meyer,et al. Latency Impact on Spin-Lock Algorithms for Modern Shared Memory Multiprocessors , 2008, 2008 International Conference on Complex, Intelligent and Software Intensive Systems.
[10] Victor V. Zyuban,et al. The energy complexity of register files , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).
[11] David R. Butenhof. Programming with POSIX threads , 1993 .
[12] André Seznec,et al. Register write specialization register read specialization: a path to complexity-effective wide-issue superscalar processors , 2002, MICRO 35.
[13] Laxmi N. Bhuyan,et al. An Adaptive Submesh Allocation Strategy for Two-Dimensional Mesh Connected Systems , 1993, 1993 International Conference on Parallel Processing - ICPP'93.
[14] John Goodacre,et al. ARM MPCore; The streamlined and scalable ARM11 processor core , 2007, 2007 Asia and South Pacific Design Automation Conference.
[15] David F. Heidel,et al. An Overview of the BlueGene/L Supercomputer , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[16] A. M. Abdullah,et al. Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications , 1997 .
[17] Jianfeng Xu,et al. Fast integer-pel and fractional-pel motion estimation for H.264/AVC , 2006, J. Vis. Commun. Image Represent..
[18] Gerard J. M. Smit,et al. Mapping of DSP algorithms on the MONTIUM architecture , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[19] M. Butts,et al. A Structural Object Programming Model, Architecture, Chip and Tools for Reconfigurable Computing , 2007, 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2007).
[20] T. N. Vijaykumar,et al. Reducing register ports for higher speed and lower energy , 2002, MICRO.
[21] Karam S. Chatha,et al. An ILP Formulation for System-Level Application Mapping on Network Processor Architectures , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.
[22] James H. Anderson,et al. A generic local-spin fetch-and-phi-based mutual exclusion algorithm , 2007, J. Parallel Distributed Comput..
[23] Anoop Gupta,et al. The Stanford Dash multiprocessor , 1992, Computer.
[24] M. McCool. Data-Parallel Programming on the Cell BE and the GPU using the RapidMind Development Platform , 2006 .
[25] Anoop Gupta,et al. Parallel computer architecture - a hardware / software approach , 1998 .
[26] IEEE Standards Board. IEEE Standard for local and metropolitan area networks : supplement to Integrated Services (IS) LAN Interface at the Medium Access Control (MAC) and Physical (PHY) layers : Managed Object Conformance (MOCS) Proforma , 1996 .
[27] James Reinders,et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism , 2007 .
[28] Tughrul Arslan,et al. System-level Scheduling on Instruction Cell Based Reconfigurable Systems , 2006, Proceedings of the Design Automation & Test in Europe Conference.
[29] Peter Y. K. Cheung,et al. Analysis of yield loss due to random photolithographic defects in the interconnect structure of FPGAs , 2005, FPGA '05.
[30] Tughrul Arslan,et al. Efficient Implementation of Wireless Applications on Multi-core Platforms Based on Dynamically Reconfigurable Processors , 2008, 2008 International Conference on Complex, Intelligent and Software Intensive Systems.
[31] Olav Lysne,et al. Layered routing in irregular networks , 2006, IEEE Transactions on Parallel and Distributed Systems.
[32] Erik Hagersten,et al. Queue locks on cache coherent multiprocessors , 1994, Proceedings of 8th International Parallel Processing Symposium.
[33] Olav Lysne,et al. Routing-Contained Virtualization Based on Up*/Down* Forwarding , 2007, HiPC.
[34] D. Lenoski,et al. The SGI Origin: A ccNUMA Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[35] Maged M. Michael,et al. Scalability of Atomic Primitives on Distributed Shared Memory Multiprocessors , 1994 .
[36] Wei-Chang Tsai,et al. A simple and efficient block motion estimation algorithm based on full-search array architecture , 2004, Signal Process. Image Commun..
[37] Hee Yong Youn,et al. Isomorphic Strategy for Processor Allocation in k-Ary n-Cube Systems , 2003, IEEE Trans. Computers.
[38] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.
[40] Antonio Robles,et al. Effective methodology for deadlock-free minimal routing in InfiniBand networks , 2002, Proceedings International Conference on Parallel Processing.
[41] Cong Fu,et al. The RASE (Rapid, Accurate Simulation Environment) for chip multiprocessors , 2005, CARN.
[42] Carl Ebeling,et al. Implementing an OFDM receiver on the RaPiD reconfigurable architecture , 2003, IEEE Transactions on Computers.
[43] Y. Danieli. Guide , 2005 .
[44] Leonel Sousa,et al. A Parallel Algorithm for Advanced Video Motion Estimation on Multicore Architectures , 2008, 2008 International Conference on Complex, Intelligent and Software Intensive Systems.
[45] Tughrul Arslan,et al. The Reconfigurable Instruction Cell Array , 2008, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[46] Vipul Gupta,et al. A flexible processor allocation strategy for mesh connected parallel systems , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[47] Christopher J. Hughes,et al. RSIM: Simulating Shared-Memory Multiprocessors with ILP Processors , 2002, Computer.
[48] Allan Heydon,et al. System Description Language , 2006 .
[49] Edsger W. Dijkstra,et al. Solution of a problem in concurrent programming control , 1965, CACM.
[50] Sriram R. Vangal,et al. A 5-GHz Mesh Interconnect for a Teraflops Processor , 2007, IEEE Micro.
[51] Rafael Mayo,et al. Evaluation and tuning of the Level 3 CUBLAS for graphics processors , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[52] Sven-Arne Reinemo,et al. An Analysis of Connectivity and Yield for 2D Mesh Based NoC with Interconnect Router Failures , 2008, 2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools.
[53] Frank Ghenassia,et al. Transaction Level Modeling with SystemC , 2005 .
[54] Stephen D. Brown,et al. Architecture of FPGAs and CPLDs: A Tutorial , 2000 .
[55] Hee Yong Youn,et al. Processor Scheduling and Allocation for 3D Torus Multicomputer Systems , 2000, IEEE Trans. Parallel Distributed Syst..
[56] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput..
[57] Boon Shyang Lim. A Simplified High Definition Video Encoder Based on The STI CELL Multiprocessor , 2007 .
[58] Antonio Robles,et al. LASH-TOR: a generic transition-oriented routing algorithm , 2004, Proceedings. Tenth International Conference on Parallel and Distributed Systems, 2004. ICPADS 2004..
[59] Rafael Mayo,et al. GLAME@lab: An M-script API for Linear Algebra Operations on Graphics Processors , 2008 .
[60] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[61] Jie Chen,et al. Efficient subtorus processor allocation in a multi-dimensional torus , 2005, Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05).
[62] Victor V. Zyuban,et al. Inherently Lower-Power High-Performance Superscalar Architectures , 2001, IEEE Trans. Computers.
[63] S. Asano,et al. The design and implementation of a first-generation CELL processor , 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005..
[64] Po-Jen Chuang,et al. An Efficient Recognition-Complete Processor Allocation Strategy for k-ary n-cube Multiprocessors , 2000, IEEE Trans. Parallel Distributed Syst..
[65] Tughrul Arslan,et al. The Design of Multitasking Based Applications on Reconfigurable Instruction Cell Based Architectures , 2007, 2007 International Conference on Field Programmable Logic and Applications.
[66] Fan Wu,et al. Processor Allocation in the Mesh Multiprocessors Using the Leapfrog Method , 2003, IEEE Trans. Parallel Distributed Syst..
[67] Michael Burrows,et al. Autonet: A High-Speed, Self-Configuring Local Area Network Using Point-to-Point Links , 1991, IEEE J. Sel. Areas Commun..
[68] Yahui Zhu,et al. Efficient Processor Allocation Strategies for Mesh-Connected Parallel Computers , 1992, J. Parallel Distributed Comput..
[69] Michael Gschwind. The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor , 2007, International Journal of Parallel Programming.
[70] Antonio Robles,et al. An Efficient Fault-Tolerant Routing Methodology for Meshes and Tori , 2004, IEEE Computer Architecture Letters.
[71] Robert A. van de Geijn,et al. FLAME: Formal Linear Algebra Methods Environment , 2001, TOMS.
[72] Jiun-In Guo,et al. An Embedded Coherent-Multithreading Multimedia Processor and Its Programming Model , 2007, 2007 44th ACM/IEEE Design Automation Conference.
[73] Daniel A. Brokenshire,et al. Introduction to the Cell Broadband Engine Architecture , 2007, IBM J. Res. Dev..