Advanced Computer Architecture and Parallel Processing

1. Introduction to Advanced Computer Architecture and Parallel Processing
1.1 Four Decades of Computing
1.2 Flynn's Taxonomy of Computer Architecture
1.3 SIMD Architecture
1.4 MIMD Architecture
1.5 Interconnection Networks
1.6 Chapter Summary
Problems
References

2. Multiprocessors Interconnection Networks
2.1 Interconnection Networks Taxonomy
2.2 Bus-Based Dynamic Interconnection Networks
2.3 Switch-Based Interconnection Networks
2.4 Static Interconnection Networks
2.5 Analysis and Performance Metrics
2.6 Chapter Summary
Problems
References

3. Performance Analysis of Multiprocessor Architecture
3.1 Computational Models
3.2 An Argument for Parallel Architectures
3.3 Interconnection Networks Performance Issues
3.4 Scalability of Parallel Architectures
3.5 Benchmark Performance
3.6 Chapter Summary
Problems
References

4. Shared Memory Architecture
4.1 Classification of Shared Memory Systems
4.2 Bus-Based Symmetric Multiprocessors
4.3 Basic Cache Coherency Methods
4.4 Snooping Protocols
4.5 Directory Based Protocols
4.6 Shared Memory Programming
4.7 Chapter Summary
Problems
References

5. Message Passing Architecture
5.1 Introduction to Message Passing
5.2 Routing in Message Passing Networks
5.3 Switching Mechanisms in Message Passing
5.4 Message Passing Programming Models
5.5 Processor Support for Message Passing
5.6 Example Message Passing Architectures
5.7 Message Passing Versus Shared Memory Architectures
5.8 Chapter Summary
Problems
References

6. Abstract Models
6.1 The PRAM Model and Its Variations
6.2 Simulating Multiple Accesses on an EREW PRAM
6.3 Analysis of Parallel Algorithms
6.4 Computing Sum and All Sums
6.5 Matrix Multiplication
6.6 Sorting
6.7 Message Passing Model
6.8 Leader Election Problem
6.9 Leader Election in Synchronous Rings
6.10 Chapter Summary
Problems
References

7. Network Computing
7.1 Computer Networks Basics
7.2 Client/Server Systems
7.3 Clusters
7.4 Interconnection Networks
7.5 Cluster Examples
7.6 Grid Computing
7.7 Chapter Summary
Problems
References

8. Parallel Programming in the Parallel Virtual Machine
8.1 PVM Environment and Application Structure
8.2 Task Creation
8.3 Task Groups
8.4 Communication Among Tasks
8.5 Task Synchronization
8.6 Reduction Operations
8.7 Work Assignment
8.8 Chapter Summary
Problems
References

9. Message Passing Interface (MPI)
9.1 Communicators
9.2 Virtual Topologies
9.3 Task Communication
9.4 Synchronization
9.5 Collective Operations
9.6 Task Creation
9.7 One-Sided Communication
9.8 Chapter Summary
Problems
References

10. Scheduling and Task Allocation
10.1 The Scheduling Problem
10.2 Scheduling DAGs without Considering Communication
10.3 Communication Models
10.4 Scheduling DAGs with Communication
10.5 The NP-Completeness of the Scheduling Problem
10.6 Heuristic Algorithms
10.7 Task Allocation
10.8 Scheduling in Heterogeneous Environments
Problems
References