Exploiting fine-grain concurrency: Analytical insights in superscalar processor design


[1]  Joseph A. Fisher,et al.  Trace Scheduling: A Technique for Global Microcode Compaction , 1981, IEEE Transactions on Computers.

[2]  David J. Lilja,et al.  Comparing Parallelism Extraction Techniques: Superscalar Processors, Pipelined Processors, and Multiprocessors , 1990, ICPP.

[3]  Michael J. Flynn,et al.  Representation of Concurrency with Ordering Matrices , 1973, IEEE Transactions on Computers.

[4]  Yoichi Muraoka,et al.  On the Number of Operations Simultaneously Executable in Fortran-Like Programs and Their Resulting Speedup , 1972, IEEE Transactions on Computers.

[5]  Leonard W. Cotten.  Circuit implementation of high-speed pipeline systems , 1965, AFIPS '65 (Fall, part I).

[6]  R. M. Tomasulo,et al.  An efficient algorithm for exploiting multiple arithmetic units , 1995 .

[7]  Jack B. Dennis,et al.  A preliminary architecture for a basic data-flow processor , 1974, ISCA '75.

[8]  Bradley Kevin Fawcett.  Maximal Clocking Rates for Pipelined Digital Systems , 1975 .

[9]  Alexandru Nicolau,et al.  Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies , 1989, IEEE Trans. Computers.

[10]  Garold Stephen Tjaden.  Representation and detection of concurrency using ordering-matrices , 1972 .

[11]  Peter J. Denning,et al.  Virtual memory , 1970, CSUR.

[12]  Yale N. Patt,et al.  HPS, a new microarchitecture: rationale and introduction , 1985, MICRO 18.

[13]  Thomas R. Gross,et al.  Optimizing delayed branches , 1982, MICRO 15.

[14]  James E. Smith,et al.  Instruction Issue Logic in Pipelined Supercomputers , 1984, IEEE Transactions on Computers.

[15]  Robert G. Wedig.  Detection of concurrency in directly executed language instruction streams , 1982 .

[16]  Arvind,et al.  The U-Interpreter , 1982, Computer.

[17]  Hwa C. Torng,et al.  An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors , 1986, IEEE Transactions on Computers.

[18]  H. T. Kung Why systolic architectures? , 1982, Computer.

[19]  Robert P. Colwell,et al.  A VLIW architecture for a trace scheduling compiler , 1987, ASPLOS.

[20]  Gurindar S. Sohi,et al.  Tradeoffs in instruction format design for horizontal architectures , 1989, ASPLOS III.

[21]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[22]  Michael J. Flynn,et al.  Pipelining of Arithmetic Functions , 1972, IEEE Trans. Computers.

[23]  R. Karp,et al.  Properties of a model for parallel computations: determinacy , 1966 .

[24]  J. E. Thornton Design of a Computer: The Control Data 6600 , 1970 .

[25]  Michael J. Flynn,et al.  Some Computer Organizations and Their Effectiveness , 1972, IEEE Transactions on Computers.

[26]  Alexandru Nicolau,et al.  Measuring the Parallelism Available for Very Long Instruction Word Architectures , 1984, IEEE Transactions on Computers.

[27]  Jean-Loup Baer,et al.  Legality and Other Properties of Graph Models of Computations , 1970, JACM.

[28]  Janak H. Patel,et al.  Improving the Throughput of a Pipeline by Insertion of Delays , 1976, ISCA.

[29]  Andrew R. Pleszkun,et al.  Implementation of precise interrupts in pipelined processors , 1985, ISCA '98.

[30]  Michael J. Flynn,et al.  Detection and Parallel Execution of Independent Instructions , 1970, IEEE Transactions on Computers.

[31]  George Cybenko,et al.  Supercomputer performance evaluation and the Perfect Benchmarks , 1990, ICS '90.

[32]  Monica S. Lam,et al.  Architecture and Compiler Tradeoffs for a Long Instruction Word Microprocessor , 1989, ASPLOS.

[33]  Peter Y.-T. Hsu,et al.  Highly concurrent scalar processing , 1986, ISCA '86.

[34]  Constantine Demetrios Polychronopoulos.  On program restructuring, scheduling, and communication for parallel processor systems , 1986 .

[35]  James E. Smith,et al.  A study of scalar compilation techniques for pipelined supercomputers , 1987, ASPLOS.

[36]  Edward M. Riseman,et al.  Percolation of Code to Enhance Parallel Dispatching and Execution , 1972, IEEE Transactions on Computers.

[37]  Gurindar S. Sohi,et al.  Instruction issue logic for high-performance, interruptable pipelined processors , 1987, ISCA '87.

[38]  Alan Jay Smith,et al.  Branch Prediction Strategies and Branch Target Buffer Design , 1995, Computer.

[39]  B. Ramakrishna Rau,et al.  Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing , 1981, MICRO 14.

[40]  James E. Smith,et al.  Optimal Pipelining in Supercomputers , 1986, ISCA.

[41]  Raymond E. Miller,et al.  A Comparison of Some Theoretical Models of Parallel Computation , 1973, IEEE Transactions on Computers.

[42]  Michael D. Smith,et al.  Limits on multiple instruction issue , 1989, ASPLOS III.

[43]  Norman P. Jouppi,et al.  Available instruction-level parallelism for superscalar and superpipelined machines , 1989, ASPLOS III.

[44]  Alexandru Nicolau,et al.  Loop Quantization: A Generalized Loop Unwinding Technique , 1988, J. Parallel Distributed Comput..

[45]  Henry M. Levy,et al.  An evaluation of branch architectures , 1987, ISCA '87.

[46]  Yale N. Patt,et al.  Checkpoint Repair for High-Performance Out-of-Order Execution Machines , 1987, IEEE Transactions on Computers.

[47]  Alexandru Nicolau,et al.  Uniform Parallelism Exploitation in Ordinary Programs , 1985, ICPP.

[48]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[49]  Michael J. Flynn,et al.  Optimal Pipelining , 1990, J. Parallel Distributed Comput..