Exploiting fine-grain concurrency: Analytical insights in superscalar processor design

Pradeep Dubey | Michael J. Flynn | George B. Adams

[1] Joseph A. Fisher, et al. Trace Scheduling: A Technique for Global Microcode Compaction, 1981, IEEE Transactions on Computers.

[2] David J. Lilja, et al. Comparing Parallelism Extraction Techniques: Superscalar Processors, Pipelined Processors, and Multiprocessors, 1990, ICPP.

[3] Michael J. Flynn, et al. Representation of Concurrency with Ordering Matrices, 1973, IEEE Transactions on Computers.

[4] Yoichi Muraoka, et al. On the Number of Operations Simultaneously Executable in Fortran-Like Programs and Their Resulting Speedup, 1972, IEEE Transactions on Computers.

[5] Leonard W. Cotten. Circuit implementation of high-speed pipeline systems, 1965, AFIPS '65 (Fall, part I).

[6] R. M. Tomasulo. An efficient algorithm for exploiting multiple arithmetic units, 1967, IBM Journal of Research and Development.

[7] Jack B. Dennis, et al. A preliminary architecture for a basic data-flow processor, 1974, ISCA '75.

[8] Bradley Kevin Fawcett. Maximal Clocking Rates for Pipelined Digital Systems, 1975.

[9] Alexandru Nicolau, et al. Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies, 1989, IEEE Transactions on Computers.

[10] Garold Stephen Tjaden. Representation and detection of concurrency using ordering-matrices, 1972.

[11] Peter J. Denning, et al. Virtual memory, 1970, CSUR.

[12] Yale N. Patt, et al. HPS, a new microarchitecture: rationale and introduction, 1985, MICRO 18.

[13] Thomas R. Gross, et al. Optimizing delayed branches, 1982, MICRO 15.

[14] James E. Smith, et al. Instruction Issue Logic in Pipelined Supercomputers, 1984, IEEE Transactions on Computers.

[15] Robert G. Wedig. Detection of concurrency in directly executed language instruction streams, 1982.

[16] Arvind, et al. The U-Interpreter, 1982, Computer.

[17] Hwa C. Torng, et al. An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors, 1986, IEEE Transactions on Computers.

[18] H. T. Kung. Why systolic architectures?, 1982, Computer.

[19] Robert P. Colwell, et al. A VLIW architecture for a trace scheduling compiler, 1987, ASPLOS.

[20] Gurindar S. Sohi, et al. Tradeoffs in instruction format design for horizontal architectures, 1989, ASPLOS III.

[21] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.

[22] Michael J. Flynn, et al. Pipelining of Arithmetic Functions, 1972, IEEE Transactions on Computers.

[23] R. Karp, et al. Properties of a model for parallel computations: determinacy, 1966.

[24] J. E. Thornton. Design of a Computer: The Control Data 6600, 1970.

[25] Michael J. Flynn, et al. Some Computer Organizations and Their Effectiveness, 1972, IEEE Transactions on Computers.

[26] Alexandru Nicolau, et al. Measuring the Parallelism Available for Very Long Instruction Word Architectures, 1984, IEEE Transactions on Computers.

[27] Jean-Loup Baer, et al. Legality and Other Properties of Graph Models of Computations, 1970, JACM.

[28] Janak H. Patel, et al. Improving the Throughput of a Pipeline by Insertion of Delays, 1976, ISCA.

[29] Andrew R. Pleszkun, et al. Implementation of precise interrupts in pipelined processors, 1985, ISCA '85.

[30] Michael J. Flynn, et al. Detection and Parallel Execution of Independent Instructions, 1970, IEEE Transactions on Computers.

[31] George Cybenko, et al. Supercomputer performance evaluation and the Perfect Benchmarks, 1990, ICS '90.

[32] Monica S. Lam, et al. Architecture and Compiler Tradeoffs for a Long Instruction Word Microprocessor, 1989, ASPLOS.

[33] Peter Y.-T. Hsu, et al. Highly concurrent scalar processing, 1986, ISCA '86.

[34] Constantine Demetrios Polychronopoulos. On program restructuring, scheduling, and communication for parallel processor systems, 1986.

[35] James E. Smith, et al. A study of scalar compilation techniques for pipelined supercomputers, 1987, ASPLOS.

[36] Edward M. Riseman, et al. Percolation of Code to Enhance Parallel Dispatching and Execution, 1972, IEEE Transactions on Computers.

[37] Gurindar S. Sohi, et al. Instruction issue logic for high-performance, interruptable pipelined processors, 1987, ISCA '87.

[38] Alan Jay Smith, et al. Branch Prediction Strategies and Branch Target Buffer Design, 1984, Computer.

[39] B. Ramakrishna Rau, et al. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing, 1981, MICRO 14.

[40] James E. Smith, et al. Optimal Pipelining in Supercomputers, 1986, ISCA.

[41] Raymond E. Miller, et al. A Comparison of Some Theoretical Models of Parallel Computation, 1973, IEEE Transactions on Computers.

[42] Michael D. Smith, et al. Limits on multiple instruction issue, 1989, ASPLOS III.

[43] Norman P. Jouppi, et al. Available instruction-level parallelism for superscalar and superpipelined machines, 1989, ASPLOS III.

[44] Alexandru Nicolau, et al. Loop Quantization: A Generalized Loop Unwinding Technique, 1988, Journal of Parallel and Distributed Computing.

[45] Henry M. Levy, et al. An evaluation of branch architectures, 1987, ISCA '87.

[46] Yale N. Patt, et al. Checkpoint Repair for High-Performance Out-of-Order Execution Machines, 1987, IEEE Transactions on Computers.

[47] Alexandru Nicolau, et al. Uniform Parallelism Exploitation in Ordinary Programs, 1985, ICPP.

[48] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[49] Michael J. Flynn, et al. Optimal Pipelining, 1990, Journal of Parallel and Distributed Computing.