Exploiting fine-grain concurrency: Analytical insights in superscalar processor design
[1] Joseph A. Fisher, et al. Trace Scheduling: A Technique for Global Microcode Compaction, 1981, IEEE Transactions on Computers.
[2] David J. Lilja, et al. Comparing Parallelism Extraction Techniques: Superscalar Processors, Pipelined Processors, and Multiprocessors, 1990, ICPP.
[3] Michael J. Flynn, et al. Representation of Concurrency with Ordering Matrices, 1973, IEEE Transactions on Computers.
[4] Yoichi Muraoka, et al. On the Number of Operations Simultaneously Executable in Fortran-Like Programs and Their Resulting Speedup, 1972, IEEE Transactions on Computers.
[5] Leonard W. Cotten. Circuit implementation of high-speed pipeline systems, 1965, AFIPS '65 (Fall, part I).
[6] R. M. Tomasulo, et al. An efficient algorithm for exploiting multiple arithmetic units, 1995.
[7] Jack B. Dennis, et al. A preliminary architecture for a basic data-flow processor, 1974, ISCA '75.
[8] Bradley Kevin Fawcett. Maximal Clocking Rates for Pipelined Digital Systems, 1975.
[9] Alexandru Nicolau, et al. Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies, 1989, IEEE Transactions on Computers.
[10] Garold Stephen Tjaden. Representation and detection of concurrency using ordering-matrices, 1972.
[11] Peter J. Denning, et al. Virtual memory, 1970, CSUR.
[12] Yale N. Patt, et al. HPS, a new microarchitecture: rationale and introduction, 1985, MICRO 18.
[13] Thomas R. Gross, et al. Optimizing delayed branches, 1982, MICRO 15.
[14] James E. Smith, et al. Instruction Issue Logic in Pipelined Supercomputers, 1984, IEEE Transactions on Computers.
[15] Robert G. Wedig. Detection of concurrency in directly executed language instruction streams, 1982.
[16] Arvind, et al. The U-Interpreter, 1982, Computer.
[17] Hwa C. Torng, et al. An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors, 1986, IEEE Transactions on Computers.
[18] H. T. Kung. Why systolic architectures?, 1982, Computer.
[19] Robert P. Colwell, et al. A VLIW architecture for a trace scheduling compiler, 1987, ASPLOS.
[20] Gurindar S. Sohi, et al. Tradeoffs in instruction format design for horizontal architectures, 1989, ASPLOS III.
[22] Michael J. Flynn, et al. Pipelining of Arithmetic Functions, 1972, IEEE Transactions on Computers.
[23] R. Karp, et al. Properties of a model for parallel computations: determinacy, 1966.
[24] J. E. Thornton. Design of a Computer: The Control Data 6600, 1970.
[25] Michael J. Flynn, et al. Some Computer Organizations and Their Effectiveness, 1972, IEEE Transactions on Computers.
[26] Alexandru Nicolau, et al. Measuring the Parallelism Available for Very Long Instruction Word Architectures, 1984, IEEE Transactions on Computers.
[27] Jean-Loup Baer, et al. Legality and Other Properties of Graph Models of Computations, 1970, JACM.
[28] Janak H. Patel, et al. Improving the Throughput of a Pipeline by Insertion of Delays, 1976, ISCA.
[29] Andrew R. Pleszkun, et al. Implementation of precise interrupts in pipelined processors, 1985, ISCA '98.
[30] Michael J. Flynn, et al. Detection and Parallel Execution of Independent Instructions, 1970, IEEE Transactions on Computers.
[31] George Cybenko, et al. Supercomputer performance evaluation and the Perfect Benchmarks, 1990, ICS '90.
[32] Monica S. Lam, et al. Architecture and Compiler Tradeoffs for a Long Instruction Word Microprocessor, 1989, ASPLOS.
[33] Peter Y.-T. Hsu, et al. Highly concurrent scalar processing, 1986, ISCA '86.
[34] Constantine Demetrios Polychronopoulos. On program restructuring, scheduling, and communication for parallel processor systems, 1986.
[35] James E. Smith, et al. A study of scalar compilation techniques for pipelined supercomputers, 1987, ASPLOS.
[36] Edward M. Riseman, et al. Percolation of Code to Enhance Parallel Dispatching and Execution, 1972, IEEE Transactions on Computers.
[37] Gurindar S. Sohi, et al. Instruction issue logic for high-performance, interruptable pipelined processors, 1987, ISCA '87.
[38] Alan Jay Smith, et al. Branch Prediction Strategies and Branch Target Buffer Design, 1995, Computer.
[39] B. Ramakrishna Rau, et al. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing, 1981, MICRO 14.
[40] James E. Smith, et al. Optimal Pipelining in Supercomputers, 1986, ISCA.
[41] Raymond E. Miller, et al. A Comparison of Some Theoretical Models of Parallel Computation, 1973, IEEE Transactions on Computers.
[42] Michael D. Smith, et al. Limits on multiple instruction issue, 1989, ASPLOS III.
[43] Norman P. Jouppi, et al. Available instruction-level parallelism for superscalar and superpipelined machines, 1989, ASPLOS III.
[44] Alexandru Nicolau, et al. Loop Quantization: A Generalized Loop Unwinding Technique, 1988, J. Parallel Distributed Comput.
[45] Henry M. Levy, et al. An evaluation of branch architectures, 1987, ISCA '87.
[46] Yale N. Patt, et al. Checkpoint Repair for High-Performance Out-of-Order Execution Machines, 1987, IEEE Transactions on Computers.
[47] Alexandru Nicolau, et al. Uniform Parallelism Exploitation in Ordinary Programs, 1985, ICPP.
[48] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.
[49] Michael J. Flynn, et al. Optimal Pipelining, 1990, J. Parallel Distributed Comput.