Automatic Design of Computer Instruction Sets

This dissertation presents the thesis that good, usable instruction sets can be derived automatically for a specified data path and benchmark set. The derivation is a multistep process: generate execution traces for the benchmark programs, sample these traces to form a large set of small code segments, optimally recompile each segment using exhaustive search, and select from the newly generated instructions the cover that optimizes the performance metric. The complete process is illustrated by generating an instruction set for a processor optimized for executing compiled Prolog programs, and the result is compared with the hand-designed VLSI-BAM instruction set. The automatically designed instruction set is smaller and performs only a few percent worse on the benchmark programs. Under a metric that includes both instruction set size and performance, this result is an improvement. Thus automatically derived instruction sets can be as good as or better than manually derived ones.
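The sampling and cover-selection steps can be pictured with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation: the function names, the representation of a recompiled segment as a list of (required instructions, cycle count) alternatives, the linear score (total cycles plus a weight times instruction count), and the brute-force subset search, which is only practical for a handful of candidates.

```python
import random
from itertools import combinations

def sample_segments(trace, length, count, seed=0):
    """Draw `count` contiguous windows of `length` operations from a trace."""
    rng = random.Random(seed)
    last_start = len(trace) - length
    starts = [rng.randint(0, last_start) for _ in range(count)]
    return [tuple(trace[s:s + length]) for s in starts]

def segment_cost(chosen, alternatives):
    """Cheapest alternative whose instructions are all in `chosen`, else None."""
    feasible = [cycles for needed, cycles in alternatives if needed <= chosen]
    return min(feasible) if feasible else None

def best_cover(segments, candidates, size_weight):
    """Try every non-empty subset of candidates; return (score, subset).

    Score (hypothetical metric) = total cycles + size_weight * subset size.
    Subsets that cannot execute some segment are skipped.
    """
    best = None
    ordered = sorted(candidates)
    for k in range(1, len(ordered) + 1):
        for subset in combinations(ordered, k):
            chosen = frozenset(subset)
            costs = [segment_cost(chosen, alts) for alts in segments]
            if any(c is None for c in costs):
                continue
            score = sum(costs) + size_weight * len(chosen)
            if best is None or score < best[0]:
                best = (score, chosen)
    return best

# Toy data: each segment lists alternative recompilations as
# (instructions required, cycles). "load_add" is a made-up fused instruction.
segments = [
    [(frozenset({"load", "add"}), 2), (frozenset({"load_add"}), 1)],
    [(frozenset({"load", "add", "store"}), 3),
     (frozenset({"load_add", "store"}), 2)],
    [(frozenset({"add", "store"}), 2)],
]
candidates = {"load", "add", "store", "load_add"}

score, chosen = best_cover(segments, candidates, size_weight=0.5)
print(score, sorted(chosen))

# Sampling a made-up trace into three windows of four operations each.
windows = sample_segments(list("abcdefghij"), length=4, count=3)
print(len(windows))
```

In this toy data the fused instruction displaces the separate load because it shortens two segments without enlarging the set, which mirrors the trade-off between instruction set size and performance described above.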
