An analysis of dynamic scheduling techniques for symbolic applications

Instruction-level parallelism in a single stream of code for non-numerical applications has been the subject of much recent research. This work extends the analysis to symbolic applications written in a logic programming language. In particular, the authors analyze the effects on performance of speculative execution, memory alias disambiguation, renaming, and flow prediction. The results indicate that, with the proper optimizations, a sustained parallelism of 4 can be reached, comparable with that reported for imperative languages. The authors also compare statically and dynamically scheduled approaches, outlining the conditions under which a dynamic solution achieves substantial improvements over a static one. In this way, they identify the key optimizations and parameters of a dynamic scheduling approach, providing guidelines for future architectural implementations.
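
For illustration, here is a minimal sketch of register renaming, one of the optimizations the abstract names. It is not the authors' simulator, and every name in it (Instr, rename, the register labels) is hypothetical; it only shows the general idea that allocating a fresh physical register per write removes false dependences, leaving only true data dependences to constrain a dynamic scheduler.

from dataclasses import dataclass

@dataclass
class Instr:
    dest: str                 # architectural destination register, e.g. "r1"
    srcs: list                # architectural source registers

def rename(trace):
    # Map each architectural register to a fresh physical register on
    # every write, so write-after-write and write-after-read (false)
    # dependences no longer constrain the scheduler; only true
    # read-after-write dependences remain.
    mapping = {}              # architectural name -> current physical name
    next_phys = 0
    renamed = []
    for ins in trace:
        # Sources read the latest physical name (true dependence kept).
        srcs = [mapping.get(r, r) for r in ins.srcs]
        phys = "p%d" % next_phys
        next_phys += 1
        mapping[ins.dest] = phys
        renamed.append(Instr(dest=phys, srcs=srcs))
    return renamed

# The two writes to r1 no longer serialize: after renaming they target
# different physical registers, so the third instruction can issue
# without waiting on the first.
trace = [Instr("r1", ["r2"]), Instr("r3", ["r1"]), Instr("r1", ["r4"])]
for ins in rename(trace):
    print(ins.dest, "<-", ins.srcs)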
