A Software Framework for Supporting General Purpose Applications on Raw Computation Fabrics

This paper presents SUDS (Software Un-Do System), a data speculation system for Raw processors. SUDS manages speculation in software. The key to doing so is to use the compiler to minimize the number of data items that must be managed at runtime. Managing speculation in software enables Raw processors to achieve good performance on integer applications without sacrificing chip area for speculation hardware. The area saved can instead be devoted to additional compute resources, improving the performance of dense matrix and media applications.
