Architecture of the Atlas chip-multiprocessor: dynamically parallelizing irregular applications

An important research direction for future microprocessors is the single-chip multiprocessor. This approach has two drawbacks: many important applications cannot be automatically parallelized, and performance suffers on "dusty-deck" binaries. This paper details a single-chip multiprocessor that combines aggressive speculation techniques to dynamically parallelize irregular, sequential binaries. Thread speculation (multiscalar execution) and data value prediction together allow the processor to execute dependent threads in parallel. The architecture performs a novel form of dynamic thread partitioning called MEM-slicing and includes an extremely aggressive correlated value predictor. Several new microarchitectural structures for managing inter-thread dependencies are described. Simulations show that sequential programs are amenable to this form of execution: on SPECint95, an average speedup of 3.4 is achieved on 8 processors, due entirely to the exploitation of thread-level parallelism.
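To make the idea of data value prediction concrete, the following is a minimal sketch of a generic per-instruction stride predictor. It is a textbook scheme chosen for illustration and is not the correlated value predictor used in Atlas; the class name `StridePredictor`, the helper `count_correct`, and the example program counter value are all hypothetical.

```python
class StridePredictor:
    """Generic stride value predictor (illustrative; not the Atlas design)."""

    def __init__(self):
        # Maps an instruction address (pc) to (last_value, stride).
        self.table = {}

    def predict(self, pc):
        """Return the predicted next value for pc, or None if pc is unseen."""
        entry = self.table.get(pc)
        if entry is None:
            return None
        last_value, stride = entry
        return last_value + stride

    def update(self, pc, actual):
        """Record the value actually produced and recompute the stride."""
        entry = self.table.get(pc)
        stride = 0 if entry is None else actual - entry[0]
        self.table[pc] = (actual, stride)


def count_correct(values, pc=0x400):
    """Count how many values in the sequence were predicted correctly."""
    predictor = StridePredictor()
    correct = 0
    for value in values:
        if predictor.predict(pc) == value:
            correct += 1
        predictor.update(pc, value)
    return correct


if __name__ == "__main__":
    # A loop induction variable advancing by 4 each iteration.
    print(count_correct([0, 4, 8, 12, 16]))
```

In a speculative multiprocessor, a correct prediction of this kind would let a consumer thread begin with a predicted input before the producer thread has computed it, while a misprediction would require the consumer's dependent work to be discarded and re-executed.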

[26]  James D. Meindl,et al.  Architecture of the Atlas Chip-Multiprocessor: Dynamically Parallelizing Irregular Applications , 2001, IEEE Trans. Computers.
