A scalable, clustered SMT processor for digital signal processing

A scalable, distributed processor architecture is presented that emphasizes high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of on-chip parallel processing. The architecture is based on a superscalar processor model with a modified Tomasulo scheme [1] that was extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads (SMT). Consistent application of fine-grained clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file or the instruction scheduler, and leads to a distributed architecture model in which independent thread processing units, ALUs, register files, and memories are distributed across the chip and communicate with each other through special networks. The performance of the architecture scales with both the number of function units and the number of thread units without any impact on the processor's cycle time.
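To make the idea of data flow without a central control structure more concrete, the following is a minimal toy sketch of Tomasulo-style operand capture, assuming each waiting instruction watches for result tags on its own and that tags carry a thread id so several threads can share the same function units. It is not the architecture described above: the names `Station`, `run`, and `demo_stations`, the `(thread, producer)` tag format, and the two-operation instruction set are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical tag format: (thread id, producer id within that thread).
Tag = Tuple[int, int]


@dataclass
class Station:
    """One reservation-station entry waiting for its source operands."""
    dest: Tag                      # tag under which the result is broadcast
    op: str                        # "add" or "mul" (toy instruction set)
    values: List[Optional[int]]    # operand values, None while still pending
    waits: List[Optional[Tag]]     # tag each pending operand is waiting for

    def capture(self, tag: Tag, value: int) -> None:
        """Grab a broadcast result if any pending operand waits for its tag."""
        for i, waiting_for in enumerate(self.waits):
            if waiting_for == tag:
                self.values[i] = value
                self.waits[i] = None

    def ready(self) -> bool:
        return all(w is None for w in self.waits)

    def execute(self) -> int:
        a, b = self.values
        return a + b if self.op == "add" else a * b


def run(stations: List[Station]) -> Dict[Tag, int]:
    """Fire every ready station each cycle and broadcast (tag, value) pairs."""
    results: Dict[Tag, int] = {}
    pending = list(stations)
    while pending:
        fired = [s for s in pending if s.ready()]
        if not fired:
            raise RuntimeError("deadlock: an operand is never produced")
        fired_ids = {id(s) for s in fired}
        pending = [s for s in pending if id(s) not in fired_ids]
        for s in fired:
            value = s.execute()
            results[s.dest] = value
            # Each waiting station matches the tag locally; no central table.
            for other in pending:
                other.capture(s.dest, value)
    return results


def demo_stations() -> List[Station]:
    """Two independent threads, each with one dependent instruction pair."""
    return [
        Station(dest=(0, 0), op="add", values=[2, 3], waits=[None, None]),
        Station(dest=(0, 1), op="mul", values=[None, 4], waits=[(0, 0), None]),
        Station(dest=(1, 0), op="add", values=[10, 1], waits=[None, None]),
        Station(dest=(1, 1), op="add", values=[None, None],
                waits=[(1, 0), (1, 0)]),
    ]
```

Calling `run(demo_stations())` resolves both threads in the same two cycles, because the independent instructions of thread 0 and thread 1 become ready together. Including the thread id in the tag is what keeps the two threads' results from being confused with each other in this sketch.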

[1] Uri C. Weiser, et al. MMX technology extension to the Intel architecture, 1996, IEEE Micro.

[2] Chris Wilkerson, et al. Hierarchical scheduling windows, 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35).

[3] Gurindar S. Sohi, et al. Instruction Issue Logic for High-Performance Interruptible, Multiple Functional Unit, Pipelined Computers, 1990, IEEE Trans. Computers.

[4] Vikas Agarwal, et al. Clock rate versus IPC: the end of the road for conventional microarchitectures, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[5] Peter Pirsch, et al. Multicore system-on-chip architecture for MPEG-4 streaming video, 2002, IEEE Trans. Circuits Syst. Video Technol.

[6] Kozo Kimura, et al. An elementary processor architecture with simultaneous instruction issuing from multiple threads, 1992, ISCA '92.

[7] David W. Wall, et al. Limits of instruction-level parallelism, 1991, ASPLOS IV.

[8] Viresh Rustagi, et al. Calisto: A Low-Power Single-Chip Multiprocessor Communications Platform, 2003, IEEE Micro.

[9] James E. Smith, et al. Instruction Issue Logic in Pipelined Supercomputers, 1984, IEEE Trans. Computers.

[10] Wolfram Sauer, et al. A 1.8-GHz instruction window buffer for an out-of-order microprocessor core, 2001.

[11] Geoffrey Brown, et al. Lx: a technology platform for customizable VLIW embedded processing, 2000, ISCA '00.

[12] J. Tschanz, et al. A 25 GHz 32 b integer-execution core in 130 nm dual-VT CMOS, 2002, 2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.02CH37315).

[13] Nobu Matsumoto, et al. A single-chip MPEG-2 codec based on customizable media embedded processor, 2003.

[14] David A. Koufaty, et al. Hyperthreading Technology in the Netburst Microarchitecture, 2003, IEEE Micro.

[15] Yervant Zorian, et al. 2001 Technology Roadmap for Semiconductors, 2002, Computer.

[16] Quinn Jacobson, et al. Trace processors, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[17] Peter Pirsch, et al. An Algorithm-Hardware-System Approach to VLIW Multimedia Processors, 1998, J. VLSI Signal Process.

[18] Rajeev Balasubramonian, et al. Reducing the complexity of the register file in dynamic superscalar processors, 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[19] Ruby B. Lee. Accelerating multimedia with enhanced microprocessors, 1995, IEEE Micro.

[20] Monica S. Lam, et al. Limits of control flow on parallelism, 1992, ISCA '92.

[21] Hans-Joachim Stolberg, et al. Code positioning to reduce instruction cache misses in signal processing applications on multimedia RISC processors, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[22] Antonio González, et al. Reducing wire delay penalty through value prediction, 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[23] David J. Sager, et al. The microarchitecture of the Pentium 4 processor, 2001.

[24] Henk Corporaal. Microprocessor architectures - from VLIW to TTA, 1997.

[25] Henry Hoffmann, et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs, 2002, IEEE Micro.

[26] Minerva M. Yeung, et al. The impact of SMT/SMP designs on multimedia software engineering - a workload analysis study, 2002, Fourth International Symposium on Multimedia Software Engineering, 2002. Proceedings.

[27] Cameron McNairy, et al. Itanium 2 Processor Microarchitecture, 2003, IEEE Micro.

[28] Santanu Dutta, et al. Viper: A Multiprocessor SOC for Advanced Set-Top Box and Digital TV Systems, 2001, IEEE Des. Test Comput.

[29] J.L. van Meerbergen, et al. Heterogeneous multiprocessor for the management of real-time video and graphics streams, 2000, IEEE Journal of Solid-State Circuits.

[30] Jack L. Lo, et al. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[31] Dean M. Tullsen, et al. Simultaneous multithreading: Maximizing on-chip parallelism, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[32] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.

[33] Mike Johnson, et al. Superscalar microprocessor design, 1991, Prentice Hall series in innovative technology.

[34] Peter Pirsch, et al. A platform-independent methodology for performance estimation of streaming media applications, 2002, Proceedings. IEEE International Conference on Multimedia and Expo.

[35] Frank Vahid, et al. The Softening of Hardware, 2003, Computer.

[36] Richard E. Kessler, et al. The Alpha 21264 microprocessor, 1999, IEEE Micro.

[37] Peter Pirsch, et al. Instruction Set Extensions for MPEG-4 Video, 1999, J. VLSI Signal Process.

[38] C. A. R. Hoare, et al. Communicating sequential processes, 1978, CACM.

[39] Luca Benini, et al. Networks on Chips: A New SoC Paradigm, 2022.

[40] S. Peter Song, et al. The PowerPC 604 RISC microprocessor, 1994, IEEE Micro.

[41] Manoj Franklin, et al. PEWs: a decentralized dynamic scheduler for ILP processing, 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.

[42] Norman P. Jouppi, et al. The multicluster architecture: reducing cycle time through partitioning, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[43] William J. Dally, et al. Register organization for media processing, 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[44] Ken Mai, et al. The future of wires, 2001, Proc. IEEE.

[45] James E. Smith, et al. Complexity-Effective Superscalar Processors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[46] R. M. Tomasulo, et al. An efficient algorithm for exploiting multiple arithmetic units, 1995.

[47] Theo Ungerer, et al. MPEG-2 video decompression on simultaneous multithreaded multimedia processors, 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).

[48] H. Zhang, et al. A 1 V heterogeneous reconfigurable processor IC for baseband wireless applications, 2000, 2000 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.00CH37056).

[49] Mircea R. Stan, et al. 5-GHz 32-bit Integer Execution Core in 130-nm Dual-VT CMOS, 2001.

[50] R. Nagarajan, et al. A design space evaluation of grid processor architectures, 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[51] Yale N. Patt, et al. Select-free instruction scheduling logic, 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[52] William J. Dally, et al. Imagine: Media Processing with Streams, 2001, IEEE Micro.