How VLIW almost disappeared - and then proliferated

Very long instruction word (VLIW) refers to a computer architecture, and the compiler algorithms that accompany it, designed to exploit large amounts of instruction-level parallelism (ILP). Joseph A. (Josh) Fisher, a former Yale professor and a Hewlett-Packard Senior Fellow, introduced the VLIW architecture in the early 1980s. The insights behind the invention came to him in the late 1970s, when he was a graduate student at New York University's Courant Institute and was microcoding a clone of the Control Data Corporation (CDC) 6600 computer. To maximize the performance of the clone, called PUMA, he used the standard trick of writing microcode with many concurrent operations. In doing so, he realized he could get even more concurrency and performance by moving operations speculatively above branches. His study of this code motion, and of how it violated the accepted rules of computer architecture of the day, led to his invention of the trace scheduling compiler algorithm, which became the subject of his Ph.D. thesis. He later became a professor at Yale University, where he started the Enormously Long Instructions (ELI) project, which developed the architecture that the trace scheduling compiler targeted.
