A Vectorizing Compiler for Multimedia Extensions

In this paper, we present an implementation of a vectorizing C compiler for Intel's MMX (Multimedia Extension). The compiler identifies data-parallel sections of the code using scalar and array dependence analysis. To widen the scope for applying subword semantics, the compiler performs several code transformations: strip mining, scalar expansion, grouping and reduction, and loop distribution. It then generates inline assembly instructions for the data-parallel sections. We implemented the compiler using the Stanford University Intermediate Format (SUIF), a public-domain compiler tool. We evaluated the performance of the code generated by our compiler on a number of benchmarks. Initial results show that the compiler-generated code achieves a speedup of 2 to 6.5 over code generated without the vectorizing transformations and inline assembly. In certain cases, the performance of the compiler-generated code is within 85% of that of hand-tuned MMX code.
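To make two of the transformations named above more concrete, the following is a minimal, portable C sketch of strip mining and of a reduction over 16-bit data. It is not the paper's compiler output. The names `LANES`, `packed_add16`, `vec_add`, and `vec_sum` are hypothetical. The sketch assumes unsigned 16-bit elements with wraparound arithmetic, as in the MMX `PADDW` instruction, and four lanes per group, because a 64-bit MMX register holds four 16-bit words. The packed operation is emulated with a scalar inner loop rather than with MMX instructions or inline assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: four 16-bit subwords per 64-bit MMX register. */
#define LANES 4

/* Hypothetical helper: emulates one packed 16-bit add over LANES elements
   with wraparound (modulo 2^16), like MMX PADDW. In generated MMX code this
   step would presumably be a single packed instruction. */
static void packed_add16(const uint16_t *a, const uint16_t *b, uint16_t *c) {
    for (int k = 0; k < LANES; k++)
        c[k] = (uint16_t)(a[k] + b[k]);
}

/* Strip-mined form of the scalar loop
       for (i = 0; i < n; i++) c[i] = a[i] + b[i];                        */
void vec_add(const uint16_t *a, const uint16_t *b, uint16_t *c, size_t n) {
    size_t i = 0;
    /* Main loop: each iteration processes one strip of LANES elements. */
    for (; i + LANES <= n; i += LANES)
        packed_add16(&a[i], &b[i], &c[i]);
    /* Cleanup loop for the n % LANES leftover elements. */
    for (; i < n; i++)
        c[i] = (uint16_t)(a[i] + b[i]);
}

/* Reduction: the single scalar accumulator is split into LANES partial sums
   so each strip can be accumulated lane by lane; the partial sums are then
   combined. 32-bit partials are used here to avoid overflow; a real MMX code
   generator would have to choose lane widths itself. */
uint32_t vec_sum(const uint16_t *a, size_t n) {
    uint32_t partial[LANES] = {0};
    size_t i = 0;
    for (; i + LANES <= n; i += LANES)
        for (int k = 0; k < LANES; k++)
            partial[k] += a[i + k];
    uint32_t total = 0;
    for (int k = 0; k < LANES; k++)
        total += partial[k];
    /* Leftover elements are added serially. */
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

For example, calling `vec_add` on seven elements sends the first four through the packed path and the last three through the cleanup loop. The cleanup loop keeps the transformation valid when the trip count is not a multiple of the subword count.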
