Parallelizing Compiler for Single and Multicore Computing

To overcome the challenges posed by high power densities and thermal hot spots in microprocessors, multicore computing platforms have become ubiquitous, from servers to embedded systems. However, providing multiple cores does not directly translate into increased performance for most applications. The burden falls on software developers to find and exploit coarse-grain parallelism so that the abundant computing resources these systems provide are used effectively. With the rise of multicore and many-core processors, concurrency has become a major issue in the daily life of a programmer. Compilers and software development tools will therefore be critical in helping programmers create high-performance software. This chapter covers software issues of a parallelizing queue compiler targeted at future single- and multicore embedded systems.
