PSS: a novel statement scheduling mechanism for a high-performance SoC architecture

Continuous improvements in semiconductor fabrication density are enabling new classes of system-on-a-chip (SoC) architectures that integrate extensive processing logic with high-density memory. Such architectures are generally called processor-in-memory (PIM) or intelligent RAM (IRAM) architectures, and they support high-performance computing by narrowing the performance gap between processor and memory. A PIM architecture combines several processors, with differing computation and memory-access capabilities, in a single system. A strategy is therefore needed to characterize these capabilities and dispatch the most suitable jobs to each processor so that all of them are fully exploited. Accordingly, this study presents SAGE, an automatic source-to-source parallelizing system designed to exploit the advantages of PIM architectures. Unlike conventional iteration-based parallelizing systems, SAGE adopts a statement-based analysis approach, together with a new pair-selection scheduling (PSS) mechanism that improves utilization and workload balance between the host and memory processors of a PIM architecture. The paper also reports performance results and comparisons on several benchmarks to demonstrate the capability of the new scheduling algorithm.
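The core idea behind balancing work between host and memory processors can be illustrated with a minimal greedy sketch. This is not the paper's PSS algorithm; it is a simplified illustration, assuming each statement carries estimated execution costs on the host and on a memory processor (the `Statement` type and `schedule` function are hypothetical names introduced here):

```python
from dataclasses import dataclass

@dataclass
class Statement:
    name: str
    host_cost: float  # estimated cost if run on the host processor
    mem_cost: float   # estimated cost if run on a memory processor

def schedule(statements):
    """Greedy sketch: place each statement on whichever processor
    would finish it earlier, given the load accumulated so far."""
    host_load = mem_load = 0.0
    placement = {}
    # Consider costly statements first so large jobs anchor the balance.
    for s in sorted(statements,
                    key=lambda s: max(s.host_cost, s.mem_cost),
                    reverse=True):
        if host_load + s.host_cost <= mem_load + s.mem_cost:
            placement[s.name] = "host"
            host_load += s.host_cost
        else:
            placement[s.name] = "memory"
            mem_load += s.mem_cost
    return placement, host_load, mem_load
```

A memory-bound statement (low `mem_cost`) naturally drifts to the memory processor, while compute-bound statements stay on the host; the actual PSS mechanism refines this by analyzing and selecting statement pairs rather than placing statements independently.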
