COSPIM: a program optimization system for tightly-coupled heterogeneous environments

Processor-in-memory (PIM) is a new class of computer architecture designed to reduce the performance gap between the processor and memory. By integrating different processors in one system, this architecture provides a tightly-coupled heterogeneous environment. To achieve better performance on such a system, existing applications must be transformed by an efficient parallelization and optimization mechanism. In this paper, we propose a comprehensive framework, COSPIM, based on the statement viewpoint of our earlier SAGE system. It integrates program decomposition, ETC (expected time to compute) evaluation, and scheduling mechanisms. We describe how COSPIM splits statements and produces a schedule that executes on the host processor and the coprocessor simultaneously. Experimental results of this approach are also discussed.
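To make the idea of ETC-driven statement scheduling concrete, the following is a minimal sketch and not COSPIM's actual algorithm, which the abstract does not specify. It assumes each statement carries an estimated ETC for both processors and places statements with a simple greedy earliest-finish-time heuristic. The names `Statement`, `schedule`, `"host"`, and `"pim"`, and all ETC values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    name: str
    etc: dict                                  # hypothetical ETC per processor
    deps: list = field(default_factory=list)   # names of statements this one depends on


def schedule(statements):
    """Greedy earliest-finish-time placement (an assumed heuristic).

    `statements` must already be in topological (dependence) order.
    Returns a list of (statement, processor, start, end) tuples.
    """
    free_at = {"host": 0.0, "pim": 0.0}   # time at which each processor becomes idle
    finish = {}                           # finish time of each scheduled statement
    plan = []
    for s in statements:
        # A statement may start only after all of its dependences have finished.
        ready = max((finish[d] for d in s.deps), default=0.0)
        # Choose the processor on which this statement would finish earliest.
        best = min(free_at, key=lambda p: max(free_at[p], ready) + s.etc[p])
        start = max(free_at[best], ready)
        end = start + s.etc[best]
        free_at[best] = end
        finish[s.name] = end
        plan.append((s.name, best, start, end))
    return plan


# Illustrative input: S1 and S2 are independent, S3 depends on both.
example = [
    Statement("S1", {"host": 2.0, "pim": 5.0}),
    Statement("S2", {"host": 6.0, "pim": 3.0}),
    Statement("S3", {"host": 1.0, "pim": 4.0}, deps=["S1", "S2"]),
]
plan = schedule(example)
for name, proc, start, end in plan:
    print(f"{name}: {proc} [{start}, {end}]")
```

In this toy input, the two independent statements are placed on different processors so that they can overlap, and the dependent statement waits for both. A real system would also need to account for data-transfer and synchronization costs between the host and the memory-side processor, which this sketch ignores.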
