Parallel Programming Models for Thousand-Core Microprocessors

This paper argues for an implicitly parallel programming model for many-core microprocessors, and provides initial technical approaches towards this goal. In an implicitly parallel programming model, programmers maximize algorithmlevel parallelism, express their parallel algorithms by asserting high-level properties on top of a traditional sequential programming language, and rely on parallelizing compilers and hardware support to perform parallel execution under the hood. In such a model, compilers and related tools require much more advanced program analysis capabilities and programmer assertions than what are currently available so that a comprehensive understanding of the input program’s concurrency can be derived. Such an understanding is then used to drive automatic or interactive parallel code generation tools for a diverse set of parallel hardware organizations. The chip-level architecture and hardware should maintain parallel execution state in such a way that a strictly sequential execution state can always be derived for the purpose of verifying and debugging the program. We argue that implicitly parallel programming models are critical for addressing the software development crises and software scalability challenges for many-core microprocessors.

[1]  William Pugh,et al.  The Omega test: A fast and practical integer programming algorithm for dependence analysis , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[2]  K. Rustan M. Leino,et al.  The Spec# Programming System: An Overview , 2004, CASSIS.

[3]  John Paul Shen,et al.  Multiple Instruction Stream Processor , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[4]  William Thies,et al.  StreamIt: A Language for Streaming Applications , 2002, CC.

[5]  Peter M. W. Knijnenburg,et al.  Iterative compilation in a non-linear optimisation space , 1998 .

[6]  David I. August,et al.  Compiler optimization-space exploration , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[7]  Cameron McNairy,et al.  Itanium 2 Processor Microarchitecture , 2003, IEEE Micro.

[8]  G.E. Moore,et al.  Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.

[9]  N. Mitchell Philips TriMedia: a digital media convergence platform , 1997, WESCON/97 Conference Proceedings.

[10]  Hugh Garraway Parallel Computer Architecture: A Hardware/Software Approach , 1999, IEEE Concurrency.

[11]  P. Altena,et al.  In search of clusters , 2007 .

[12]  Kurt Keutzer,et al.  NP-Click: a productive software development approach for network processors , 2004, IEEE Micro.

[13]  Scott A. Mahlke,et al.  Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[14]  Wen-mei W. Hwu,et al.  Automatic Discovery of Coarse-Grained Parallelism in Media Applications , 2007, Trans. High Perform. Embed. Archit. Compil..

[15]  Timothy G. Mattson,et al.  Patterns for parallel programming , 2004 .