How Many Threads to Spawn during Program Multithreading?

Thread-level program parallelization is key to exploiting the hardware parallelism of emerging multi-core systems. Several techniques have been proposed for program multithreading. However, existing techniques do not address two key issues associated with multithreaded execution of a given program: (a) whether multithreaded execution is faster than sequential execution; and (b) how many threads to spawn during program multithreading. In this paper, we address these limitations. Specifically, we propose a novel approach, T-OPT, to determine how many threads to spawn during multithreaded execution of a given program region. This helps avoid both under-subscription and over-subscription of the hardware resources, which in turn facilitates the exploitation of a higher level of thread-level parallelism (TLP) than can be achieved with the state of the art. We show that, from a program dependence standpoint, using more threads than the proposed approach advocates does not yield a higher degree of TLP. We present case studies and results on kernels extracted from open-source codes to demonstrate the efficacy of our techniques on a real machine.
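
To make the thread-count question concrete, the following is a minimal C++ sketch, not the paper's T-OPT algorithm: it merely caps the number of spawned threads at the smaller of the hardware thread count and a dependence-limited degree of parallelism. The function name `choose_thread_count` and the parameter `dependence_limited_dop` are hypothetical placeholders for the bound that a dependence analysis such as T-OPT would supply.

```cpp
// Sketch: pick a thread count for a parallel region by capping it at both the
// hardware thread count and a dependence-limited degree of parallelism (DOP).
#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

unsigned choose_thread_count(unsigned dependence_limited_dop) {
    unsigned hw = std::thread::hardware_concurrency();  // may return 0 if unknown
    if (hw == 0) hw = 1;
    // Spawning more threads than either bound cannot expose additional TLP:
    // extra threads either oversubscribe the cores or stall on dependences.
    return std::max(1u, std::min(dependence_limited_dop, hw));
}

int main() {
    // Assume prior analysis bounded the region's parallelism at 8.
    const unsigned n = choose_thread_count(/*dependence_limited_dop=*/8);
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i)
        workers.emplace_back([i] { std::printf("worker %u\n", i); });
    for (auto& t : workers) t.join();
}
```

The point of the cap is the one made in the abstract: beyond the dependence-imposed limit, additional threads add scheduling and synchronization overhead without increasing exploitable TLP.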