Run-time parallelization for partially parallel loops

This paper proposes a run-time technique, based on the inspector-executor scheme, for finding the available parallelism in loops. The inspector determines the wavefronts by building a DEF-USE table, and the inspection process itself can be fully parallelized without any synchronization. The executor performs the loop iterations concurrently. For each wavefront in a loop, an auto-adapted function selects a tailored number of threads instead of using a fixed thread count. Experimental results show that the new parallel inspector handles complex data dependence patterns and significantly reduces its own execution time. The new partitioning strategy for the executor also significantly improves the performance of run-time parallelization.
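
To make the inspector-executor idea concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes a loop of the form `a[writes[i]] = a[reads[i]] + 1`, uses a simple sequential inspector (the parallel, synchronization-free inspector described above is not reproduced here), and stands in a hypothetical `pick_thread_count` for the auto-adapted function. All function names are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_wavefronts(reads, writes):
    """Sequential inspector sketch: assign each iteration a wavefront number.

    last_def and last_use play the role of a simple DEF-USE table
    (an assumption; the paper's table layout is not specified here).
    """
    last_def = {}  # element -> wavefront of the latest iteration that wrote it
    last_use = {}  # element -> highest wavefront among iterations that read it
    wavefronts = []
    for r, w in zip(reads, writes):
        level = 0
        level = max(level, last_def.get(r, -1) + 1)  # flow: read after write
        level = max(level, last_def.get(w, -1) + 1)  # output: write after write
        level = max(level, last_use.get(w, -1) + 1)  # anti: write after read
        wavefronts.append(level)
        last_def[w] = level
        last_use[r] = max(last_use.get(r, -1), level)
    return wavefronts


def group_by_wavefront(wavefronts):
    """Collect iteration indices that share the same wavefront number."""
    groups = [[] for _ in range(max(wavefronts, default=-1) + 1)]
    for i, level in enumerate(wavefronts):
        groups[level].append(i)
    return groups


def pick_thread_count(wavefront_size, max_threads):
    """Hypothetical stand-in for the auto-adapted function:
    never use more threads than there are iterations in the wavefront."""
    return max(1, min(wavefront_size, max_threads))


def run_wavefronts(a, reads, writes, max_threads=4):
    """Executor sketch: run wavefronts in order, iterations within one concurrently."""
    def body(i):
        a[writes[i]] = a[reads[i]] + 1  # assumed loop body

    for group in group_by_wavefront(compute_wavefronts(reads, writes)):
        workers = pick_thread_count(len(group), max_threads)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(body, group))  # leaving the block acts as the barrier
    return a
```

For example, with `reads = [0, 1, 2, 3]` and `writes = [1, 2, 5, 6]`, iterations 0 and 3 are independent and share the first wavefront, while iterations 1 and 2 form a dependence chain and each get a wavefront of their own. Sizing the thread pool per wavefront avoids starting idle threads for small wavefronts, which is the intuition behind choosing the thread count adaptively.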