Design and implementation of scalable, transparent threads for multi-core media processor

In this paper, we propose a scalable and transparent parallelization scheme using threads for multi-core processor. The performance achieved by our scheme is scalable to the number of cores, and the application program is not affected by the actual number of cores. For the performance efficiency, we designed the threads so that they do not suspend and that they do not start their execution until the data necessary for them are available. We implemented our design using three modules: the dependency controller, which controls dependencies among threads, the thread pool, which manages the ready threads, and the thread dispatcher, which fetches threads from the pool and executes them on the core. Our design and implementation provide efficient thread scheduling with low overhead. Moreover, by hiding the actual number of cores, it realizes transparency. We confirmed the transparency and scalability of our scheme by applying it to the H.264 decoder program. With this scheme, modification of application program is not necessary even if the number of cores changes due to disparate requirements. This feature makes the developing time shorter and contributes to the reduction of the developing cost.