A Cache-Aware Strategy for H.264 Decoding on Multi-processor Architectures

H.264/AVC is one of the most popular formats for the recording, compression and distribution of video. Encoders and decoders for the H.264 standard are in wide demand, and efficient strategies for enhancing their performance are an area of active research. With the proliferation of many-core architectures in the embedded community, there has been a trend towards parallelizing implementations of encoders and decoders. In this paper, we present a run-time heuristic that exploits macroblock-level parallelism and efficient scheduling inside an H.264 decoder to reduce the number of cache misses and improve processor utilization. Experiments on standard benchmarks show a significant speed-up over contemporary strategies proposed in the literature.
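
To make the notion of macroblock-level parallelism concrete, the following is a minimal sketch of the commonly used dependency model, in which a macroblock can be decoded once its left, top-left, top and top-right neighbours are done. It is not the heuristic proposed in this paper and it ignores cache behaviour entirely; the function name `wavefront_schedule` and the frame size in macroblocks are illustrative assumptions.

```python
from collections import defaultdict


def wavefront_schedule(width_mbs, height_mbs):
    """Group macroblocks into waves that could be decoded in parallel.

    Assumption (illustrative model, not the paper's method): macroblock
    (x, y) depends on its left, top-left, top and top-right neighbours
    within the same frame.
    """
    step = {}  # (x, y) -> earliest step at which the macroblock can start
    for y in range(height_mbs):
        for x in range(width_mbs):
            neighbours = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
            # Keep only neighbours that lie inside the frame.
            ready_after = [
                step[(nx, ny)]
                for nx, ny in neighbours
                if 0 <= nx < width_mbs and 0 <= ny < height_mbs
            ]
            # A macroblock starts one step after its latest dependency.
            step[(x, y)] = 1 + max(ready_after) if ready_after else 0

    waves = defaultdict(list)
    for mb, s in step.items():
        waves[s].append(mb)
    # Return the waves in order of their step index.
    return [waves[s] for s in sorted(waves)]


if __name__ == "__main__":
    for i, wave in enumerate(wavefront_schedule(4, 3)):
        print(i, wave)
```

Each inner list holds macroblocks with no mutual dependencies under this model, so a scheduler is free to choose which core decodes which of them; that choice is where a cache-aware policy would come into play.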