We present an efficient implementation of motion estimation (ME) for H.264/AVC using programmable graphics hardware. The cost function for ME in H.264/AVC depends on the motion vector (MV) predictor, which is the median MV of three neighboring coded blocks. Previous implementations assume no dependency among adjacent blocks, which does not hold for H.264/AVC. They also perform unsatisfactorily because of their low arithmetic intensity, defined as the number of operations per word transferred. To overcome the dependency problem, we introduce a new implementation that performs ME on a block-by-block basis. Moreover, the arithmetic intensity can be adjusted easily to optimize performance on different graphics cards. Experimental results show that our implementation is substantially faster (by a factor of 10) than our SIMD-optimized CPU implementation.
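To make the dependency on the MV predictor concrete, the following is a minimal sketch of a median predictor and a rate-constrained ME cost. It is an illustration under stated assumptions, not the implementation described here: the cost form SAD + lambda * R(mv - predictor) is the one commonly used in H.264/AVC encoders, the rate is approximated by signed Exp-Golomb code lengths, and the function names, the `lam` parameter, and the choice of left, top, and top-right neighbors are hypothetical. The availability and partition-shape special cases of the standard are omitted.

```python
from typing import Tuple

MV = Tuple[int, int]  # (dx, dy); units (e.g. quarter-pel) are an assumption


def median_mv_predictor(left: MV, top: MV, top_right: MV) -> MV:
    """Component-wise median of three neighboring MVs (simplified)."""
    xs = sorted((left[0], top[0], top_right[0]))
    ys = sorted((left[1], top[1], top_right[1]))
    return (xs[1], ys[1])


def se_golomb_bits(v: int) -> int:
    """Bit length of the signed Exp-Golomb code for the integer v."""
    # Map signed value to an unsigned code number: 0, 1, -1, 2, -2, ...
    code_num = 2 * v - 1 if v > 0 else -2 * v
    # Length is 2 * floor(log2(code_num + 1)) + 1.
    return 2 * (code_num + 1).bit_length() - 1


def me_cost(sad: int, mv: MV, pred: MV, lam: int) -> int:
    """Assumed cost: distortion plus lambda times the MV-difference rate."""
    rate = se_golomb_bits(mv[0] - pred[0]) + se_golomb_bits(mv[1] - pred[1])
    return sad + lam * rate


if __name__ == "__main__":
    pred = median_mv_predictor((1, 5), (3, -2), (2, 0))
    print(pred, me_cost(100, (3, 0), pred, lam=4))
```

Because `me_cost` needs `pred`, and `pred` needs the final MVs of neighboring blocks, the cost for one block cannot be evaluated until its neighbors have been processed. This is the dependency that rules out treating all blocks as independent.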