Context adaptive rate-distortion optimization in video coding
In this thesis, we present two context-adaptive methods for rate-distortion optimization in video coding. The first method dynamically adjusts the Lagrange multiplier that weights the rate-distortion cost for each macroblock (16x16 pixel block), based on the context of the neighboring or upper coding-layer blocks. Our method improves the accuracy of true motion-vector detection and finds the most efficient encoding modes for coding the luminance component; these motion vectors and modes are in turn used to derive those for coding the chrominance components. Simulation results for H.264/AVC video demonstrate that, compared with the H.264/AVC Joint Model (JM) reference software, the proposed method reduces bit rate significantly and achieves a peak signal-to-noise ratio (PSNR) gain for all sequences tested, with negligible extra computational cost. The average bit-rate reduction is 2.67% and the average PSNR gain is 0.151 dB. The improvement is particularly significant for high-motion, high-resolution video. This work also led to a contribution adopted by the Joint Video Team (JVT) and included in the JM software from version 12.0 onwards, known as the Context Adaptive Lagrange Multiplier (CALM).
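The sketch below illustrates the general idea in Python. The standard Lagrangian cost J = D + lambda * R is computed per macroblock, with the frame-level Lagrange multiplier scaled by a factor derived from the context of already-coded neighboring macroblocks. The helper names, the choice of neighborhood, and the scaling heuristic are illustrative assumptions only; they are not the actual CALM formula adopted by the JVT.

```python
# Minimal sketch of context-adaptive Lagrange multiplier selection.
# The scaling rule and bounds below are illustrative assumptions, not the CALM formula.

def context_adaptive_lambda(lambda_frame, neighbour_costs, current_cost_estimate):
    """Scale the frame-level Lagrange multiplier for one macroblock.

    lambda_frame          -- lambda derived from the quantization parameter (as in JM)
    neighbour_costs       -- RD costs of already-coded neighbouring macroblocks
    current_cost_estimate -- rough cost estimate for the current macroblock
    """
    if not neighbour_costs:
        return lambda_frame                      # no context available: keep JM behaviour
    context = sum(neighbour_costs) / len(neighbour_costs)
    # If the current macroblock looks more complex than its context, relax lambda
    # (favour distortion); if simpler, tighten it (favour rate).  Bounds are illustrative.
    ratio = current_cost_estimate / max(context, 1e-9)
    scale = min(max(ratio, 0.5), 2.0)
    return lambda_frame * scale


def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits
```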
Following the CALM method, we also propose an entropy coding method for macroblock modes based on statistical information from the previous frame. H.264/AVC introduces new macroblock sub-types (modes) that allow smaller partitions of a macroblock for better rate-distortion performance. These modes are coded with a fixed mapping table when context-adaptive variable length coding (CAVLC) is used. Based on our experiments on the sequences recommended by the Video Coding Experts Group (VCEG) common test conditions [54], the bits used to code these modes account for about 15% of the total bits for coding P-frames (predictive frames). Our proposed method maps the more frequently used modes to shorter codewords instead of using a fixed mapping table, and hence reduces the bits used for coding the inter modes by about 10-20%. Experimental results compared with those of the Key Technology Area (KTA) reference software on the common sequences demonstrate an average bit-rate saving of more than 1% when the group-of-pictures (GOP) structure is IPPP... ("I" refers to intra-frame and "P" to predictive frame). The average saving is more than 2.5% when the GOP structure is IBBP... ("B" refers to bi-directional predictive frame).
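A minimal sketch of the remapping idea follows: count how often each inter mode occurred in the previous frame and assign the most frequent modes to the shortest codewords, rather than using the static mode-to-codeword table. The mode names, the frame statistics, and the use of Exp-Golomb code lengths for mode signalling are assumptions for illustration; the exact codeword assignment in the thesis may differ.

```python
from collections import Counter

# Lengths (in bits) of the first few Exp-Golomb codewords: index 0 costs 1 bit,
# indices 1-2 cost 3 bits, indices 3-6 cost 5 bits, and so on.
CODEWORD_LENGTHS = [1, 3, 3, 5, 5, 5, 5, 7]


def build_mode_table(prev_frame_modes):
    """Map each mode to a codeword index, most frequent mode first (shortest codeword)."""
    freq = Counter(prev_frame_modes)
    ranked = [mode for mode, _ in freq.most_common()]
    return {mode: idx for idx, mode in enumerate(ranked)}


def mode_bits(modes, table):
    """Total bits needed to signal `modes` with the adaptive table."""
    return sum(CODEWORD_LENGTHS[table[m]] for m in modes)


# Example: P16x16 dominated the previous frame, so it receives the 1-bit codeword.
prev = ["P16x16"] * 60 + ["P8x8"] * 25 + ["P16x8"] * 10 + ["P8x16"] * 5
table = build_mode_table(prev)
print(table["P16x16"])   # 0 -> shortest codeword
```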