Reinforcement learning for video encoder control in HEVC

In today's video compression systems, the encoder typically follows an optimization procedure to find a compressed representation of the video signal. While the primary optimization criteria are bit rate and image distortion, low complexity of this procedure may also be important in some applications, making complexity a third objective. We approach this problem by treating the encoding procedure as a decision process in time, which makes it amenable to reinforcement learning. Our learning algorithm computes a strategy in a compact functional representation, which is then employed in the video encoder to control its search. By including the measured execution time in the reinforcement signal with a Lagrangian weight, we realize a trade-off between RD performance and computational complexity that is controlled by a single parameter. Using the reference software test model (HM) of the HEVC video coding standard, we show that more than half of the encoding time can be saved at the same RD performance.
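
As a rough illustration of the complexity-aware reinforcement signal described above, the following Python sketch combines the rate-distortion cost of one encoder decision with its measured execution time through a single Lagrangian weight. The names (complexity_aware_reward, lambda_rd, lambda_complexity) and the placeholder values are illustrative assumptions, not the paper's actual implementation.

    import time

    def complexity_aware_reward(distortion, rate, lambda_rd, exec_time, lambda_complexity):
        """Hypothetical reward for one encoder decision.

        distortion        : image distortion of the chosen coding option (e.g. SSD)
        rate              : bits spent on the chosen coding option
        lambda_rd         : usual rate-distortion Lagrange multiplier
        exec_time         : measured execution time of the decision, in seconds
        lambda_complexity : single parameter trading RD performance against complexity
        """
        rd_cost = distortion + lambda_rd * rate
        # Negative cost: the learner maximizes reward, i.e. minimizes the weighted cost.
        return -(rd_cost + lambda_complexity * exec_time)

    # Usage sketch: time one candidate coding decision and score it.
    start = time.perf_counter()
    distortion, rate = 1234.5, 96  # placeholder values for one coding option
    elapsed = time.perf_counter() - start
    reward = complexity_aware_reward(distortion, rate, lambda_rd=0.85,
                                     exec_time=elapsed, lambda_complexity=50.0)

Setting lambda_complexity to zero would recover a purely RD-driven signal, while larger values push the learned strategy toward faster, less exhaustive searches.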
