Spatio-temporal model-assisted compatible coding for low and very low bitrate videotelephony

We introduce spatio-temporal model-assisted compatible (STMAC) coding, a technique that selectively encodes areas of a moving image according to their spatial and temporal importance to the human eye. It exploits the fact that "eye contact" and "lip synchronization" are very important in person-to-person communication. Because different areas, including the eyes and lips, have different perceptual significance to human observers, they call for different kinds of quality: the eyes need "high resolution" for clear communication, while the lips need "frequent refresh". The approach provides a better rate-distortion tradeoff than conventional image coding technologies based on MPEG-1, MPEG-2, H.261, and H.263, because STMAC coding is applied on top of an existing encoder and takes full advantage of its core design. Only the encoder's rate control unit is slightly modified; the decoder does not need to be changed in any way, which is why the concept is called "compatible". Experimental results are given for ITU-T H.263 at very low bit rates (13-17 kbit/s).
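The idea of giving the eyes spatial quality and the lips temporal quality can be illustrated with a minimal sketch of a region-aware rate control plan. This is not the rate control described in the paper, whose details are not given here; it only assumes that a face model has already produced rectangular regions in macroblock units. The names `Region`, `plan_frame`, `qp_offset`, `refresh_period`, and all numeric values are hypothetical. The only external fact relied on is that the H.263 quantizer parameter ranges from 1 to 31 and that a QCIF frame is 11 x 9 macroblocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical model-derived region, in macroblock coordinates."""
    name: str
    x0: int  # first macroblock column (inclusive)
    y0: int  # first macroblock row (inclusive)
    x1: int  # last macroblock column (exclusive)
    y1: int  # last macroblock row (exclusive)
    qp_offset: int       # negative = finer quantization (spatial quality)
    refresh_period: int  # 1 = coded every frame (temporal quality)


# Illustrative values only: eyes get a finer quantizer, lips get every-frame refresh.
REGIONS = [
    Region("eyes", 3, 2, 8, 4, qp_offset=-6, refresh_period=2),
    Region("lips", 4, 5, 7, 7, qp_offset=0, refresh_period=1),
]


def plan_frame(frame_index, mb_cols, mb_rows, base_qp, regions, background_period=4):
    """Return a mb_rows x mb_cols grid holding the quantizer for each macroblock
    coded in this frame, or None where the macroblock is left uncoded."""
    plan = [[None] * mb_cols for _ in range(mb_rows)]
    for row in range(mb_rows):
        for col in range(mb_cols):
            qp_offset, period = 0, background_period
            for region in regions:
                if region.x0 <= col < region.x1 and region.y0 <= row < region.y1:
                    qp_offset, period = region.qp_offset, region.refresh_period
                    break  # first matching region wins
            if frame_index % period == 0:
                # Clamp to the 1..31 quantizer range used by H.263.
                plan[row][col] = max(1, min(31, base_qp + qp_offset))
    return plan


if __name__ == "__main__":
    for frame in range(4):
        grid = plan_frame(frame, mb_cols=11, mb_rows=9, base_qp=20, regions=REGIONS)
        coded = sum(qp is not None for line in grid for qp in line)
        print(f"frame {frame}: {coded} macroblocks coded")
```

Because the plan only chooses quantizers and decides which macroblocks to leave uncoded, both of which a standard bitstream can already express, a sketch like this stays on the encoder side and leaves the decoder untouched, which matches the "compatible" property described above.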