Use of facial landmarks for adaptive compression of videos on mobile devices

Challenges such as high resource requirements, limited on-device storage, and constrained mobile bandwidth inhibit unconstrained and ubiquitous video consumption. We propose a first-of-its-kind methodology to compress videos featuring human faces. We detect facial landmarks on the fly and compress the video by storing a sequence of distinct frames extracted from it, chosen so that the facial landmarks of each pair of successively stored frames differ significantly. We use a dynamic thresholding technique to decide whether a difference is significant, and we store meta-information for reconstructing the frames that were dropped. To reduce glitches in the decompressed video, we use a morphing technique that smooths the transition between successive stored frames. We evaluate the technique objectively by measuring the time taken to compress, the entropy per frame, the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), and the compression ratio. For subjective analysis, we conduct a user study that records user satisfaction at different compression ratios.
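The frame-selection step described above can be illustrated with a minimal sketch. The abstract does not specify the landmark distance or the dynamic thresholding rule, so everything here is an assumption made for illustration: the names `landmark_distance` and `select_keyframes`, the mean point-to-point Euclidean distance, and a threshold of "running mean plus `k` standard deviations of the distances seen so far, but never below `floor`". Landmarks are assumed to be already detected and given as lists of `(x, y)` points, one list per frame.

```python
import math
from statistics import mean, pstdev


def landmark_distance(a, b):
    """Mean Euclidean distance between corresponding landmark points.

    Assumed metric; the abstract does not say how landmark sets are compared.
    """
    return mean(math.dist(p, q) for p, q in zip(a, b))


def select_keyframes(landmarks_per_frame, k=1.0, floor=2.0):
    """Return the indices of frames to store.

    A frame is stored when its landmarks differ from those of the last
    stored frame by more than a threshold that adapts to the video.
    The threshold rule (running mean + k * std, clamped to `floor`)
    is a hypothetical stand-in for the paper's dynamic thresholding.
    """
    if not landmarks_per_frame:
        return []

    kept = [0]      # the first frame is always stored
    history = []    # distances observed so far, used to adapt the threshold

    for i in range(1, len(landmarks_per_frame)):
        d = landmark_distance(landmarks_per_frame[kept[-1]],
                              landmarks_per_frame[i])
        if history:
            threshold = max(floor, mean(history) + k * pstdev(history))
        else:
            threshold = floor
        if d > threshold:
            kept.append(i)
        history.append(d)

    return kept


if __name__ == "__main__":
    # Two landmarks per frame: five static frames, then the face shifts by 5 px.
    still = [(0.0, 0.0), (10.0, 0.0)]
    moved = [(5.0, 0.0), (15.0, 0.0)]
    frames = [still] * 5 + [moved] * 2
    print(select_keyframes(frames))
```

In this sketch the indices of the dropped frames are implicit in the gaps between stored indices; a real implementation would also record whatever meta-information the reconstruction and morphing stages need.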
