Differential video coding of face and gesture events in presentation videos

Bandwidth limitations remain a major obstacle to delivering high-quality multimedia content over the Internet. In this research, we aim to improve the compression of presentation videos (e.g., lectures). The approach is based on the idea that viewers tend to pay more attention to the speaker's face and gesturing hands, so these regions are encoded at higher resolution than the rest of the image. Our method first detects and tracks the face and hand regions using color-based segmentation and Kalman filtering. Next, different classes of natural hand gesture are recognized from the hand trajectories by identifying gesture holds, position/velocity changes, and repetitive movements. The detected face/hand regions and gesture events in the video are then encoded at higher resolution than the lower-resolution background. We present results of the tracking and gesture recognition approach, and compare videos compressed with the proposed method against uniformly compressed videos.
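
The core idea of region-dependent resolution can be illustrated with a minimal sketch. It is not the authors' codec: block averaging stands in for lower-resolution encoding of the background, the face/hand boxes are assumed to be supplied by a tracker, and the names `block_average`, `differential_frame`, `Frame`, and `Box` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]        # grayscale image, row-major
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def block_average(frame: Frame, block: int) -> Frame:
    """Stand-in for low-resolution coding: replace each block x block tile by its mean."""
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for top in range(0, h, block):
        for left in range(0, w, block):
            rows = range(top, min(top + block, h))
            cols = range(left, min(left + block, w))
            mean = sum(frame[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def differential_frame(frame: Frame, rois: List[Box], block: int = 4) -> Frame:
    """Keep pixels inside the regions of interest; use block-averaged values elsewhere."""
    out = block_average(frame, block)
    h, w = len(frame), len(frame[0])
    for top, left, bottom, right in rois:
        # Clip each box to the frame so out-of-range trackers cannot cause index errors.
        for r in range(max(top, 0), min(bottom, h)):
            for c in range(max(left, 0), min(right, w)):
                out[r][c] = frame[r][c]
    return out


# Hypothetical usage: an 8x8 gradient frame with one assumed face box in the top-left corner.
demo_frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
demo_out = differential_frame(demo_frame, rois=[(0, 0, 4, 4)], block=4)
```

In this sketch the pixels inside the box are unchanged, while every background tile collapses to a single value, which is what lets a real encoder spend fewer bits there.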
