Hand Modeling, Analysis, and Recognition for Vision-Based Human Computer Interaction

In the evolution of user interfaces, the keyboard was the primary device of text-based interfaces, and the invention of the mouse brought us the graphical user interface. What is the counterpart of the mouse when we try to explore three-dimensional (3-D) virtual environments (VEs)? In many current VE applications, keyboards, mice, wands, and joysticks are the common control and navigation devices. To some extent, however, such mechanical devices are inconvenient and unsuitable for natural, direct interaction, because they cannot easily provide 3-D, high-degree-of-freedom input. Although magnetic trackers serve as 3-D sensors in some of these devices, they are prone to magnetic interference and supply only global motion information. A more convenient and natural device is desirable to achieve more immersive interaction.

The use of hand gestures has become an important part of human-computer interaction (HCI) in recent years [1], [24]. To use human hands as a natural interface device, glove-based devices have been employed to capture hand motion directly, with attached sensors measuring the joint angles and spatial positions of the hands. Unfortunately, such devices are expensive and cumbersome. Since rich visual information provides a strong cue for inferring the internal state of an object, vision-based techniques offer a promising alternative for capturing human hand motion; at the same time, vision systems can be cost efficient and noninvasive. These facts motivate research in the modeling, analysis, animation, and recognition of hand gestures.

According to the application scenario, hand gestures can be classified into several categories: conversational, controlling, manipulative, and communicative gestures. Sign language is an important case of communicative gestures; because sign languages are highly structured, they are well suited as a test-bed for vision algorithms [33], [37]. Controlling gestures are the focus of current research on vision-based interfaces [6], [17], [23], [26], [35], [45]. Virtual objects can be located by analyzing pointing gestures [24] (a minimal geometric sketch of this idea follows below), and some display-control applications demonstrate the use of such controlling gestures.

References

[1] Andrew Blake et al., "Classification of human body motion," in Proc. 7th IEEE Int. Conf. on Computer Vision (ICCV), 1999.

[2] Vladimir Pavlovic et al., "A visual computing environment for very large scale biomolecular modeling," in Proc. IEEE Int. Conf. on Application-Specific Systems, Architectures and Processors, 1997.

[3] J. Ohya et al., "Applications of HMM modeling to recognizing human gestures in image sequences for a man-machine interface," in Proc. 4th IEEE Int. Workshop on Robot and Human Communication, 1995.

[4] Aaron F. Bobick et al., "Recognition and interpretation of parametric gesture," in Proc. 6th Int. Conf. on Computer Vision (ICCV), 1998.

[5] Thomas S. Huang et al., "Vision based hand modeling and tracking for virtual teleconferencing and telecollaboration," in Proc. IEEE Int. Conf. on Computer Vision (ICCV), 1995.

[6] Ying Wu et al., "An adaptive self-organizing color segmentation algorithm with application to robust real-time human hand localization," 2000.

[7] Ying Wu et al., "View-independent recognition of hand postures," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2000.

[8] James M. Rehg et al., "Statistical color models with application to skin detection," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1999.

[9] James W. Davis et al., "The representation and recognition of human movement using temporal templates," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1997.

[10] Alex Pentland et al., "Active gesture recognition using partially observable Markov decision processes," in Proc. 13th Int. Conf. on Pattern Recognition (ICPR), 1996.

[11] Yuntao Cui et al., "Hand sign recognition from intensity image sequences with complex backgrounds," in Proc. 2nd Int. Conf. on Automatic Face and Gesture Recognition, 1996.

[12] Mubarak Shah et al., "Visual gesture recognition," 1994.

[13] Ying Wu et al., "Capturing articulated human hand motion: a divide-and-conquer approach," in Proc. 7th IEEE Int. Conf. on Computer Vision (ICCV), 1999.

[14] Shan Lu et al., "Color-based hands tracking system for sign language recognition," in Proc. 3rd IEEE Int. Conf. on Automatic Face and Gesture Recognition, 1998.

[15] Michael J. Black et al., "Analysis of gesture and action in technical talks for video indexing," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1997.

[16] S. Sarkar et al., "Human skin and hand motion analysis from range image sequences using nonlinear FEM," in Proc. IEEE Nonrigid and Articulated Motion Workshop, 1997.

[17] David C. Hogg et al., "Towards 3D hand tracking using a deformable model," in Proc. 2nd Int. Conf. on Automatic Face and Gesture Recognition, 1996.

[18] Francis K. H. Quek, "Unencumbered gestural interaction," IEEE Multimedia, 1996.

[19] Christoph Bregler, "Learning and recognizing human dynamics in video sequences," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1997.

[20] Rajeev Sharma et al., "Reliable tracking of human arm dynamics by multiple cue integration and constraint fusion," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1998.

[21] Matthew Turk et al., "View-based interpretation of real-time optical flow for gesture recognition," in Proc. 3rd IEEE Int. Conf. on Automatic Face and Gesture Recognition, 1998.

[22] D. McNeill, Hand and Mind, 1995.

[23] Francis K. H. Quek et al., "Gesture, speech, and gaze cues for discourse segmentation," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2000.

[24] James L. Crowley et al., "Finger tracking as an input device for augmented reality," 1995.

[25] Dimitris N. Metaxas et al., "ASL recognition based on a coupling between HMMs and 3D motion analysis," in Proc. 6th Int. Conf. on Computer Vision (ICCV), 1998.

[26] Alex Pentland et al., "Coupled hidden Markov models for complex action recognition," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1997.

[27] Vladimir Pavlovic, "Dynamic Bayesian networks for information fusion with applications to human-computer interfaces," 1999.

[28] Shaogang Gong et al., "Colour model selection and adaptation in dynamic scenes," in Proc. European Conf. on Computer Vision (ECCV), 1998.

[29] J. Cassell, "A framework for gesture generation and interpretation," in Computer Vision for Human-Machine Interaction, 1998.

[30] Kang-Hyun Jo et al., "Manipulative hand gesture recognition using task knowledge for human computer interaction," in Proc. 3rd IEEE Int. Conf. on Automatic Face and Gesture Recognition, 1998.

[31] Yangsheng Xu et al., "Gesture interface: modeling and learning," in Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 1994.

[32] Takeo Kanade et al., "Model-based tracking of self-occluding articulated objects," in Proc. IEEE Int. Conf. on Computer Vision (ICCV), 1995.

[33] Vladimir Pavlovic et al., "Visual interpretation of hand gestures for human-computer interaction: a review," IEEE Trans. Pattern Analysis and Machine Intelligence, 1997.

[34] Alex Pentland et al., "Modeling and prediction of human behavior," Neural Computation, 1999.

[35] Dariu Gavrila, "The visual analysis of human movement: a survey," Computer Vision and Image Understanding, 1999.

[36] Ying Wu et al., "Color tracking by transductive learning," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2000.

[37] Alex Pentland et al., "A wearable computer based American Sign Language recognizer," 1997.

[38] Jochen Triesch et al., "Robust classification of hand postures against complex backgrounds," in Proc. 2nd Int. Conf. on Automatic Face and Gesture Recognition, 1996.

[39] Rómer Rosales et al., "Inferring body pose without tracking body parts," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2000.

[40] Jake K. Aggarwal et al., "Human motion analysis: a review," Computer Vision and Image Understanding, 1999.

[41] Tosiyasu L. Kunii et al., "Model-based analysis of hand posture," IEEE Computer Graphics and Applications, 1995.

[42] Ying Wu et al., "Modeling the constraints of human hand motion," in Proc. Workshop on Human Motion, 2000.

[43] John R. Kender et al., "Finding skin in color images," in Proc. 2nd Int. Conf. on Automatic Face and Gesture Recognition, 1996.

[44] Yoshiaki Shirai et al., "Hand gesture estimation and model refinement using monocular camera: ambiguity limitation by inequality constraints," in Proc. 3rd IEEE Int. Conf. on Automatic Face and Gesture Recognition, 1998.

[45] James W. Davis et al., "The representation and recognition of action using temporal templates," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1997.

[46] Alex Pentland et al., "Pfinder: real-time tracking of the human body," in Proc. 2nd Int. Conf. on Automatic Face and Gesture Recognition, 1996.