GWindows: robust stereo vision for gesture-based control of windows

Perceptual user interfaces promise fluid modes of human-computer interaction that complement the mouse and keyboard, and are especially well motivated in non-desktop scenarios such as kiosks or smart rooms. Such interfaces, however, have been slow to see adoption for a variety of reasons, including the computational burden they impose, a lack of robustness outside the laboratory, unreasonable calibration demands, and a shortage of sufficiently compelling applications. We address these difficulties with a fast stereo vision algorithm for recognizing hand positions and gestures. Our system uses two inexpensive video cameras to extract depth information. This depth information makes automatic object detection and tracking more robust, and can also be exposed to applications. We demonstrate the algorithm in combination with speech recognition to perform several basic window-management tasks, report on a user study probing the ease of using the system, and discuss the implications of such a system for future user interfaces.
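The depth extraction the abstract describes rests on the standard rectified-stereo relation between disparity and distance. As a minimal sketch (not the authors' implementation), the depth Z of a tracked hand follows from its pixel disparity d between the two cameras via Z = f·B/d, where f is the focal length in pixels and B is the camera baseline; the focal length and baseline values below are illustrative assumptions, not figures from the paper.

```python
def disparity_to_depth(disparity_px: float,
                       focal_px: float = 600.0,
                       baseline_m: float = 0.12) -> float:
    """Depth in meters from pixel disparity for a rectified stereo pair.

    Uses the pinhole-camera relation Z = f * B / d. The default focal
    length (600 px) and baseline (0.12 m) are hypothetical values chosen
    only to make the sketch runnable.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# With these parameters, a hand 1 m from the cameras produces a
# disparity of f*B/Z = 600 * 0.12 / 1.0 = 72 px; inverting the
# relation recovers the depth (approximately 1.0 m).
print(disparity_to_depth(72.0))
```

Because depth falls off as 1/d, nearby objects (large disparity) are resolved much more precisely than distant ones, which is one reason close-range gesture tracking with a short-baseline rig is practical with inexpensive cameras.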
