Eyes in the interface

Abstract Computer vision has a significant role to play in the human-computer interaction (HCI) devices of the future. All computer input devices serve one essential purpose: they transduce some motion or energy from a human agent into machine-usable signals. One may therefore think of input devices as the ‘perceptual organs’ by which computers sense the intents of their human users. We outline the role computer vision will play, highlight the impediments to the development of vision-based interfaces, and propose an approach for overcoming these impediments. Prospective vision research areas for HCI include human face recognition, facial expression interpretation, lip reading, head orientation detection, eye gaze tracking, three-dimensional finger pointing, hand tracking, hand gesture interpretation, and body pose tracking. For vision-based interfaces to make any impact, we will have to adopt an expansive approach that begins with the study of the interaction modality we seek to implement. We illustrate this approach by discussing our work on vision-based hand gesture interfaces, which draws on such varied disciplines as semiotics, anthropology, neurophysiology, neuropsychology, and psycholinguistics. Concentrating on communicative (as opposed to manipulative) gestures, we argue that interpreting a large number of gestures involves analyzing image dynamics to identify and characterize the gestural stroke, locating the stroke extrema in ordinal 3D space, and recognizing the hand pose at the stroke extrema. We detail our dynamic image analysis algorithm, which enforces four constraints: directional variance, spatial cohesion, directional cohesion, and path cohesion. The clustered vectors characterize the motion of a gesturing hand.
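As a rough illustration of the clustering step described above, the sketch below groups per-frame motion vectors under the four named constraints. The thresholds, function names, and the greedy single-pass grouping strategy are our own illustrative assumptions, not the algorithm detailed in the paper.

```python
# A minimal sketch, assuming per-frame optical-flow vectors given as
# (x, y, dx, dy) tuples. All thresholds and the greedy strategy are
# illustrative assumptions, not the paper's actual implementation.
import numpy as np

SPATIAL_RADIUS = 15.0  # px: spatial cohesion -- members lie near the cluster centroid
ANGLE_TOL = 0.35       # rad: directional cohesion -- members share a common heading
DIR_VAR_MAX = 0.5      # rad^2: 'directional variance' (one reading) -- reject incoherent groups
PATH_TOL = 20.0        # px: path cohesion -- the centroid advances smoothly across frames

def angular_diff(a, b):
    """Smallest signed difference between two angles, in radians."""
    return np.angle(np.exp(1j * (a - b)))

def mean_direction(dirs):
    """Circular mean of a list of headings."""
    return np.arctan2(np.mean(np.sin(dirs)), np.mean(np.cos(dirs)))

def cluster_frame(vectors):
    """Greedily group one frame's flow vectors by spatial and directional cohesion."""
    clusters = []
    for x, y, dx, dy in vectors:
        p, heading = np.array([x, y]), np.arctan2(dy, dx)
        for c in clusters:
            if (np.linalg.norm(p - np.mean(c["pts"], axis=0)) < SPATIAL_RADIUS
                    and abs(angular_diff(heading, mean_direction(c["dirs"]))) < ANGLE_TOL):
                c["pts"].append(p)
                c["dirs"].append(heading)
                break
        else:  # no compatible cluster found: start a new one
            clusters.append({"pts": [p], "dirs": [heading]})
    # 'directional variance' constraint (our interpretation): drop groups whose
    # headings are too scattered to represent coherent hand motion
    return [c for c in clusters if np.var(c["dirs"]) < DIR_VAR_MAX]

def extend_path(prev_centroid, clusters):
    """Path cohesion: pick the cluster whose centroid best continues the stroke."""
    best, best_d = None, PATH_TOL
    for c in clusters:
        d = np.linalg.norm(np.mean(c["pts"], axis=0) - prev_centroid)
        if d < best_d:
            best, best_d = c, d
    return best  # None if no cluster lies within PATH_TOL of the previous centroid
```

Chaining extend_path over successive frames would yield a centroid trajectory from which the gestural stroke and its extrema could then be characterized.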
