Prop-free pointing detection in dynamic cluttered environments

Vision-based prop-free pointing detection is challenging from both an algorithmic and a systems standpoint. From a computer vision perspective, accurately determining where multiple users are pointing is difficult in cluttered environments with dynamic scene content: standard approaches that rely on appearance models or background subtraction to segment users perform poorly in this domain. We propose a method that instead focuses on motion analysis to detect pointing gestures and robustly estimate the pointing direction. Our algorithm is self-initializing: as the user points, we analyze the motion observed from two cameras and infer the rotation centers that best explain it. From these, we group pixel-level optical flow into dominant pointing vectors, each originating at a rotation center, and merge them across views to obtain 3D pointing vectors. However, the algorithm is computationally expensive and poses systems challenges even on current computing infrastructure; we achieve interactive speeds by exploiting coarse-grained parallelization over a cluster of computers. In unconstrained environments, we obtain an average angular precision of 2.7°.
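
To make the rotation-center inference step concrete, the sketch below shows one way such a center could be recovered from optical-flow samples by linear least squares. This is an illustrative assumption, not the paper's implementation: the function name, the NumPy-based formulation, and the synthetic test are ours. It exploits the fact that flow induced by a pure rotation about a center c is tangential, so each sample satisfies (p_i - c) . v_i ≈ 0, which stacks into a small linear system in c.

```python
import numpy as np

def estimate_rotation_center(points, flows):
    """Least-squares 2D rotation center from optical-flow samples (illustrative sketch).

    Flow induced by rotation about c is tangential: (p_i - c) . v_i ~= 0.
    Stacking one constraint per sample gives V c = d, with rows v_i^T
    and d_i = v_i . p_i, solved in the least-squares sense.
    """
    V = np.asarray(flows, dtype=float)            # (N, 2): one row per tangency constraint
    P = np.asarray(points, dtype=float)           # (N, 2): pixel positions of the samples
    d = np.einsum('ij,ij->i', V, P)               # right-hand side: d_i = v_i . p_i
    c, *_ = np.linalg.lstsq(V, d, rcond=None)     # minimize ||V c - d||^2
    return c

# Synthetic check: flow generated by rotation about (120, 80), plus noise.
rng = np.random.default_rng(0)
center = np.array([120.0, 80.0])
pts = center + rng.uniform(-50.0, 50.0, size=(200, 2))
r = pts - center
flow = 0.05 * np.stack([-r[:, 1], r[:, 0]], axis=1)  # tangential field: 90-degree rotated radii
flow += rng.normal(scale=0.02, size=flow.shape)      # simulated measurement noise
print(estimate_rotation_center(pts, flow))           # approximately [120.  80.]
```

In the full pipeline, each per-view center would then seed the grouping of flow into a dominant pointing vector before the two views are merged into a 3D direction.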
