Incorporating On-demand Stereo for Real Time Recognition

A new method for localising and recognising hand poses and objects in real-time is presented. This problem is important in vision-driven applications where it is natural for a user to combine hand gestures and real objects when interacting with a machine. Examples include using a real eraser to remove words from a document displayed on an electronic surface. In this paper the task of simultaneously recognising object classes, hand gestures and detecting touch events is cast as a single classification problem. A random forest algorithm is employed which adaptively selects and combines a minimal set of appearance, shape and stereo features to achieve maximum class discrimination for a given image. This minimal set leads to both efficiency at run time and good generalisation. Unlike previous stereo works which explicitly construct disparity maps, here the stereo matching costs are used directly as visual cue and only computed on-demand, i.e. only for pixels where they are necessary for recognition. This leads to improved efficiency. The proposed method is assessed on a database of a variety of objects and hand poses selected for interacting on a flat surface in an office environment.

[1]  Nanning Zheng,et al.  Stereo Matching Using Belief Propagation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Peter Auer,et al.  Generic object recognition with boosting , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Andrew W. Fitzgibbon,et al.  Real-time gesture recognition using deterministic boosting , 2002, BMVC.

[4]  Andrew Blake,et al.  Bi-layer segmentation of binocular stereo video , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[6]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[7]  Richard Szeliski,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.

[8]  Heinrich Niemann,et al.  Dense disparity maps in real-time with an application to augmented reality , 2002, Sixth IEEE Workshop on Applications of Computer Vision, 2002. (WACV 2002). Proceedings..

[9]  Andrew Blake,et al.  Efficient Dense Stereo with Occlusions for New View-Synthesis by Four-State Dynamic Programming , 2006, International Journal of Computer Vision.

[10]  Irfan A. Essa,et al.  Tree-based Classifiers for Bilayer Video Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Vladimir Kolmogorov,et al.  Graph Cut Algorithms for Binocular Stereo with Occlusions , 2006, Handbook of Mathematical Models in Computer Vision.

[12]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[13]  Vincent Lepetit,et al.  Randomized trees for real-time keypoint recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14]  Andrew D. Wilson Multimodal Sensing for Explicit and Implicit Interaction , 2005 .

[15]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[16]  Bill Buxton,et al.  Multi-Touch Systems that I Have Known and Loved , 2009 .

[17]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[18]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[19]  Andrew Blake,et al.  Sparse and Semi-supervised Visual Mapping with the S^3GP , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[20]  B. Frey,et al.  Transformation-Invariant Clustering Using the EM Algorithm , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  J. Ross Quinlan,et al.  Bagging, Boosting, and C4.5 , 1996, AAAI/IAAI, Vol. 1.

[22]  Carlo Tomasi,et al.  Full-size projection keyboard for handheld devices , 2003, CACM.

[23]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[24]  Ruigang Yang,et al.  Multi-resolution real-time stereo on commodity graphics hardware , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[25]  Andrew D. Wilson PlayAnywhere: a compact interactive tabletop projection-vision system , 2005, UIST.

[26]  Irfan Essa,et al.  Tree - based Clas - sifiers for Bilayer Video Segmentation In CVPR , 2007, CVPR 2007.

[27]  Nikos Paragios,et al.  Handbook of Mathematical Models in Computer Vision , 2005 .