Spatio-Temporal Hough Forest for efficient detection-localisation-recognition of fingerwriting in egocentric camera

Abstract Recognising fingerwriting in mid-air is a useful input method for wearable egocentric cameras. In this paper we propose a novel framework for this purpose. Specifically, our method first detects a writing hand posture and locates the position of the index fingertip in each frame. From the trajectory of the fingertip, the written character is localised and recognised simultaneously. To achieve this challenging task, we first present a contour-based, view-independent hand posture descriptor extracted with a novel signature function. The proposed descriptor serves both posture recognition and fingertip detection. To recognise characters from trajectories, we propose the Spatio-Temporal Hough Forest, which takes sequential data as input and performs regression in both the spatial and temporal domains. Our method can therefore perform character recognition and localisation simultaneously. To establish our contributions, we introduce a new handwriting-in-mid-air dataset with labels for postures, fingertips and character locations. We design and conduct experiments on posture estimation, fingertip detection, character recognition and localisation. In all experiments our method demonstrates superior accuracy and robustness compared to prior art.
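The idea of voting jointly in space and time can be illustrated with a minimal sketch. This is not the method of the paper: the function name `hough_vote`, the binned accumulator, and the parameters `bin_size` and `frame_bin` are assumptions made for illustration, and the per-sample offsets that a trained forest would predict at its leaves are supplied directly here.

```python
from collections import Counter


def hough_vote(trajectory, offsets, bin_size=10, frame_bin=5):
    """Accumulate votes for a character centre in (x, y, t) space.

    trajectory: list of (x, y, t) fingertip samples.
    offsets:    list of (dx, dy, dt, label), one per sample. In a Hough
                forest these would come from the leaf a sample reaches;
                here they are given directly (an assumption of this sketch).
    Returns the winning label, the origin of the winning bin, and its votes.
    """
    acc = Counter()
    for (x, y, t), (dx, dy, dt, label) in zip(trajectory, offsets):
        # Each sample votes for a class and a spatio-temporal centre.
        key = (
            label,
            (x + dx) // bin_size,
            (y + dy) // bin_size,
            (t + dt) // frame_bin,
        )
        acc[key] += 1
    # The densest bin gives the class (recognition) and the
    # centre in space and time (localisation) at once.
    (label, bx, by, bt), votes = acc.most_common(1)[0]
    return label, (bx * bin_size, by * bin_size, bt * frame_bin), votes


# Three samples whose offsets all point at the same centre (20, 10, t=1).
trajectory = [(10, 10, 0), (20, 10, 1), (30, 10, 2)]
offsets = [(10, 0, 1, "A"), (0, 0, 0, "A"), (-10, 0, -1, "A")]
result = hough_vote(trajectory, offsets)
```

Because the class label and the centre are read from the same accumulator peak, recognition and localisation fall out of a single voting step, which is the property the abstract highlights.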
