ARSketch: Sketch-Based User Interface for Augmented Reality Glasses

Hand gesture interaction is a key component of Augmented Reality (AR) and Mixed Reality (MR). Users typically interact with AR/MR devices, such as Microsoft HoloLens, through hand gestures that express their intentions; the devices recognize these gestures and respond accordingly. However, such techniques have so far been limited to a small set of less expressive hand gestures, which are inadequate for inputting complex information. To tackle this problem, we introduce ARSketch, a sketch-based, neural-network-driven user interface for AR/MR glasses that enables users to draw sketches freely in the air to interact with the devices. ARSketch combines: (1) hand pose estimation, which estimates egocentric hand poses in an energy-efficient way; (2) sketch generation, which generates sketches from the keypoint positions of the estimated hand poses; and (3) sketch-photo retrieval, which takes sketches as input to retrieve relevant photos. Evaluation results on our collected sketch dataset demonstrate the efficacy of ARSketch for user interaction.
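The abstract names the three stages but does not specify their internals. The following Python sketch illustrates only stages (2) and (3), under stated assumptions: it assumes stage (1) already yields a sequence of 2-D fingertip keypoints normalized to [0, 1], joins them into strokes on a binary sketch image, and ranks a gallery by cosine similarity. The function names `rasterize_trajectory`, `embed`, and `retrieve` are hypothetical, and `embed` (grid-cell ink density) is a hand-crafted stand-in for the learned sketch and photo encoders that a neural-network-driven system such as ARSketch would use.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # assumed: normalized (x, y) fingertip position in [0, 1]


def rasterize_trajectory(points: Sequence[Point], size: int = 32) -> List[List[int]]:
    """Stage (2) sketch: render a fingertip trajectory as a binary size x size image.

    Consecutive keypoints are joined by straight strokes, sampled at least once
    per pixel of travel so the stroke has no gaps.
    """
    grid = [[0] * size for _ in range(size)]

    def to_px(p: Point) -> Tuple[float, float]:
        # Clamp to [0, 1] and scale to pixel coordinates.
        x = min(max(p[0], 0.0), 1.0) * (size - 1)
        y = min(max(p[1], 0.0), 1.0) * (size - 1)
        return x, y

    pixels = [to_px(p) for p in points]
    if len(pixels) == 1:  # a single keypoint becomes a dot
        pixels = pixels * 2
    for (x0, y0), (x1, y1) in zip(pixels, pixels[1:]):
        steps = max(1, math.ceil(max(abs(x1 - x0), abs(y1 - y0))))
        for i in range(steps + 1):
            t = i / steps
            grid[round(y0 + t * (y1 - y0))][round(x0 + t * (x1 - x0))] = 1
    return grid


def embed(grid: List[List[int]], cells: int = 4) -> List[float]:
    """Hypothetical stand-in encoder: L2-normalized ink density per grid cell.

    Pixels beyond the last full cell are ignored if size is not divisible by cells.
    """
    step = len(grid) // cells
    vec = []
    for cy in range(cells):
        for cx in range(cells):
            ink = sum(
                grid[y][x]
                for y in range(cy * step, (cy + 1) * step)
                for x in range(cx * step, (cx + 1) * step)
            )
            vec.append(float(ink))
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(query: List[float], gallery: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Stage (3) sketch: rank gallery items by cosine similarity to the query.

    All vectors are unit-normalized, so the dot product equals cosine similarity.
    """
    scored = [(name, sum(a * b for a, b in zip(query, vec))) for name, vec in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical gallery: "photos" represented by embeddings of their edge maps.
    gallery = {
        "horizontal_edge": embed(rasterize_trajectory([(0.0, 0.5), (1.0, 0.5)])),
        "vertical_edge": embed(rasterize_trajectory([(0.5, 0.0), (0.5, 1.0)])),
    }
    # A slightly wobbly horizontal stroke drawn in the air.
    query = embed(rasterize_trajectory([(0.05, 0.45), (0.5, 0.55), (0.95, 0.5)]))
    print(retrieve(query, gallery, k=2))
```

Normalizing every embedding once lets retrieval reduce to a dot product, which is the usual choice when a gallery is ranked by cosine similarity; in a real system the gallery embeddings would be precomputed by a trained photo encoder rather than by `embed`.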
