MeCap: Whole-Body Digitization for Low-Cost VR/AR Headsets

Low-cost, smartphone-powered VR/AR headsets are becoming increasingly popular. These basic devices, little more than plastic or cardboard shells, lack advanced features such as hand controllers, limiting their interactive capability. Moreover, even high-end consumer headsets cannot track the wearer's body and face, and so interactive experiences like social VR remain underdeveloped. We introduce MeCap, which augments commodity VR headsets with powerful motion capture ("MoCap") and user-sensing capabilities at very low cost (under $5). Using only a pair of hemispherical mirrors and the existing rear-facing camera of a smartphone, MeCap provides real-time estimates of a wearer's 3D body pose, hand pose, facial expression, physical appearance and surrounding environment, capabilities that are either absent in contemporary VR/AR systems or require specialized hardware and controllers. We evaluate the accuracy of each of our tracking features, the results of which show imminent feasibility.
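
The capture principle is simple enough to sketch: each hemispherical mirror appears in the rear camera's frame as a circular, heavily distorted reflection, which can be unwarped into a conventional rectangular view before being fed to off-the-shelf vision models. Below is a minimal Python/OpenCV sketch of that unwarping step, not the paper's implementation; the mirror center and radius are hypothetical calibration values, and `estimate_pose` stands in for any 2D keypoint model.

```python
# Minimal sketch of a MeCap-style capture pipeline: crop the circular
# reflection of one hemispherical mirror out of a camera frame, unwarp
# it from polar to rectangular coordinates, and hand the result to an
# off-the-shelf pose estimator. Mirror center/radius and the pose model
# are placeholders, not values from the paper.
import cv2
import numpy as np

def build_unwarp_maps(cx, cy, radius, out_w=512, out_h=256):
    """Precompute cv2.remap lookup tables that map the circular mirror
    reflection (centered at pixel (cx, cy) with the given radius) to a
    rectangular out_h x out_w panorama."""
    theta = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)  # azimuth around mirror
    r = np.linspace(0.0, radius, out_h)                           # distance from center
    r_grid, theta_grid = np.meshgrid(r, theta, indexing="ij")     # shape: (out_h, out_w)
    map_x = (cx + r_grid * np.cos(theta_grid)).astype(np.float32)
    map_y = (cy + r_grid * np.sin(theta_grid)).astype(np.float32)
    return map_x, map_y

# Hypothetical calibration: where one mirror's reflection sits in the frame.
MAP_X, MAP_Y = build_unwarp_maps(cx=640, cy=360, radius=300)

cap = cv2.VideoCapture(0)  # stands in for the smartphone's rear-facing camera
ok, frame = cap.read()
if ok:
    panorama = cv2.remap(frame, MAP_X, MAP_Y, cv2.INTER_LINEAR)
    # `estimate_pose` is a placeholder for any off-the-shelf 2D keypoint
    # model (e.g., an OpenPose-style network) run on the unwarped view.
    # keypoints = estimate_pose(panorama)
cap.release()
```

In practice, the same remap tables can be reused every frame (they depend only on the mirror geometry), so the per-frame cost reduces to one `cv2.remap` call per mirror plus model inference.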
