RF-based 3D skeletons

This paper introduces RF-Pose3D, the first system that infers 3D human skeletons from RF signals. It requires no sensors on the body, and works with multiple people and across walls and occlusions. Further, it generates dynamic skeletons that follow the people as they move, walk or sit. As such, RF-Pose3D provides a significant leap in RF-based sensing and enables new applications in gaming, healthcare, and smart homes. RF-Pose3D is based on a novel convolutional neural network (CNN) architecture that performs high-dimensional convolutions by decomposing them into low-dimensional operations. This property allows the network to efficiently condense the spatio-temporal information in RF signals. The network first zooms in on the individuals in the scene, and crops the RF signals reflected off each person. For each individual, it localizes and tracks their body parts - head, shoulders, arms, wrists, hip, knees, and feet. Our evaluation results show that RF-Pose3D tracks each keypoint on the human body with an average error of 4.2 cm, 4.0 cm, and 4.9 cm along the X, Y, and Z axes respectively. It maintains this accuracy even in the presence of multiple people, and in new environments that it has not seen in the training set. Demo videos are available at our website: http://rfpose3d.csail.mit.edu.

[1]  Cewu Lu,et al.  RMPE: Regional Multi-person Pose Estimation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Kaishun Wu,et al.  WiFall: Device-free fall detection by wireless networks , 2017, IEEE INFOCOM 2014 - IEEE Conference on Computer Communications.

[3]  Tommi S. Jaakkola,et al.  Learning Sleep Stages from Radio Signals: A Conditional Adversarial Architecture , 2017, ICML.

[4]  Parth H. Pathak,et al.  WiWho: WiFi-Based Person Identification in Smart Spaces , 2016, 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN).

[5]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6]  Antonio Torralba,et al.  SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.

[7]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Zhengyou Zhang,et al.  A Flexible New Technique for Camera Calibration , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  P. Beckmann,et al.  The scattering of electromagnetic waves from rough surfaces , 1963 .

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Shwetak N. Patel,et al.  Whole-home gesture recognition using wireless signals , 2013, MobiCom.

[12]  Lorenzo Torresani,et al.  Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Antonio Torralba,et al.  Through-Wall Human Pose Estimation Using Radio Signals , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Lior Rokach,et al.  Clustering Methods , 2005, The Data Mining and Knowledge Discovery Handbook.

[15]  Frédo Durand,et al.  Capturing the human figure through a wall , 2015, ACM Trans. Graph..

[16]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[18]  Bernt Schiele,et al.  DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model , 2016, ECCV.

[19]  Chuang Gan,et al.  The Sound of Pixels , 2018, ECCV.

[20]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Jie Xiong,et al.  ArrayTrack: A Fine-Grained Indoor Location System , 2011, NSDI.

[22]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[23]  Dina Katabi,et al.  Extracting Gait Velocity and Stride Length from Surrounding Radio Signals , 2017, CHI.

[24]  Sachin Katti,et al.  WiDeo: Fine-grained Device-free Motion Tracing using RF Backscatter , 2015, NSDI.

[25]  Lior Rokach,et al.  Data Mining And Knowledge Discovery Handbook , 2005 .

[26]  Sachin Katti,et al.  SpotFi: Decimeter Level Localization Using WiFi , 2015, SIGCOMM.

[27]  Zhengyou Zhang,et al.  Microsoft Kinect Sensor and Its Effect , 2012, IEEE Multim..

[28]  Antonio Torralba,et al.  Generating Videos with Scene Dynamics , 2016, NIPS.

[29]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[30]  Mark A. Richards,et al.  Fundamentals of Radar Signal Processing , 2005 .

[31]  Rob Miller,et al.  Smart Homes that Monitor Breathing and Heart Rate , 2015, CHI.

[32]  Parameswaran Ramanathan,et al.  Leveraging directional antenna capabilities for fine-grained gesture recognition , 2014, UbiComp.

[33]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[34]  Rob Miller,et al.  3D Tracking via Body Radio Reflections , 2014, NSDI.

[35]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Fadel Adib,et al.  Emotion recognition using wireless signals , 2016, MobiCom.

[38]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[39]  Jitendra Malik,et al.  Using k-Poselets for Detecting People and Localizing Their Keypoints , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Wei Wang,et al.  Gait recognition using wifi signals , 2016, UbiComp.