Poster Abstract: 3D Activity Localization with Multiple Sensors

We present a deep learning framework for fast 3D activity localization and tracking in a dynamic and crowded real world setting. Our training approach reverses the traditional activity localization approach, which first estimates the possible location of activities and then predicts their occurrence. Instead, we first trained a deep convolutional neural network for activity recognition using depth video and RFID data as input, and then used the activation maps of the network to locate the recognized activity in the 3D space. Our system achieved around 20cm average localization error (in a 4m by 5m room) which is comparable to Kinect's body skeleton tracking error (10-20cm), but our system tracks activities instead of Kinect's location of people.

[1]  Ivan Marsic,et al.  Online process phase detection using multimodal deep learning , 2016, 2016 IEEE 7th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON).

[2]  Andrea Vedaldi,et al.  Visualizing Deep Convolutional Neural Networks Using Natural Pre-images , 2015, International Journal of Computer Vision.