Action Recognition Using Single-Pixel Time-of-Flight Detection

Action recognition is a challenging task that plays an important role in many robotic systems, most of which depend heavily on visual input feeds. Because visual feeds raise privacy concerns, however, it is important to find a method that can recognise actions without them. In this paper, we propose a concept for detecting actions while preserving the test subject’s privacy. The proposed method relies only on recording the temporal evolution of light pulses scattered back from the scene. The data trace recorded for one action consists of a sequence of one-dimensional arrays of voltage values acquired by a single-pixel detector at 1 GHz repetition rate. Information about both the distance to the object and its shape is embedded in the traces. We analyse the data with recurrent neural networks and demonstrate successful action recognition. In our experiments, the proposed method achieves an average accuracy of 96.47% on five actions: walking forward, walking backwards, sitting down, standing up and waving a hand.
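To make the data flow concrete, the following is a minimal sketch of how a recurrent network could map one recording (a sequence of one-dimensional voltage traces) to one of the five action labels. It is an illustration only, not the network used in this work: the choice of a gated recurrent unit (GRU), the trace length of 64 samples, the 30 traces per recording, the hidden size of 16 and the random, untrained weights are all assumptions, and the input is synthetic noise rather than measured data.

```python
import numpy as np

# The five action classes named in the abstract.
ACTIONS = ["walking forward", "walking backwards", "sitting down",
           "standing up", "waving hand"]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(trace_len, hidden, n_classes, rng):
    """Random (untrained) GRU and output-layer weights; sizes are assumptions."""
    scale = 0.1
    p = {}
    for gate in ("z", "r", "h"):  # update gate, reset gate, candidate state
        p["W" + gate] = rng.normal(0.0, scale, (hidden, trace_len))
        p["U" + gate] = rng.normal(0.0, scale, (hidden, hidden))
        p["b" + gate] = np.zeros(hidden)
    p["Wo"] = rng.normal(0.0, scale, (n_classes, hidden))
    p["bo"] = np.zeros(n_classes)
    return p


def gru_classify(traces, p):
    """Run a GRU over a (T, trace_len) recording and return class probabilities."""
    h = np.zeros(p["bz"].shape[0])
    for x in traces:  # one 1-D voltage trace per time step
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
        h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
        h = (1.0 - z) * h + z * h_tilde
    logits = p["Wo"] @ h + p["bo"]
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


rng = np.random.default_rng(0)
params = init_params(trace_len=64, hidden=16, n_classes=len(ACTIONS), rng=rng)

# Synthetic stand-in for one recorded action: 30 traces of 64 samples each.
recording = rng.normal(size=(30, 64))
probs = gru_classify(recording, params)
print(ACTIONS[int(np.argmax(probs))], probs.round(3))
```

With untrained weights the predicted label is arbitrary; in practice the weights would be fitted to labelled recordings. The recurrent state is what lets the classifier use the ordering of the traces, for example to separate sitting down from standing up.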
