Multipath Event-Based Network for Low-Power Human Action Recognition

Event-based cameras are bio-inspired sensors capturing asynchronous per-pixel brightness changes (events), which have the advantages of high temporal resolution and low power consumption compared with traditional frame-based cameras. Despite recent progress in human action recognition on traditional cameras, few solutions on event cameras have been proposed due to its unconventional frameless output. To perform low-power human action recognition on event camera, we propose a multipath deep neural network for action recognition based on event camera outputs. Specifically, a fixed number of asynchronous events are accumulated to form frames for feature extraction. With events encoding dynamic information, we estimate human pose from event frames to encode static information to improve recognition accuracy. The complementary properties between dynamic event frames and static human poses are jointly explored and fused to predict action. Extensive experiments verify the effectiveness of the proposed model with a recognition accuracy of 85.91% on the DHP19 dataset.

[1]  Tobi Delbruck,et al.  A 240 × 180 130 dB 3 µs Latency Global Shutter Spatiotemporal Vision Sensor , 2014, IEEE Journal of Solid-State Circuits.

[2]  T. Delbruck,et al.  > Replace This Line with Your Paper Identification Number (double-click Here to Edit) < 1 , 2022 .

[3]  Mubarak Shah,et al.  UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[4]  Tobi Delbrück,et al.  DHP19: Dynamic Vision Sensor 3D Human Pose Dataset , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[5]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Yiannis Andreopoulos,et al.  Graph-Based Object Classification for Neuromorphic Vision Sensing , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Junsong Yuan,et al.  Recognizing Human Actions as the Evolution of Pose Estimation Maps , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[11]  Tobi Delbrück,et al.  A Low Power, Fully Event-Based Gesture Recognition System , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Narciso García,et al.  Event-Based Vision Meets Deep Learning on Steering Prediction for Self-Driving Cars , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Thomas Serre,et al.  HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.

[14]  Nitish V. Thakor,et al.  HFirst: A Temporal Approach to Object Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  David Picard,et al.  2D/3D Pose Estimation and Action Recognition Using Multitask Deep Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Tobi Delbrück,et al.  Steering a predator robot using a mixed frame/event-driven convolutional neural network , 2016, 2016 Second International Conference on Event-based Control, Communication, and Signal Processing (EBCCSP).

[17]  Tobi Delbrück,et al.  A 128$\times$ 128 120 dB 15 $\mu$s Latency Asynchronous Temporal Contrast Vision Sensor , 2008, IEEE Journal of Solid-State Circuits.

[18]  Xiangyu Zhang,et al.  ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.

[19]  Luc Van Gool,et al.  Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.

[20]  Garrick Orchard,et al.  HOTS: A Hierarchy of Event-Based Time-Surfaces for Pattern Recognition , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Junsong Yuan,et al.  Space-Time Event Clouds for Gesture Recognition: From RGB Cameras to Event Cameras , 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[22]  Cordelia Schmid,et al.  PoTion: Pose MoTion Representation for Action Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Xiaohui Yuan,et al.  Automatic feature point detection and tracking of human actions in time-of-flight videos , 2017, IEEE/CAA Journal of Automatica Sinica.