Paper information - Graph-based Spatial-temporal Feature Learning for Neuromorphic Vision Sensing

Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a. “spikes”) in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, feature representation for NVS is far behind its APS-based counterparts, resulting in lower performance in high-level computer vision tasks. To fully utilize the sparse and asynchronous nature of NVS, we propose a compact graph representation for NVS, which allows for end-to-end learning with graph convolutional neural networks. We couple this with a novel end-to-end feature learning framework that accommodates both appearance-based and motion-based tasks. The core of our framework comprises a spatial feature learning module, which utilizes residual-graph convolutional neural networks (RG-CNN) for end-to-end learning of appearance-based features directly from graphs. We extend this with our proposed Graph2Grid block and a temporal feature learning module for efficiently modelling temporal dependencies over multiple graphs and a long temporal extent. We show how our framework can be configured for object classification, action recognition and action similarity labeling. Importantly, our approach preserves the spatial and temporal coherence of spike events, while requiring less computation and memory. The experimental validation shows that our proposed framework outperforms all recent methods on standard datasets. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we introduce, evaluate and make available the American Sign Language letters dataset (ASL-DVS), as well as human action datasets (UCF101-DVS, HMDB51-DVS and ASLAN-DVS).
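
To make the idea of a graph representation of events more concrete, the following is a minimal Python sketch of one way a set of events could be turned into a graph. The abstract does not describe how the graph is constructed, so everything here is an assumption for illustration: the `(x, y, t, polarity)` event layout, the radius-based linking rule, the `time_scale` and `max_degree` parameters, and the function name `build_event_graph`. It is not the authors' implementation.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical event layout: (x, y, t, polarity). Not taken from the paper.
Event = Tuple[float, float, float, int]


def build_event_graph(
    events: List[Event],
    radius: float = 3.0,
    time_scale: float = 1.0,
    max_degree: int = 8,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return (node_features, edges) for a radius-neighbourhood graph.

    Each event becomes a node whose feature is its polarity. Two nodes are
    linked when their Euclidean distance in (x, y, time_scale * t) space is
    at most `radius`. Each node keeps at most `max_degree` edges.
    """
    coords = [(x, y, time_scale * t) for x, y, t, _ in events]
    features = [p for _, _, _, p in events]
    degree = [0] * len(events)
    edges: List[Tuple[int, int]] = []

    # Brute-force pairwise check; fine for a sketch, quadratic in event count.
    for i, j in combinations(range(len(events)), 2):
        if degree[i] >= max_degree or degree[j] >= max_degree:
            continue
        dist_sq = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        if dist_sq <= radius ** 2:
            edges.append((i, j))
            degree[i] += 1
            degree[j] += 1
    return features, edges


# Four toy events: three close together in space-time, one far away.
events = [(0, 0, 0.0, 1), (1, 1, 0.5, -1), (10, 10, 0.6, 1), (2, 1, 1.0, 1)]
features, edges = build_event_graph(events, radius=3.0)
print(features)  # [1, -1, 1, 1]
print(edges)     # [(0, 1), (0, 3), (1, 3)]
```

In this toy input the isolated event at `(10, 10)` ends up with no edges, which hints at why a graph can stay sparse when events are sparse. A graph convolutional network would then operate on `features` and `edges` rather than on dense frames.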
