Matrix-LSTM: a Differentiable Recurrent Surface for Asynchronous Event-Based Data

Dynamic Vision Sensors (DVSs) asynchronously emit events at pixels that undergo brightness changes. Unlike conventional frame-based cameras, they produce a sparse representation of the scene. Therefore, to apply standard computer vision algorithms, events must first be integrated into a frame or event-surface. This is usually achieved through hand-crafted grids that reconstruct the frame using ad-hoc heuristics. In this paper, we propose Matrix-LSTM, a grid of Long Short-Term Memory (LSTM) cells that efficiently processes events and learns task-dependent event-surfaces end-to-end. Compared to existing reconstruction approaches, our learned event-surface shows good flexibility and expressiveness for optical flow estimation on the MVSEC benchmark, and it improves the state of the art in event-based object classification on the N-Cars dataset.
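The contrast between a hand-crafted grid and a learned recurrent surface can be illustrated with a minimal sketch. It assumes that every pixel runs the same LSTM cell (weights shared across the grid) over its own time-ordered event sequence, that each event is described only by its polarity and a normalized timestamp, and that the final hidden state becomes the surface value at that pixel, with event-free pixels left at zero. These choices and all names (`LSTMCell`, `count_surface`, `matrix_lstm_surface`) are illustrative assumptions, not the authors' implementation; the weights here are random, whereas in practice they would be trained end-to-end together with the downstream network.

```python
import math
import random
from collections import defaultdict


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class LSTMCell:
    """Minimal LSTM cell in pure Python (input size D, hidden size H)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size
        # 4*H rows: input, forget, candidate and output gate pre-activations.
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(4 * hidden_size)]
        self.b = [0.0] * (4 * hidden_size)

    def step(self, x, h, c):
        z = list(x) + list(h)
        pre = [sum(w * v for w, v in zip(row, z)) + bias
               for row, bias in zip(self.W, self.b)]
        H = self.hidden_size
        i = [sigmoid(a) for a in pre[0:H]]
        f = [sigmoid(a) for a in pre[H:2 * H]]
        g = [math.tanh(a) for a in pre[2 * H:3 * H]]
        o = [sigmoid(a) for a in pre[3 * H:4 * H]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def count_surface(events, height, width):
    """Hand-crafted baseline: per-pixel event count (one simple heuristic)."""
    grid = [[0] * width for _ in range(height)]
    for x, y, _t, _p in events:
        grid[y][x] += 1
    return grid


def matrix_lstm_surface(events, height, width, cell):
    """Hypothetical recurrent surface: one shared LSTM run per pixel.

    events: list of (x, y, t, p) tuples with polarity p in {-1, +1}.
    Returns a height x width grid of H-dimensional final hidden states.
    """
    H = cell.hidden_size
    surface = [[[0.0] * H for _ in range(width)] for _ in range(height)]
    if not events:
        return surface
    t_min = min(e[2] for e in events)
    t_span = (max(e[2] for e in events) - t_min) or 1.0
    per_pixel = defaultdict(list)
    for x, y, t, p in events:
        per_pixel[(x, y)].append((t, p))
    for (x, y), seq in per_pixel.items():
        h, c = [0.0] * H, [0.0] * H
        for t, p in sorted(seq):  # process each pixel's events in time order
            # Assumed per-event features: polarity and normalized timestamp.
            h, c = cell.step([float(p), (t - t_min) / t_span], h, c)
        surface[y][x] = h
    return surface


# Usage: three events on a 2 x 3 sensor, hidden size 4.
events = [(0, 0, 0.00, 1), (0, 0, 0.02, -1), (2, 1, 0.01, 1)]
cell = LSTMCell(input_size=2, hidden_size=4, seed=0)
surf = matrix_lstm_surface(events, height=2, width=3, cell=cell)
```

Because the cell weights are shared across the grid, the number of parameters in this sketch does not depend on the sensor resolution, and each pixel's output summarizes its whole event history rather than a single fixed statistic such as the count. A practical version would batch the per-pixel recurrences on a GPU instead of looping in Python.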

Marco Cannici, Marco Ciccone, Andrea Romanoni, Matteo Matteucci