PLSM: A Parallelized Liquid State Machine for Unintentional Action Detection

Reservoir Computing (RC) offers a viable option for deploying AI algorithms on low-end embedded platforms. The Liquid State Machine (LSM) is a bio-inspired RC model that mimics cortical microcircuits and uses spiking neural networks (SNNs), which can be realized directly on neuromorphic hardware. In this paper, we present a novel Parallelized LSM (PLSM) architecture that incorporates a spatio-temporal read-out layer and semantic constraints on the model output. To the best of our knowledge, this is the first such formulation in the literature, and it offers a computationally lighter alternative to traditional deep-learning models. We also present a comprehensive algorithm for implementing parallelizable, GPU-compatible SNNs and LSMs. We apply the PLSM model to classify unintentional (accidental) actions in video clips from the Oops dataset. In our experiments on detecting unintentional action in video, the proposed model outperforms both a self-supervised model and a fully supervised traditional deep-learning model. All code is available in our repository: https://github.com/anonymoussentience2020/Parallelized LSM for Unintentional Action Recognition.
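
To make the idea of a parallelizable spiking reservoir concrete, the following is a minimal sketch, not the PLSM implementation from the repository. It assumes a discrete-time leaky integrate-and-fire (LIF) neuron with a hard reset, fixed random weights, and a simple mean firing-rate feature in place of the paper's spatio-temporal read-out. The function name `simulate_lif_reservoir`, the parameters `decay` and `v_thresh`, and all sizes are hypothetical choices for illustration. Every neuron in every batch sample is updated in a single array operation per time step, which is the kind of formulation that maps naturally onto GPU array libraries.

```python
import numpy as np


def simulate_lif_reservoir(spikes_in, w_in, w_rec, v_thresh=1.0, decay=0.9):
    """Run a batch of input spike trains through a LIF reservoir.

    spikes_in: (batch, time, n_in) array of 0/1 input spikes
    w_in:      (n_in, n_res) fixed input weights
    w_rec:     (n_res, n_res) fixed recurrent weights
    Returns a (batch, time, n_res) array of 0/1 reservoir spikes.
    """
    batch, steps, _ = spikes_in.shape
    n_res = w_rec.shape[0]
    v = np.zeros((batch, n_res))            # membrane potentials
    s = np.zeros((batch, n_res))            # spikes from the previous step
    out = np.zeros((batch, steps, n_res))
    for t in range(steps):
        # Input current plus recurrent current, for all samples at once.
        current = spikes_in[:, t] @ w_in + s @ w_rec
        v = decay * v + current             # leaky integration
        s = (v >= v_thresh).astype(float)   # fire where threshold is reached
        v = np.where(s > 0, 0.0, v)         # hard reset of neurons that fired
        out[:, t] = s
    return out


# Hypothetical sizes and weight scales, chosen only for illustration.
rng = np.random.default_rng(0)
n_in, n_res = 8, 32
w_in = rng.normal(0.0, 0.5, (n_in, n_res))
mask = rng.random((n_res, n_res)) < 0.2     # sparse recurrent connectivity
w_rec = rng.normal(0.0, 0.1, (n_res, n_res)) * mask
spikes_in = (rng.random((4, 20, n_in)) < 0.3).astype(float)

liquid = simulate_lif_reservoir(spikes_in, w_in, w_rec)
features = liquid.mean(axis=1)              # per-neuron firing rate per sample
```

In a reservoir-computing setup, `w_in` and `w_rec` stay fixed and only a read-out trained on features such as `features` is learned, which is what keeps the approach computationally light compared with training a deep network end to end.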
