SL-Animals-DVS: event-driven sign language animals dataset

Non-intrusive, vision-based applications that support communication for people who use sign language remain an open and attractive research field for the human action recognition community. Automatic sign language interpretation is a complex visual recognition task in which motion over time distinguishes the sign being performed. In recent years, the development of robust and successful deep-learning techniques has been accompanied by the creation of a large number of databases. The availability of challenging datasets of Sign Language (SL) terms and phrases helps push research toward new algorithms and methods for their automatic recognition. This paper presents ‘SL-Animals-DVS’, an event-based action dataset captured by a Dynamic Vision Sensor (DVS). The DVS records non-fluent signers performing a small set of isolated words derived from SL signs of various animals, as a continuous spike flow at very low latency. This makes it especially well suited to SL signs, which are usually performed at high speed. We benchmark recognition performance on this data using three state-of-the-art Spiking Neural Network (SNN) recognition systems. SNNs are naturally suited to exploit the temporal information provided by the DVS, where information is encoded in the spike times. The dataset has about 1100 samples of 59 subjects performing 19 sign language signs in isolation under different scenarios, providing a challenging evaluation platform for this emerging technology.
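A DVS outputs a stream of events, each a (timestamp, x, y, polarity) tuple, rather than frames; a common preprocessing step before feeding such a stream to a recognition network is to accumulate the events of one sample into a fixed number of time bins. The sketch below illustrates this idea, assuming a 128×128 sensor and a hypothetical structured array with fields `t`, `x`, `y`, `p`; it is not the dataset's official loader, only a minimal example of the time-binning technique.

```python
import numpy as np

def events_to_bins(events, num_bins=20, height=128, width=128):
    """Accumulate DVS events (t, x, y, polarity) into `num_bins`
    time bins, with one channel per polarity. Returns a tensor of
    shape (num_bins, 2, height, width) counting events per cell."""
    t = events["t"].astype(np.float64)
    # Normalize this sample's timestamps to [0, num_bins) and clip
    # the final event into the last bin.
    span = max(t.max() - t.min(), 1e-9)
    bin_idx = np.clip(((t - t.min()) / span * num_bins).astype(int),
                      0, num_bins - 1)
    frames = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    # Unbuffered accumulation: repeated (bin, polarity, y, x) indices
    # each add 1, so frames.sum() equals the number of events.
    np.add.at(frames,
              (bin_idx, events["p"].astype(int),
               events["y"], events["x"]),
              1.0)
    return frames
```

The resulting per-bin event counts can be fed to a frame-based network directly, or thresholded into binary spike tensors for an SNN simulator.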
