Simultaneous and Spatiotemporal Detection of Different Levels of Activity in Multidimensional Data

In this work, we present a novel approach to autonomously detect different levels of simultaneous and spatiotemporal activity in multidimensional data. We introduce a new multilabeling technique that assigns different labels to different regions of interest in the data, thereby incorporating the spatial aspect. Each label describes the level of activity/motion to be monitored in the spatial location it represents. Existing approaches, by contrast, provide only a binary result indicating the presence or absence of activity. This novel Spatially and Motion-Level Descriptive (SMLD) labeling schema is combined with a Convolutional Long Short-Term Memory (ConvLSTM)-based classification network to capture different levels of activity both spatially and temporally, without the use of any foreground or object detection. The proposed approach can be applied to various types of spatiotemporal data captured for completely different application domains; in this paper, it is evaluated on video data as well as on respiratory sound data. Performance is evaluated with metrics commonly associated with multilabeling, namely Hamming Loss and Subset Accuracy, as well as with confusion matrix-based measurements. Promising testing results are achieved on the video datasets, with an overall Hamming Loss close to 0.05, a Subset Accuracy close to 80%, and confusion matrix-based metrics above 0.9. In addition, we discuss the ability of the proposed approach to detect frequent motion patterns based on the predicted spatiotemporal activity levels. Encouraging results are also obtained on the respiratory sound dataset when detecting abnormalities in different parts of the lungs. Taken together, the experimental results on these two very different data types support the applicability of the proposed approach across application domains.
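
To make the two multilabel metrics concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical SMLD-style encoding in which the data are split into R regions, each region is assigned exactly one of L activity levels, and that choice is expanded into a flat binary vector of R × L (region, level) indicators. The helper names `encode_smld`, `hamming_loss`, and `subset_accuracy` are illustrative. Hamming Loss is the fraction of individual label bits predicted incorrectly (lower is better). Subset Accuracy is the fraction of samples whose entire label vector is predicted exactly (higher is better).

```python
from typing import List, Sequence


def encode_smld(levels_per_region: Sequence[int], num_levels: int) -> List[int]:
    """Hypothetical encoding: expand one activity level per region into a
    flat binary (region, level) indicator vector of length R * L."""
    vector: List[int] = []
    for level in levels_per_region:
        if not 0 <= level < num_levels:
            raise ValueError(f"level {level} outside [0, {num_levels})")
        vector.extend(1 if k == level else 0 for k in range(num_levels))
    return vector


def _check_shapes(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> None:
    """Ensure both label sets have the same number of samples and labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    for t, p in zip(y_true, y_pred):
        if len(t) != len(p):
            raise ValueError("label vectors must have equal length")


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label bits that disagree, over all samples."""
    _check_shapes(y_true, y_pred)
    total = sum(len(t) for t in y_true)
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / total


def subset_accuracy(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose whole label vector is predicted exactly."""
    _check_shapes(y_true, y_pred)
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


# Toy example: 4 regions, 3 assumed levels (0 = none, 1 = low, 2 = high).
y_true = [encode_smld([0, 1, 2, 0], 3), encode_smld([2, 2, 0, 1], 3)]
# The second prediction gets region 2 wrong (level 1 instead of 2).
y_pred = [encode_smld([0, 1, 2, 0], 3), encode_smld([2, 1, 0, 1], 3)]

print(round(hamming_loss(y_true, y_pred), 4))  # 2 wrong bits out of 24 -> 0.0833
print(subset_accuracy(y_true, y_pred))         # 1 of 2 samples exact -> 0.5
```

Under this one-hot-per-region assumption, a single misclassified region flips two bits, so Hamming Loss penalizes it twice while Subset Accuracy simply marks the whole sample as wrong. This contrast is why the two metrics are usually reported together.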
