Cloud based Scalable Object Recognition from Video Streams using Orientation Fusion and Convolutional Neural Networks

Object recognition from live video streams comes with numerous challenges, such as variation in illumination conditions and poses. Convolutional neural networks (CNNs) have been widely used to perform intelligent visual object recognition. Yet CNNs still suffer from severe accuracy degradation, particularly on illumination-variant datasets. To address this problem, we propose a new CNN method based on orientation fusion for visual object recognition. The proposed cloud-based video analytics system pioneers the use of bi-dimensional empirical mode decomposition to split a video frame into intrinsic mode functions (IMFs). We further propose that these IMFs undergo the Riesz transform to produce monogenic object components, which are in turn used to train the CNNs. Past works have demonstrated that the object orientation component can be used to achieve accuracy as high as 93%. Herein we demonstrate that a feature-fusion strategy over the orientation components further improves visual recognition accuracy to 97%. We also assess the scalability of our method with respect to both the number and the size of the video streams under scrutiny. We carry out extensive experiments on the publicly available Yale dataset and on a self-generated video dataset, finding significant improvements in both accuracy and scale in comparison to AlexNet, LeNet and SE-ResNeXt, which are the three most commonly used deep learning models for visual object recognition and classification.
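
The step from IMFs to monogenic components can be illustrated with a minimal NumPy sketch. It assumes the IMFs have already been obtained from a frame by some bi-dimensional empirical mode decomposition routine, which is not shown here. The function names `riesz_monogenic` and `fuse_orientations` are hypothetical, the Riesz transform is applied in the frequency domain using its standard multipliers, and stacking the per-IMF orientation maps as channels is only one plausible reading of "orientation fusion", not necessarily the scheme used in the paper.

```python
import numpy as np

def riesz_monogenic(imf):
    """Return local amplitude, phase and orientation of one 2-D IMF."""
    imf = np.asarray(imf, dtype=float)
    rows, cols = imf.shape
    # Frequency grids: v runs along rows, u along columns.
    v = np.fft.fftfreq(rows)[:, None]
    u = np.fft.fftfreq(cols)[None, :]
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0  # avoid division by zero at the DC term
    spectrum = np.fft.fft2(imf)
    # Riesz multipliers: -i*u/|w| and -i*v/|w|.
    r1 = np.real(np.fft.ifft2(spectrum * (-1j * u / radius)))
    r2 = np.real(np.fft.ifft2(spectrum * (-1j * v / radius)))
    amplitude = np.sqrt(imf ** 2 + r1 ** 2 + r2 ** 2)
    phase = np.arctan2(np.hypot(r1, r2), imf)
    orientation = np.arctan2(r2, r1)
    return amplitude, phase, orientation

def fuse_orientations(imfs):
    """Assumed fusion: stack the orientation map of each IMF as a channel."""
    return np.stack([riesz_monogenic(m)[2] for m in imfs], axis=-1)
```

Under these assumptions, the array returned by `fuse_orientations` has shape (height, width, number of IMFs) and could be fed to a CNN as a multi-channel input.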
