Intelligent Surveillance as an Edge Network Service: from Harr-Cascade, SVM to a Lightweight CNN

Edge computing extends the realm of information technology beyond the boundary defined by the cloud computing paradigm. By performing computation near the source and destination of the data, edge computing is a promising way to address the challenges of many delay-sensitive applications, such as real-time surveillance. Leveraging ubiquitously connected cameras and smart mobile devices, it enables video analytics at the edge. However, traditional human-object detection and tracking approaches are still too computationally expensive for edge devices. Aiming at intelligent surveillance as an edge network service, this work explores the feasibility of two popular human-object detection schemes, Haar-Cascade and SVM, at the edge. Based on the constraints observed in these algorithms, a lightweight Convolutional Neural Network (L-CNN) is proposed using depthwise separable convolution. The proposed L-CNN considerably reduces the number of parameters without affecting the quality of the output, which makes it well suited to edge devices. Trained with the Single Shot Multi-box Detector (SSD) to pinpoint the location of each human object, it outputs the coordinates of a bounding box around the object. We implemented and tested L-CNN on an edge device, a Raspberry Pi 3. An extensive experimental comparison study validates that the proposed L-CNN is a feasible design for real-time human-object detection as an edge service.

Keywords: Edge Computing, Smart Surveillance, Lightweight Convolutional Neural Network (L-CNN), Human Detection.
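To make the parameter reduction from depthwise separable convolution concrete, the following minimal sketch compares weight counts for a single layer. It is an illustration only: the kernel size and channel counts are assumed values, not the L-CNN configuration from the paper, the function names are hypothetical, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Assumed example layer: 3 x 3 kernel, 256 input and 256 output channels.
k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
```

For this assumed layer the separable form needs roughly one ninth of the weights, which is the kind of saving that makes such a network attractive on a device like the Raspberry Pi 3.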
