Real-Time Target Detection and Recognition with Deep Convolutional Networks for Intelligent Visual Surveillance

Moving target detection, tracking, recognition, and behaviour analysis are key issues in intelligent visual surveillance systems (IVSS). The challenge is to process the real-time video stream efficiently enough that objects of interest can be found and analysed. Traditional video surveillance technology, however, often fails to meet the need for real-time key frame recognition in on-line intelligent video monitoring systems. In this paper, we apply the state-of-the-art Faster R-CNN [7], which takes advantage of convolutional neural networks, to our real-time target recognition system, Deep Intelligent Visual Surveillance (DIVS). DIVS consists of four parts: (i) acquiring real-time video images from remote cameras, (ii) processing the data with the deep learning framework Caffe [23] built for Faster R-CNN, (iii) storing the valuable data in MySQL, and (iv) presenting the data on a website. Experiments on our system validate the effectiveness, stability, and accuracy of the proposed solution.
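The four-part flow described above can be sketched roughly as follows. This is an illustrative outline only, not the DIVS implementation: the detector is a hypothetical stub standing in for Faster R-CNN running on Caffe, Python's built-in `sqlite3` stands in for MySQL so the sketch runs without a database server, frames are placeholder byte strings rather than camera images, and the website layer is reduced to a single query function. All names (`Detection`, `run_pipeline`, `latest_detections`, `fake_detector`, the `detections` table, and the `min_score` threshold) are assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Detection:
    label: str
    score: float
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def run_pipeline(
    frames: Iterable[bytes],
    detect: Callable[[bytes], List[Detection]],
    db: sqlite3.Connection,
    min_score: float = 0.8,  # hypothetical confidence threshold
) -> int:
    """Steps (i)-(iii): read frames, run the detector, store confident hits."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS detections ("
        "frame_id INTEGER, label TEXT, score REAL, "
        "x1 INTEGER, y1 INTEGER, x2 INTEGER, y2 INTEGER)"
    )
    stored = 0
    for frame_id, frame in enumerate(frames):
        for det in detect(frame):
            if det.score < min_score:
                continue  # keep only the "valuable" detections
            db.execute(
                "INSERT INTO detections VALUES (?, ?, ?, ?, ?, ?, ?)",
                (frame_id, det.label, det.score, *det.box),
            )
            stored += 1
    db.commit()
    return stored


def latest_detections(db: sqlite3.Connection, limit: int = 10):
    """Step (iv): the query a web front end might run to list recent hits."""
    return db.execute(
        "SELECT frame_id, label, score FROM detections "
        "ORDER BY frame_id DESC, score DESC LIMIT ?",
        (limit,),
    ).fetchall()


def fake_detector(frame: bytes) -> List[Detection]:
    """Stub detector: returns fixed results regardless of the frame content."""
    return [
        Detection("person", 0.95, (10, 20, 110, 220)),
        Detection("car", 0.40, (0, 0, 50, 50)),  # below threshold, dropped
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    count = run_pipeline([b"frame-0", b"frame-1"], fake_detector, conn)
    print(count, latest_detections(conn))
```

Passing the detector in as a callable keeps the storage and presentation steps independent of the model, so a real detector could be substituted without touching the rest of the sketch.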

[1] John K. Tsotsos, et al. 50 Years of object recognition: Directions forward, 2013, Comput. Vis. Image Underst.

[2] Andrea Vedaldi, et al. R-CNN minus R, 2015, BMVC.

[3] Dumitru Erhan, et al. Deep Neural Networks for Object Detection, 2013, NIPS.

[4] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[5] Lawrence D. Jackel, et al. Backpropagation Applied to Handwritten Zip Code Recognition, 1989, Neural Computation.

[6] Rob Fergus, et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.

[7] Luc Van Gool, et al. The 2005 PASCAL Visual Object Classes Challenge, 2005, MLCW.

[8] Tao Wang, et al. Deep learning with COTS HPC systems, 2013, ICML.

[9] Rob Fergus, et al. Visualizing and Understanding Convolutional Neural Networks, 2013.

[10] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[11] Jian Sun, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] Dumitru Erhan, et al. Scalable Object Detection Using Deep Neural Networks, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13] References, 1971.

[14] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.

[15] Ross B. Girshick, et al. Fast R-CNN, 2015, arXiv:1504.08083.

[16] Dima Damen, et al. British Machine Vision Conference (BMVC), 2007.

[17] Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[18] Koen E. A. van de Sande, et al. Selective Search for Object Recognition, 2013, International Journal of Computer Vision.

[19] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21] Trevor Darrell, et al. Fully Convolutional Networks for Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Xiang Zhang, et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, 2013, ICLR.

[23] Trevor Darrell, et al. The pyramid match kernel: discriminative classification with sets of image features, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.