Human object articulation for CCTV video forensics

In this paper we present a system for automatically annotating (articulating) humans passing through the view of a surveillance camera, in a way that is useful to a crime scene witness describing a person or criminal. Each human is annotated on the basis of two appearance features: (1) the primary colors of the clothes in the head, body and legs regions, and (2) the presence of text or a logo on the clothes. Annotation follows a robust foreground extraction, based on a modified Gaussian Mixture Model approach, and the detection of humans in the segmented foreground images. The proposed pipeline includes a preprocessing stage in which we improve the color quality of the images using a basic color constancy algorithm and then refine the results with a proposed post-processing method. The results show a significant improvement in the illumination of the video frames. To annotate the color of human clothes, we apply 3D histogram analysis (with respect to Hue, Saturation and Value) to the HSV-converted image regions of the human body parts, together with extrema detection and thresholding, to decide the dominant color of each region. To detect text or logos on the clothes, the second feature used to articulate humans, we begin by extracting connected components of enhanced horizontal, vertical and diagonal edges in the frames. These candidate regions are classified as text or non-text on the basis of their Local Energy based Shape Histogram (LESH) features, with KL divergence as the classification criterion. To detect humans, we propose a novel technique that combines Histogram of Oriented Gradients (HOG) features and Contourlet transform based Local Binary Patterns (LBP), with Adaboost as the classifier. Initial screening of foreground objects is performed using HOG features. To further eliminate false positives caused by background noise and to improve the results, we apply Contourlet-LBP feature extraction to the images.
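The dominant-color step described above (a 3D histogram over Hue, Saturation and Value, followed by extrema detection and thresholding) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `dominant_hsv_bin`, the bin counts `(8, 4, 4)` and the threshold `min_fraction` are assumptions chosen for illustration, and the extremum is taken simply as the global histogram peak.

```python
import colorsys
from collections import Counter


def dominant_hsv_bin(pixels, bins=(8, 4, 4), min_fraction=0.2):
    """Return ((h_bin, s_bin, v_bin), fraction) for the most populated HSV bin.

    pixels: iterable of (r, g, b) integers in 0..255 taken from one body region.
    bins and min_fraction are illustrative assumptions, not values from the paper.
    Returns None if the region is empty or no bin reaches min_fraction.
    """
    h_bins, s_bins, v_bins = bins
    hist = Counter()
    total = 0
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Quantize each channel; clamp so that a value of exactly 1.0
        # falls into the last bin instead of overflowing.
        idx = (
            min(int(h * h_bins), h_bins - 1),
            min(int(s * s_bins), s_bins - 1),
            min(int(v * v_bins), v_bins - 1),
        )
        hist[idx] += 1
        total += 1
    if total == 0:
        return None
    # Extremum detection, simplified here to the global maximum of the histogram.
    peak, count = hist.most_common(1)[0]
    fraction = count / total
    # Thresholding: reject regions with no clearly dominant color.
    if fraction < min_fraction:
        return None
    return peak, fraction
```

Mapping the winning bin to a color name such as "red" or "dark blue" would need a lookup table, which is left out because the abstract does not specify one.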
In the proposed method, we extract the LBP feature descriptor from the Contourlet-transformed high-pass sub-images of the vertical and diagonal directional bands. In the final stage, the extracted Contourlet-LBP descriptors are passed to Adaboost for classification. The proposed framework performed fairly well when tested on a CCTV test dataset.
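As a rough illustration of the LBP descriptor mentioned above, the following sketch computes the basic 8-neighbour LBP code for each interior pixel of a 2D array and collects the codes into a normalized 256-bin histogram. It is only a sketch under stated assumptions: the Contourlet transform is not implemented, so `img` stands in for one directional high-pass sub-image, and the function names `lbp_code` and `lbp_histogram`, the neighbour ordering and the use of the plain (non-uniform) LBP variant are all choices made here for illustration.

```python
# Neighbour offsets (dy, dx), clockwise starting at the top-left pixel.
# The ordering is an assumption; any fixed ordering gives a valid LBP code.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """Basic 8-neighbour LBP code of the pixel at (y, x), in the range 0..255."""
    center = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(_OFFSETS):
        # A neighbour at least as bright as the center sets its bit.
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over the interior pixels.

    img: 2D list of numbers (rows of equal length), e.g. one sub-band image.
    Border pixels are skipped because they lack a full 3x3 neighbourhood.
    """
    hist = [0.0] * 256
    height = len(img)
    width = len(img[0]) if height else 0
    count = 0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            hist[lbp_code(img, y, x)] += 1.0
            count += 1
    if count:
        hist = [value / count for value in hist]
    return hist
```

In a pipeline like the one described, one such histogram per selected sub-band could be concatenated into a feature vector and passed to a boosted classifier; how the authors combine the bands is not stated in the abstract.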
