Flexible image analysis for law enforcement agencies with deep neural networks to determine: where, who and what

Due to the increasing need for effective security measures and the integration of cameras in commercial products, a huge amount of visual data is created today. Law enforcement agencies (LEAs) inspect images and videos to find radicalization, propaganda for terrorist organizations, and illegal products on darknet markets, which is time-consuming. Instead of searching undirected, LEAs would like to adapt to new crimes and threats and focus only on data from specific locations, persons, or objects, which requires flexible interpretation of image content. Visual concept detection with deep convolutional neural networks (CNNs) is a crucial component for understanding image content. This paper makes five contributions. The first enables image-based geo-localization to estimate the origin of an image: CNNs and geotagged images are used to create a model that determines the location of an image from its pixel values. The second enables the analysis of fine-grained concepts to distinguish sub-categories within a generic concept; the proposed method encompasses data acquisition, data cleaning, and concept hierarchies. The third is the recognition of person attributes (e.g., glasses or moustache), which enables searching for a person by textual description; the person-attribute problem is treated as a specific sub-task of concept classification. The fourth is an intuitive image annotation tool based on active learning, which allows users to define novel concepts flexibly and train CNNs with minimal annotation effort. The fifth increases the flexibility of query definition for LEAs through query expansion, which maps user queries to known and detectable concepts, so users need no prior knowledge of the detectable concepts.
The methods are validated on data with varying locations (popular and non-touristic locations), varying person attributes (CelebA dataset), and varying numbers of annotations.
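
As an illustration of the query-expansion idea, the following minimal sketch maps a free-text query term to the closest concept for which a detector exists, using cosine similarity between word embeddings. It is not the method of this paper: the three-dimensional vectors, the concept list, and the names TOY_EMBEDDINGS, DETECTABLE_CONCEPTS, and expand_query are hypothetical and chosen only for illustration. A real system would presumably load pretrained word vectors instead.

```python
import math

# Hypothetical toy embeddings (assumption); real systems would use
# pretrained word vectors with hundreds of dimensions.
TOY_EMBEDDINGS = {
    "car": (0.9, 0.1, 0.0),
    "truck": (0.8, 0.2, 0.1),
    "pistol": (0.0, 0.9, 0.2),
    "rifle": (0.1, 0.8, 0.3),
    "beard": (0.1, 0.1, 0.9),
}

# Hypothetical set of concepts for which a trained detector is available.
DETECTABLE_CONCEPTS = ("car", "pistol", "beard")


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(term, top_k=1):
    """Return the top_k detectable concepts most similar to the query term.

    Each result is a (concept, similarity) pair, sorted by decreasing
    similarity. Terms without an embedding yield an empty list.
    """
    if term not in TOY_EMBEDDINGS:
        return []
    query_vec = TOY_EMBEDDINGS[term]
    scored = [
        (concept, cosine(query_vec, TOY_EMBEDDINGS[concept]))
        for concept in DETECTABLE_CONCEPTS
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for query in ("truck", "rifle"):
        print(query, "->", expand_query(query))
```

With these toy vectors, the query "truck" is mapped to the detectable concept "car" and "rifle" to "pistol", so the user does not need to know in advance which concepts have detectors.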
