Known-Item Search in Image Datasets Using Automatically Detected Keywords

Known-item search is a scenario in which a user searches for one particular image in a given collection but does not know where it is located. This thesis focuses on the design and evaluation of a keyword retrieval model for known-item search in image collections. We use a deep neural network trained on a custom dataset to annotate the images. We design a complex yet easy-to-use query interface for fast image retrieval. We design and use several types of artificial users to estimate the model's performance in an interactive setting. We also discuss our successful participation in two international competitions.
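To make the setting concrete, the following is a minimal sketch, not the retrieval model developed in the thesis, of how a keyword query could rank a collection once a network has assigned each image a score per keyword. It assumes a hypothetical score matrix `scores` of shape (images, keywords) with values in [0, 1], and it combines the query keywords by multiplying their scores, which is an illustrative AND-like choice. The rank of the searched image in the result list is the quantity an artificial user would try to drive toward 1.

```python
import numpy as np


def rank_images(scores: np.ndarray, query: list[int]) -> np.ndarray:
    """Return image indices ordered from most to least relevant.

    scores: hypothetical (n_images, n_keywords) matrix of detector outputs in [0, 1].
    query:  indices of the keywords the user entered.
    """
    # Assumed scoring: multiply the scores of all query keywords per image.
    relevance = np.prod(scores[:, query], axis=1)
    # Sort by descending relevance; "stable" keeps ties in collection order.
    return np.argsort(-relevance, kind="stable")


def target_rank(scores: np.ndarray, query: list[int], target: int) -> int:
    """1-based position of the known (searched) item in the result list."""
    order = rank_images(scores, query)
    return int(np.where(order == target)[0][0]) + 1


# Toy collection of three images described by three keywords.
scores = np.array([
    [0.9, 0.1, 0.2],  # image 0
    [0.8, 0.7, 0.1],  # image 1
    [0.2, 0.9, 0.9],  # image 2
])

# A user looking for image 1 queries keywords 0 and 1.
print(target_rank(scores, [0, 1], target=1))
```

A simulated user in this sketch would simply pick query keywords for a known target and record `target_rank`; averaging that rank over many targets gives a rough, offline estimate of how quickly the item would be found.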