A systematic review on content-based video retrieval

Abstract: Content-based video retrieval and indexing have been associated with intelligent methods in many applications, such as education, medicine and agriculture. However, an extensive and replicable review of the recent literature is missing. Moreover, relevant topics that can support video retrieval, such as dimensionality reduction, have not been surveyed. This work designs and conducts a systematic review to find papers that answer the following research question: “What segmentation, feature extraction, dimensionality reduction and machine learning approaches have been applied to content-based video indexing and retrieval?” Applying a research protocol that we propose, we selected 153 papers published from 2011 to 2018. We found that cut-based segmentation, color-based indexing, k-means-based dimensionality reduction and data clustering have been the most frequent choices in recent papers. All the information extracted from these papers is available in a public spreadsheet. This work also reports additional findings and future research directions.
