BilVideo-7: Video parsing, indexing and retrieval (BilVideo-7: Video çözümleme, indeksleme ve erişimi)

BilVideo-7: VIDEO PARSING, INDEXING AND RETRIEVAL
Muhammet Bastan
Ph.D. in Computer Engineering
Supervisors: Assoc. Prof. Dr. Ugur Gudukbay and Prof. Dr. Ozgur Ulusoy
July, 2010

Video indexing and retrieval aims to provide fast, natural and intuitive access to large video collections. This is becoming increasingly important as the amount of video data grows rapidly. This thesis introduces the BilVideo-7 system to address the issues related to video parsing, indexing and retrieval. BilVideo-7 is a distributed, MPEG-7 compatible video indexing and retrieval system that supports complex multimodal queries in a unified framework. The video data model is based on an MPEG-7 profile designed to represent videos by decomposing them into Shots, Keyframes, Still Regions and Moving Regions. The MPEG-7 compatible XML representations of videos according to this profile are produced by the video feature extraction and annotation tool of BilVideo-7 and stored in a native XML database. Users can formulate text, color, texture, shape, location, motion and spatio-temporal queries on an intuitive, easy-to-use visual query interface. Its composite query interface can be used to formulate very complex queries that contain any type and number of video segments with their descriptors and that specify the spatio-temporal relations between them. The multithreaded query processing server parses incoming queries into subqueries and executes each subquery in a separate thread. It then fuses the subquery results in a bottom-up manner to obtain the final query result and sends the result to the originating client. The system is unique in that it combines powerful querying capabilities, a wide range of descriptors and multimodal query processing in an MPEG-7 compatible, interoperable environment.
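
The query processing flow described above (split a query into subqueries, run each in its own thread, fuse results bottom-up) can be illustrated with a minimal Python sketch. This is not the BilVideo-7 implementation: the `QueryNode` type, the `fuse` rule (averaging scores over segments returned by every child) and the segment ids are all hypothetical, chosen only to make the control flow concrete.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical result type: video segment id -> similarity score in [0, 1].
Scores = Dict[str, float]


@dataclass
class QueryNode:
    """A node of a hypothetical query tree; leaves carry a subquery to run."""
    name: str  # assumed unique within one query tree
    run: Optional[Callable[[], Scores]] = None
    children: List["QueryNode"] = field(default_factory=list)


def fuse(results: List[Scores]) -> Scores:
    """Assumed fusion rule: keep segments present in every child result
    and average their scores. The actual system may fuse differently."""
    common = set(results[0])
    for r in results[1:]:
        common &= set(r)
    return {seg: sum(r[seg] for r in results) / len(results) for seg in common}


def collect_leaves(node: QueryNode) -> List[QueryNode]:
    """Return all leaf subqueries below (and including) node."""
    if not node.children:
        return [node]
    leaves: List[QueryNode] = []
    for child in node.children:
        leaves.extend(collect_leaves(child))
    return leaves


def execute(root: QueryNode, max_workers: int = 4) -> Scores:
    """Run every leaf subquery in a worker thread, then fuse bottom-up."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {leaf.name: pool.submit(leaf.run) for leaf in collect_leaves(root)}
        leaf_results = {name: fut.result() for name, fut in futures.items()}

    def fuse_up(node: QueryNode) -> Scores:
        if not node.children:
            return leaf_results[node.name]
        return fuse([fuse_up(child) for child in node.children])

    return fuse_up(root)


# Usage with two made-up subqueries (a color query and a text query).
color = QueryNode("color", run=lambda: {"shot1": 0.5, "shot2": 0.25})
text = QueryNode("text", run=lambda: {"shot1": 0.75, "shot3": 1.0})
root = QueryNode("root", children=[color, text])
result = execute(root)
print(result)  # {'shot1': 0.625}
```

Only `shot1` survives because it is the only segment both subqueries returned. A real system would choose the fusion rule per operator (for example, spatio-temporal relations between segments) rather than a single average.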
