Automatic Annotation of Scientific Video Material based on Visual Concept Detection

The rapid growth of today's video archives, combined with sparse editorial metadata and the limited capacity of libraries and archives for manual annotation, demands efficient approaches to automated metadata extraction. In addition, editorial and non-authoritative metadata is usually not fine-grained enough to describe video at the segment level, which is often required for efficient pinpoint search and retrieval. We consider the use case of the AV Portal provided by the German National Library of Science and Technology -- a web-based video search engine that offers access to educational video content from various areas of engineering and the natural sciences. User studies conducted during the conceptual design stage of the AV Portal indicated a strong interest of potential users in searching for specific visual concepts, such as "landscape", "drawing", or "animation", within videos of a particular domain. We present an approach for the automatic content-based classification of video segments that is tailored to the special requirements of the AV Portal regarding its technology-oriented content and academic users. We furthermore show that semantic analysis of the generated metadata not only allows for better definition of retrieval goals but also enables exploratory search within the archive using visual concepts.
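The abstract does not spell out the detection pipeline, but visual concept classification of this kind is commonly implemented as a bag-of-visual-words model: local descriptors extracted from video keyframes are quantized against a learned codebook, and the resulting word histograms train a per-concept classifier. The sketch below illustrates that general technique only; the function names, the codebook size `k`, the keyframe file paths, and the choice of SIFT with a linear SVM are illustrative assumptions, not the paper's confirmed method.

```python
# Minimal bag-of-visual-words sketch for visual concept detection
# (illustrative assumptions throughout; not the paper's exact pipeline).
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def sift_descriptors(image_path):
    """Extract local SIFT descriptors from one video keyframe."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray, None)
    # Frames without detectable keypoints yield an empty descriptor set.
    return desc if desc is not None else np.empty((0, 128), np.float32)

def build_codebook(descriptor_sets, k=100):
    """Cluster the pooled descriptors into k visual words."""
    pooled = np.vstack(descriptor_sets)
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(pooled)

def bow_histogram(desc, codebook):
    """Quantize descriptors and return an L1-normalized word histogram."""
    k = codebook.n_clusters
    if len(desc) == 0:
        return np.zeros(k)
    words = codebook.predict(desc)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

# Placeholder keyframes and binary labels (1 = concept present, e.g.
# "landscape"); real training data would come from annotated segments.
train_paths = ["frame_001.png", "frame_002.png"]
labels = [1, 0]

descs = [sift_descriptors(p) for p in train_paths]
codebook = build_codebook(descs, k=100)
X = np.array([bow_histogram(d, codebook) for d in descs])
classifier = LinearSVC().fit(X, labels)  # one binary classifier per concept
```

Applied per keyframe or per segment, such a classifier yields the segment-level concept labels that the abstract describes; a separate classifier would typically be trained for each concept in the vocabulary.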
