Integrating multi-modal content analysis and hyperbolic visualization for large-scale news video retrieval and exploration

This paper presents a novel scheme for more effective analysis, retrieval, and exploration of large-scale news video collections through multi-modal video content analysis and synchronization. First, automatic keyword extraction is performed on news closed captions and audio channels to detect the most interesting news topics (i.e., keywords for news topic interpretation), and the associations among these news topics (i.e., their contextual relationships) are determined from their co-occurrence probabilities. Second, visual semantic items, such as human faces, text captions, and video concepts, are extracted automatically using our semantic video analysis techniques. The news topics are then automatically synchronized with the most relevant visual semantic items. In addition, an interestingness weight is assigned to each news topic to characterize its importance. Finally, a novel hyperbolic visualization scheme is incorporated to display large numbers of news topics according to their associations and interestingness. With a better global overview of a large-scale news video collection, users can specify their queries more precisely and explore the collection interactively. Our experiments on large-scale news video collections have produced very positive results.
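As a minimal sketch of the first step, the association between two news topics can be estimated as the fraction of news stories in which both keywords occur. The abstract does not specify the estimator actually used, so the function name `cooccurrence_probabilities`, the story-level counting, and the sample keyword sets below are all illustrative assumptions rather than the authors' method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_probabilities(stories):
    """Estimate P(a, b) as the fraction of stories that mention both keywords.

    `stories` is a list of keyword collections, one per news story.
    This estimator is an assumption; the abstract does not define one.
    """
    total = len(stories)
    if total == 0:
        return {}
    pair_counts = Counter()
    for keywords in stories:
        # Count each unordered keyword pair at most once per story.
        for a, b in combinations(sorted(set(keywords)), 2):
            pair_counts[(a, b)] += 1
    return {pair: count / total for pair, count in pair_counts.items()}


# Hypothetical keyword sets, one per news story (illustrative data only).
stories = [
    {"election", "senate", "vote"},
    {"election", "vote"},
    {"storm", "flood"},
    {"election", "senate"},
]
probs = cooccurrence_probabilities(stories)
for (a, b), p in sorted(probs.items()):
    print(f"P({a}, {b}) = {p:.2f}")
```

Pairs are stored with their keywords in alphabetical order, so each association appears once. Topic pairs with a high estimated probability would be drawn as linked neighbors in the visualization, while pairs that never co-occur are simply absent from the result.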
