Incorporating Concept Ontology for Hierarchical Video Classification, Annotation, and Visualization

Most existing content-based video retrieval (CBVR) systems now support automatic low-level feature extraction, but their effectiveness from a user's perspective remains limited because of the semantic gap. Automatic video concept detection via semantic classification is one promising way to bridge this gap. To speed up SVM video classifier training in a high-dimensional heterogeneous feature space, a novel multimodal boosting algorithm is proposed that incorporates feature hierarchy and boosting to significantly reduce both the training cost and the number of training samples required. To avoid the inter-level error transmission problem, a novel hierarchical boosting scheme is proposed that incorporates concept ontology and multitask learning to boost hierarchical video classifier training by exploiting the strong correlations between video concepts. To bridge the semantic gap between the available video concepts and users' real needs, a novel hyperbolic visualization framework is incorporated to enable intuitive query specification and evaluation by giving users a good global view of large-scale video collections. Experiments in one specific domain, surgery education videos, have provided convincing results.
