IBM Research TRECVID-2009 Video Retrieval System

In this paper, we describe the IBM Research system for indexing, analysis, and retrieval of video as applied to the TREC-2007 video retrieval benchmark. This year, focus of the system improvement was on cross-domain learning, automation, scalability, and interactive search.

[1]  E. Jaynes Information Theory and Statistical Mechanics , 1957 .

[2]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[3]  Elizabeth D. Barraclough ON‐LINE SEARCHING IN INFORMATION RETRIEVAL , 1977 .

[4]  Susan T. Dumais,et al.  The vocabulary problem in human-system communication , 1987, CACM.

[5]  D. W. Scott,et al.  Multivariate Density Estimation, Theory, Practice and Visualization , 1992 .

[6]  Vladimir Vapnik,et al.  The Nature of Statistical Learning , 1995 .

[7]  W. Bruce Croft,et al.  Query expansion using local and global document analysis , 1996, SIGIR '96.

[8]  Tomás Lozano-Pérez,et al.  A Framework for Multiple-Instance Learning , 1997, NIPS.

[9]  Sunil Arya,et al.  ANN: library for approximate nearest neighbor searching , 1998 .

[10]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[11]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[12]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[13]  Naftali Tishby,et al.  Agglomerative Information Bottleneck , 1999, NIPS.

[14]  Jos F. Sturm,et al.  A Matlab toolbox for optimization over symmetric cones , 1999 .

[15]  Gökhan Tür,et al.  Prosody-based automatic segmentation of speech into sentences and topics , 2000, Speech Commun..

[16]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[17]  Nathalie Japkowicz,et al.  The Class Imbalance Problem: Significance and Strategies , 2000 .

[18]  Tomas E. Ward,et al.  Segmentation and detection at IBM: Hybrid statistical models and two-tiered clustering broadcast new , 2000 .

[19]  Dragomir R. Radev,et al.  Question-answering by predictive annotation , 2000, SIGIR '00.

[20]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[21]  Arnold W. M. Smeulders,et al.  Color Invariance , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  B. S. Manjunath,et al.  Introduction to MPEG-7: Multimedia Content Description Interface , 2002 .

[23]  Alan F. Smeaton,et al.  Designing the User Interface for the Físchlár Digital Video Library , 2006, J. Digit. Inf..

[24]  Eitan Farchi,et al.  Automatic query wefinement using lexical affinities with maximal information gain , 2002, SIGIR '02.

[25]  John R. Smith,et al.  Modeling semantic concepts to support query by keywords in video , 2002, Proceedings. International Conference on Image Processing.

[26]  Haim H. Permuter,et al.  IBM Research TREC 2002 Video Retrieval System , 2002, TREC.

[27]  David Carmel,et al.  JuruXML - an XML Retrieval System at INEX'02 , 2002, INEX Workshop.

[28]  Jean-Luc Gauvain,et al.  The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[29]  Thomas S. Huang,et al.  Factor graph framework for semantic video indexing , 2002, IEEE Trans. Circuits Syst. Video Technol..

[30]  John R. Smith,et al.  Interactive search fusion methods for video database retrieval , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[31]  John R. Smith,et al.  Exploring semantic dependencies for scalable concept detection , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[32]  Qi Tian,et al.  A Two-Level Multi-Modal Approach for Story Segmentation of Large News Video Corpus , 2003, TRECVID.

[33]  John R. Smith,et al.  VideoAnnEx: IBM MPEG-7 Annotation Tool for Multimedia Indexing and Concept Learning , 2003 .

[34]  John R. Smith,et al.  A framework for moderate vocabulary semantic visual concept detection , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[35]  John R. Smith,et al.  Normalized classifier fusion for semantic visual concept detection , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[36]  John R. Smith,et al.  Multimedia semantic indexing using model vectors , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[37]  John R. Smith,et al.  MPEG-7 video automatic labeling system , 2003, MULTIMEDIA '03.

[38]  J.R. Smith,et al.  Learning visual models of semantic concepts , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[39]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[40]  John R. Smith,et al.  New anchor selection methods for image retrieval , 2003, IS&T/SPIE Electronic Imaging.

[41]  John R. Smith,et al.  User-trainable video annotation using multimodal cues , 2003, SIGIR '03.

[42]  John R. Smith,et al.  Role of classifiers in multimedia content management , 2003, IS&T/SPIE Electronic Imaging.

[43]  John R. Smith,et al.  VideoAL: a novel end-to-end MPEG-7 video automatic labeling system , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[44]  Ted Pedersen,et al.  Extended Gloss Overlaps as a Measure of Semantic Relatedness , 2003, IJCAI.

[45]  Ching-Yung Lin,et al.  Video Collaborative Annotation Forum: Establishing Ground-Truth Labels on Large Multimedia Datasets , 2003, TRECVID.

[46]  John R. Smith,et al.  A Hybrid Framework for Detecting the Semantics of Concepts and Context , 2003, CIVR.

[47]  Shih-Fu Chang,et al.  Discovery and fusion of salient multimodal features toward news story segmentation , 2003, IS&T/SPIE Electronic Imaging.

[48]  John R. Smith,et al.  Learning regional semantic concepts from incomplete annotation , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[49]  Ted Pedersen,et al.  Using Measures of Semantic Relatedness for Word Sense Disambiguation , 2003, CICLing.

[50]  John R. Smith,et al.  Active selection for multi-example querying by content , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[51]  John R. Smith,et al.  Semantic representation: search and mining of multimedia content , 2004, KDD '04.

[52]  R. Manmatha,et al.  Using Maximum Entropy for Automatic Image Annotation , 2004, CIVR.

[53]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[54]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[55]  John R. Smith,et al.  On the detection of semantic concepts at TRECVID , 2004, MULTIMEDIA '04.

[56]  IBM's PIQUANT II in TREC 2004 , 2004, TREC.

[57]  Stephen Kwek,et al.  Applying Support Vector Machines to Imbalanced Datasets , 2004, ECML.

[58]  Shih-Fu Chang,et al.  Generative, discriminative, and ensemble learning on multi-modal perceptual fusion toward news video story segmentation , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[59]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[60]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[61]  John D. Lafferty,et al.  Statistical Models for Text Segmentation , 1999, Machine Learning.

[62]  Jing Huang,et al.  Spatial Color Indexing and Applications , 2004, International Journal of Computer Vision.

[63]  David A. Ferrucci,et al.  UIMA: an architectural approach to unstructured information processing in the corporate research environment , 2004, Natural Language Engineering.

[64]  Christian Petersohn Fraunhofer HHI at TRECVID 2004: Shot Boundary Detection System , 2004, TRECVID.

[65]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[66]  Laura A. Dabbish,et al.  Labeling images with a computer game , 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.

[67]  Milind R. Naphade,et al.  Learning the semantics of multimedia queries and concepts from a small number of examples , 2005, MULTIMEDIA '05.

[68]  John R. Smith,et al.  A generalized multiple instance learning algorithm for large scale modeling of multimedia semantics , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[69]  John R. Smith,et al.  A web-based system for collaborative annotation of large image and video collections: an evaluation and user study , 2005, MULTIMEDIA '05.

[70]  Harriet J. Nock,et al.  Semantic annotation of multimedia using maximum entropy models , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[71]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[72]  Alexei A. Efros,et al.  Discovering object categories in image collections , 2005 .

[73]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[74]  Paul Over,et al.  TRECVID 2005 - An Overview , 2005, TRECVID.

[75]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[76]  Shih-Fu Chang,et al.  Automatic discovery of query-class-dependent models for multimodal search , 2005, MULTIMEDIA '05.

[77]  Jelena Tesic Metadata Practices for Consumer Photos , 2005, IEEE Multim..

[78]  Shih-Fu Chang,et al.  Visual Cue Cluster Construction via Information Bottleneck Principle and Kernel Density Estimation , 2005, CIVR.

[79]  Rong Yan,et al.  Probabilistic models for combining diverse knowledge sources in multimedia retrieval , 2006 .

[80]  Xuelong Li,et al.  Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[81]  John R. Smith,et al.  Semantic Labeling of Multimedia Content Clusters , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[82]  Milind R. Naphade,et al.  Semantic Multimedia Retrieval using Lexical Query Expansion and Model-Based Reranking , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[83]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[84]  Rong Yan,et al.  Extreme video retrieval: joint maximization of human and computer performance , 2006, MM '06.

[85]  Shahram Ebadollahi,et al.  Visual Event Detection using Multi-Dimensional Concept Dynamics , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[86]  Chong-Wah Ngo,et al.  Evaluating bag-of-visual-words representations in scene classification , 2007, MIR '07.

[87]  Rong Yan,et al.  An efficient manual image annotation approach based on tagging and browsing , 2007, MS '07.

[88]  Apostol Natsev,et al.  Dynamic Multimodal Fusion in Video Search , 2007, 2007 IEEE International Conference on Multimedia and Expo.

[89]  Sheng Tang,et al.  TRECVID 2007 High-Level Feature Extraction By MCG-ICT-CAS , 2007, TRECVID.

[90]  Stéphane Ayache,et al.  Evaluation of active learning strategies for video indexing , 2007, Signal Process. Image Commun..

[91]  Rong Yan,et al.  Semantic concept-based query expansion and re-ranking for multimedia retrieval , 2007, ACM Multimedia.

[92]  John R. Smith,et al.  Data Modeling Strategies for Imbalanced Learning in Visual Search , 2007, 2007 IEEE International Conference on Multimedia and Expo.

[93]  John R. Smith,et al.  Cluster-based data modeling for semantic video search , 2007, CIVR '07.

[94]  David C. Gibbon,et al.  A Fast, Comprehensive Shot Boundary Determination System , 2007, 2007 IEEE International Conference on Multimedia and Expo.

[95]  Ng-Loy Wee Loon Singapore , 2008, Intellectual Property in Asia.

[96]  Koen E. A. van de Sande,et al.  Evaluation of color descriptors for object and scene recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[97]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[98]  Rong Yan,et al.  Large-scale multimedia semantic concept modeling using robust subspace bagging and MapReduce , 2009, LS-MMRM '09.

[99]  Carl Eklund,et al.  National Institute for Standards and Technology , 2009, Encyclopedia of Biometrics.

[100]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[101]  Cor J. Veenman,et al.  Comparing compact codebooks for visual categorization , 2010, Comput. Vis. Image Underst..

[102]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.