Paper Information - IBM Research TRECVID-2009 Video Retrieval System

In this paper, we describe the IBM Research system for indexing, analysis, and retrieval of video as applied to the TREC-2007 video retrieval benchmark. This year, system improvements focused on cross-domain learning, automation, scalability, and interactive search.
