Paper Citations

Sheng Tang, Jintao Li, Gang Cao et al.,
2012

With the rapid growth of multimedia application technologies and network technologies, especially the proliferation of Web 2.0 and digital cameras, there has been an explosion of images and videos in ...

Ling-Yu Duan, Hanqing Lu, Qingshan Liu et al.,
2008,
IEEE Transactions on Multimedia

With the advance of digital video recording and playback systems, the request for efficiently managing recorded TV video programs is evident so that users can readily locate and browse their favorite ...

While accuracy and speed get a lot of attention in video retrieval research, the investigation of interactive retrieval tools gets less attention and is often regarded as trivial. We want to show that...

Rong Yan, Apostol Natsev, Lexing Xie et al.,
2007,
ACM Multimedia

We study the problem of semantic concept-based query expansion and re-ranking for multimedia retrieval. In particular, we explore the utility of a fixed lexicon of visual semantic concepts for automat...
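As a rough illustration of the re-ranking idea described above, the sketch below fuses an initial retrieval score with detector scores for query-related concepts. The function names, the linear fusion, and the weight alpha are illustrative assumptions, not the authors' formulation.

```python
# Hypothetical sketch: re-rank an initial retrieval list by mixing in
# detector scores for query-related semantic concepts.
import numpy as np

def rerank(initial_scores, concept_scores, concept_weights, alpha=0.7):
    """initial_scores: (n_shots,) scores from the initial search.
    concept_scores: (n_shots, n_concepts) detector outputs per shot.
    concept_weights: (n_concepts,) relevance of each concept to the query.
    alpha: interpolation weight between initial and concept-based scores."""
    concept_part = concept_scores @ concept_weights

    def norm(x):
        # Normalize a score vector to [0, 1] before fusing.
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    fused = alpha * norm(initial_scores) + (1 - alpha) * norm(concept_part)
    return np.argsort(-fused)  # shot indices, best first

# Toy usage: 5 shots, 3 concepts.
ranking = rerank(np.array([0.9, 0.1, 0.4, 0.7, 0.2]),
                 np.random.rand(5, 3),
                 np.array([0.6, 0.3, 0.1]))
print(ranking)
```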

Nicu Sebe, Bogdan Ionescu, Jasper R. R. Uijlings et al.,
2016,
Comput. Vis. Image Underst.

We proposed a novel framework for Relevance Feedback based on the Fisher Kernel. The Fisher Kernel representation makes it possible to capture temporal variation by using frame-based features. We experimen...
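A minimal sketch of a Fisher-vector style encoding of frame-based features (gradients with respect to GMM means only), which a relevance-feedback classifier could consume. The simplifications and the scikit-learn GMM are assumptions, not the paper's exact Fisher Kernel formulation.

```python
# Hypothetical sketch: simplified Fisher-vector encoding of frame-level features.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(frames, gmm):
    """frames: (n_frames, d) descriptors from one video shot."""
    post = gmm.predict_proba(frames)              # (n_frames, k) responsibilities
    diff = frames[:, None, :] - gmm.means_[None]  # (n_frames, k, d)
    diff /= np.sqrt(gmm.covariances_)[None]       # assumes diagonal covariances
    grad_mu = (post[..., None] * diff).sum(0)     # (k, d) gradient w.r.t. means
    grad_mu /= frames.shape[0] * np.sqrt(gmm.weights_)[:, None]
    fv = grad_mu.ravel()
    return fv / (np.linalg.norm(fv) + 1e-12)      # L2 normalization

rng = np.random.default_rng(0)
gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
gmm.fit(rng.normal(size=(500, 16)))               # background model on pooled frames
print(fisher_vector(rng.normal(size=(40, 16)), gmm).shape)  # (4 * 16,)
```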

Koichi Shinoda, Nakamasa Inoue,
2015,
ACM Multimedia

We propose vocabulary expansion for video semantic indexing. From many semantic concept detectors obtained by using training data, we make detectors for concepts not included in training data. First, ...
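One simple reading of vocabulary expansion is to score an unseen concept as a similarity-weighted combination of existing detectors. The sketch below assumes a hand-made similarity function and top-k weighting purely for illustration; it is not the paper's method.

```python
# Hypothetical sketch: pseudo detector for an unseen concept built from
# existing detector scores weighted by concept similarity.
import numpy as np

def expand_vocabulary(target, known_scores, similarity, top_k=3):
    """target: unseen concept name.
    known_scores: dict {concept: (n_shots,) detector scores}.
    similarity: callable (concept_a, concept_b) -> float in [0, 1]."""
    sims = sorted(((similarity(target, c), c) for c in known_scores), reverse=True)
    sims = [(s, c) for s, c in sims[:top_k] if s > 0]
    total = sum(s for s, _ in sims)
    out = np.zeros(len(next(iter(known_scores.values()))))
    for s, c in sims:
        out += (s / total) * known_scores[c]   # similarity-weighted combination
    return out

# Toy usage with a hand-made similarity table.
table = {("puppy", "dog"): 0.9, ("puppy", "cat"): 0.4, ("puppy", "car"): 0.0}
sim = lambda a, b: table.get((a, b), 0.0)
scores = {"dog": np.array([0.8, 0.1]), "cat": np.array([0.5, 0.3]),
          "car": np.array([0.0, 0.9])}
print(expand_vocabulary("puppy", scores, sim))
```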

Shih-Fu Chang, Lyndon S. Kennedy et al.,
2007,
CIVR '07

We propose to incorporate hundreds of pre-trained concept detectors to provide contextual information for improving the performance of multimodal video search. The approach takes initial search result...

Koichi Shinoda, Nakamasa Inoue,
2014,
ACM Multimedia

We propose n-gram modeling of shot sequences for video semantic indexing, in which semantic concepts are extracted from a video shot. Most previous studies for this task have assumed that video shots ...
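To make the shot-sequence idea concrete, the sketch below collects bigram statistics over per-shot concept labels and scores a new sequence with add-one smoothing. This is a generic n-gram illustration, not the paper's specific model.

```python
# Hypothetical sketch: bigram statistics over per-shot concept labels.
import math
from collections import Counter

def train_bigrams(videos):
    """videos: list of concept-label sequences, one label per shot."""
    uni, bi = Counter(), Counter()
    for labels in videos:
        uni.update(labels)
        bi.update(zip(labels, labels[1:]))
    return uni, bi

def bigram_logprob(labels, uni, bi, vocab_size, smooth=1.0):
    # Add-one-smoothed log-probability of a shot label sequence.
    lp = 0.0
    for prev, cur in zip(labels, labels[1:]):
        lp += math.log((bi[(prev, cur)] + smooth) / (uni[prev] + smooth * vocab_size))
    return lp

uni, bi = train_bigrams([["sky", "person", "car"], ["sky", "car", "car"]])
print(bigram_logprob(["sky", "car"], uni, bi, vocab_size=3))
```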

Emmanuel Dellandréa, Liming Chen, Charles-Edmond Bichot et al.,
2012,
ECCV Workshops

We propose in this paper a novel multimodal approach to automatically predict the visual concepts of images through an effective fusion of visual and textual features. It relies on a Selective Weighte...
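A minimal sketch of weighted late fusion, where a single fusion weight between visual and textual scores is chosen on validation data by average precision. The grid search and toy data are assumptions standing in for the selective weighting described above.

```python
# Hypothetical sketch: per-concept weighted late fusion of visual and textual scores.
import numpy as np
from sklearn.metrics import average_precision_score

def select_weight(val_visual, val_text, val_labels, grid=np.linspace(0, 1, 11)):
    """Pick the fusion weight that maximizes AP on validation data."""
    aps = [average_precision_score(val_labels, w * val_visual + (1 - w) * val_text)
           for w in grid]
    return grid[int(np.argmax(aps))]

def fuse(visual, text, w):
    return w * visual + (1 - w) * text

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 200)
visual = labels + 0.8 * rng.normal(size=200)   # toy scores, visual is more reliable
text = labels + 1.5 * rng.normal(size=200)
w = select_weight(visual, text, labels)
print("chosen weight:", w)
```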

We propose a query-by-example method that can retrieve a variety of shots relevant to a query, even when these shots contain significantly different features due to varied shooting techniques and settings. ...

Jenny Benois-Pineau, Georges Quénot, Tomas Piatrik et al.,
2014,
Advances in Computer Vision and Pattern Recognition

We propose a novel multimodal approach to automatically predict the visual concepts of images through an effective fusion of visual and textual features. It relies on a Selective Weighted Late Fusion ...

Sheng Tang, Yongdong Zhang, Jintao Li et al.,
2006,
PCM

We propose a novel method for extracting text feature from the automatic speech recognition (ASR) results in semantic video retrieval. We combine HowNet-rule-based knowledge with statistic information...

Julie Delon, Jean-Michel Morel, Mariano Rodríguez et al.,
2018,
SIAM J. Imaging Sci.

We propose a mathematical method to analyze the numerous algorithms performing Image Matching by Affine Simulation (IMAS). To become affine invariant they apply a discrete set of affine transforms to ...
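In the spirit of the affine-simulation methods the paper analyzes (ASIFT-style matching), the sketch below distorts both images with a small discrete set of rotations and tilts before SIFT matching. The tilt/angle grid and OpenCV usage are illustrative assumptions; the paper itself is a mathematical analysis of such algorithms, not this code.

```python
# Hypothetical sketch: a tiny affine-simulation loop before SIFT matching.
# Assumes OpenCV >= 4.4 with SIFT available.
import cv2
import numpy as np

def simulate_tilts(img, tilts=(1.0, 1.4, 2.0), angles=(0, 45, 90, 135)):
    """Yield affinely distorted versions of img."""
    h, w = img.shape[:2]
    for t in tilts:
        for a in (angles if t > 1.0 else (0,)):
            rot = cv2.getRotationMatrix2D((w / 2, h / 2), a, 1.0)
            rotated = cv2.warpAffine(img, rot, (w, h))
            yield cv2.resize(rotated, (w, max(1, int(h / t))))  # simulate tilt

def match_affine_simulated(img1, img2):
    sift = cv2.SIFT_create()
    bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    best = 0
    for v1 in simulate_tilts(img1):
        k1, d1 = sift.detectAndCompute(v1, None)
        if d1 is None:
            continue
        for v2 in simulate_tilts(img2):
            k2, d2 = sift.detectAndCompute(v2, None)
            if d2 is None:
                continue
            best = max(best, len(bf.match(d1, d2)))
    return best  # best raw match count over all simulated view pairs
```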

Koichi Shinoda, Nakamasa Inoue et al.,
2012,
IEEE Transactions on Multimedia

We propose a fast maximum a posteriori (MAP) adaptation method for video semantic indexing that uses Gaussian mixture model (GMM) supervectors. In this method, a tree-structured GMM is utilized to decr...
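A minimal sketch of mean-only MAP adaptation against a universal background model, stacked into a GMM supervector. It omits the tree-structured GMM speed-up the paper describes, and uses scikit-learn's GaussianMixture as an assumed UBM.

```python
# Hypothetical sketch: MAP-adapted GMM means stacked into a supervector.
import numpy as np
from sklearn.mixture import GaussianMixture

def map_supervector(frames, ubm, tau=10.0):
    """frames: (n, d) descriptors from one shot; ubm: fitted GaussianMixture."""
    post = ubm.predict_proba(frames)              # (n, k) responsibilities
    n_k = post.sum(0)                             # soft counts per component
    first = post.T @ frames                       # (k, d) first-order statistics
    mean_k = first / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + tau))[:, None]          # relevance factor
    adapted = alpha * mean_k + (1 - alpha) * ubm.means_
    # Normalize by mixture weight and (diagonal) covariance, then stack.
    sv = (np.sqrt(ubm.weights_)[:, None] * (adapted - ubm.means_)
          / np.sqrt(ubm.covariances_))
    return sv.ravel()

rng = np.random.default_rng(2)
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(rng.normal(size=(1000, 32)))              # universal background model
print(map_supervector(rng.normal(size=(60, 32)), ubm).shape)  # (8 * 32,)
```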

Rong Yan, Wei-Hao Lin, Jun Yang et al.,
2006,
MM '06

We present an efficient system for video search that maximizes the use of human bandwidth, while at the same time exploiting the machine's ability to learn in real-time from user selected relevant vid...

We present a system that automatically tags videos, i.e. detects high-level semantic concepts like objects or actions in them. To do so, our system does not rely on datasets manually annotated for res...

We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activi...

Yueting Zhuang, Yanan Liu, Fei Wu et al.,
2006,
TRECVID

We participated in the high-level feature extraction and interactive-search task for TRECVID 2006. Interaction and integration of multi-modality media types such as visual, audio and textual data in v...

Zhang Wen, Yuxin Peng, Hongbo Sun et al.,
2013,
TRECVID

We participated in both types of the instance search (INS) task in TRECVID 2015: automatic search and interactive search. This paper presents our approaches and results. In this task, we mainly focused...

Meng Wang, Yi-Liang Zhao, Tat-Seng Chua et al.,
2014,
TOMCCAP

We often remember images and videos that we have seen or recorded before but cannot quite recall the exact venues or details of the contents. We typically have vague memories of the contents, which ca...