Paper Citations

Deep Learning Based Imbalanced Data Classification and Information Retrieval for Multimedia Big Data

Yilin Yan,

2018

Abstract of a dissertation at the University of Miami. Dissertation supervised by Professor Mei-Ling Shyu. No. of pages in text: 153. The development in information science has enabled an explosive growth of ...

Concept Language Models and Event-based Concept Number Selection for Zero-example Event Detection

Ioannis Patras, Vasileios Mezaris, Fotini Markatopoulou et al.,

2017,

ICMR

Zero-example event detection is a problem where, given an event query as input but no example videos for training a detector, the system retrieves the most closely related videos. In this paper we pre...

A Crossmodal Approach to Multimodal Fusion in Video Hyperlinking

Guillaume Gravier, Christian Raymond, Vedran Vukotic,

2018,

IEEE MultiMedia

With the recent resurgence of neural networks and the proliferation of massive amounts of unlabeled multimodal data, recommendation systems and multimodal retrieval systems based on continuous represe...

Temporal localization of audio events for conflict monitoring in social media

Alexander G. Hauptmann, Junwei Liang, Lu Jiang,

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

With the explosion in the availability of user-generated videos documenting any conflicts and human rights abuses around the world, analysts and researchers increasingly find themselves overwhelmed wi...

Binary convolutional neural network features off-the-shelf for image to video linking in endoscopic multimedia databases

Klaus Schöffmann, Stefan Petscharnig,

2018,

Multimedia Tools and Applications

With a rigorous long-term archival of endoscopic surgeries, vast amounts of video and image data accumulate. Surgeons are not able to spend their valuable time to manually search within endoscopic mul...

UEC at TRECVID 2016 AVS task

Keiji Yanai, Natsagdorj Choijilsuren,

2016,

TRECVID

We started participating in TRECVID in 2005, and we have been continuously submitting the results to TRECVID for ten years. For those years we usually participate in semantic indexing task (SIN) and MED ...

Informedia @ TRECVID 2017

Alexander G. Hauptmann, Shizhe Chen, Qin Jin et al.,

2017,

TRECVID

We report on our system used in the TRECVID 2017 Multimedia Event Detection (MED) and Ad-hoc Video Search (AVS) tasks. On the MED task, the CMU team submitted runs in 010Ex settings for the Pre-specif...

Informedia @ TRECVID 2016

Jiande Sun, Pingbo Pan, Yang Chen et al.,

2016,

TRECVID

We report on our system used in the TRECVID 2016 Multimedia Event Detection (MED) and Ad-hoc Video Search (AVS) tasks. On the MED task, the CMU team submitted runs in 000Ex, 010Ex and 100Ex settings f...

Evaluation of automatic video captioning using direct assessment

George Awad, Alan F. Smeaton, Yvette Graham et al.,

2017,

PLoS ONE

We present Direct Assessment, a method for manually assessing the quality of automatically-generated captions for video. Evaluating the accuracy of video captions is particularly difficult because for...

YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

Xin Pan, Jonathon Shlens, Stefano Mazzocchi et al.,

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists of approximately 380,000 video se...

Shenzhen Institutes of Advanced Technology, CAS, China at TRECVID INS 2016

Linjie Xing, Diping Song, Jin Ye et al.,

2016,

TRECVID

We divide the task into person retrieval and location retrieval in TRECVID INS 2016, and then fuse the two results together with a simple . About person retrieval, we have two choices. One is based on...

Waseda at TRECVID 2016: Ad-hoc Video Search

Tetsunori Kobayashi, Kazuya Ueki, Kotaro Kikuchi et al.,

2016,

TRECVID

Waseda participated in the TRECVID 2016 Ad-hoc Video Search (AVS) task [1]. For the AVS task, we submitted four manually assisted runs. Our approach used the following processing steps: manually creat...

Multimodal Video-to-Video Linking: Turning to the Crowd for Insight and Evaluation

Martha Larson, Maria Eskevich, Gareth J. F. Jones et al.,

2017,

MMM

Video-to-video linking systems allow users to explore and exploit the content of a large-scale multimedia collection interactively and without the need to formulate specific queries. We present a shor...

A semantic-based video scene segmentation using a deep neural network

Heuiseok Lim, Kuekyeng Kim, Hyesung Ji et al.,

2019,

J. Inf. Sci.

Video scene segmentation is very important research in the field of computer vision, because it helps in efficient storage, indexing and retrieval of videos. Achieving this kind of scene segmentation ...

A Study on Multimodal Video Hyperlinking with Visual Aggregation

Guillaume Gravier, Mateusz Budnik, Mikail Demirdelen et al.,

2018 IEEE International Conference on Multimedia and Expo (ICME)

Video hyperlinking offers a way to explore a video collection, making use of links that connect segments having related content. Hyperlinking systems thus seek to automatically create links by connect...

Neighbourhood Structure Preserving Cross-Modal Embedding for Video Hyperlinking

Chong-Wah Ngo, Benoit Huet, Yanbin Hao et al.,

2020,

IEEE Transactions on Multimedia

Video hyperlinking is a task aiming to enhance the accessibility of large archives, by establishing links between fragments of videos. The links model the aboutness between fragments for efficient tra...

An improved hybridized deep structured model for accurate video event recognition

R. Kavitha, D. Chitra,

2020

Video event recognition plays an important role in the various research fields particularly in surveillance detection system. In the existing system it is done by deep hierarchical context model which...

Video Description

Wei Liu, Syed Zulqarnain Gilani, Ajmal Mian et al.,

2018,

ACM Comput. Surv.

Video description is the automatic generation of natural language sentences that describe the contents of a given video. It has applications in human-robot interaction, helping the visually impaired a...

VidCEP: Complex Event Processing Framework to Detect Spatiotemporal Patterns in Video Streams

Edward Curry, Piyush Yadav,

2019 IEEE International Conference on Big Data (Big Data)

Video data is highly expressive and has traditionally been very difficult for a machine to interpret. Querying event patterns from video streams is challenging due to its unstructured representation. ...

Enabling GPU-Enhanced Computer Vision and Machine Learning Research Using Containers

Martial Michel, Nicholas Burnett,

2019,

ISC Workshops

Video analytics frameworks often rely on Neural Networks to perform their tasks. For example, a “You Only Look Once” object detection algorithm applies a single neural network to each image, divides t...