of a dissertation at the University of Miami. Dissertation supervised by Professor Mei-Ling Shyu. No. of pages in text. (153) The development in information science has enabled an explosive growth of ...
Zero-example event detection is a problem where, given an event query as input but no example videos for training a detector, the system retrieves the most closely related videos. In this paper we pre...
With the recent resurgence of neural networks and the proliferation of massive amounts of unlabeled multimodal data, recommendation systems and multimodal retrieval systems based on continuous represe...
With the explosion in the availability of user-generated videos documenting conflicts and human rights abuses around the world, analysts and researchers increasingly find themselves overwhelmed wi...
With rigorous long-term archival of endoscopic surgeries, vast amounts of video and image data accumulate. Surgeons cannot spend their valuable time manually searching within endoscopic mul...
We began participating in TRECVID in 2005 and have submitted results continuously for ten years. Over those years we have usually participated in the semantic indexing (SIN) task and MED ...
We report on our system used in the TRECVID 2017 Multimedia Event Detection (MED) and Ad-hoc Video Search (AVS) tasks. On the MED task, the CMU team submitted runs in 010Ex settings for the Pre-specif...
We report on our system used in the TRECVID 2016 Multimedia Event Detection (MED) and Ad-hoc Video Search (AVS) tasks. On the MED task, the CMU team submitted runs in 000Ex, 010Ex and 100Ex settings f...
We present Direct Assessment, a method for manually assessing the quality of automatically-generated captions for video. Evaluating the accuracy of video captions is particularly difficult because for...
We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists of approximately 380,000 video se...
In TRECVID INS 2016, we divide the task into person retrieval and location retrieval, and then fuse the two results together with a simple . For person retrieval, we have two options. One is based on...
Waseda participated in the TRECVID 2016 Ad-hoc Video Search (AVS) task [1]. For the AVS task, we submitted four manually assisted runs. Our approach used the following processing steps: manually creat...
Video-to-video linking systems allow users to explore and exploit the content of a large-scale multimedia collection interactively and without the need to formulate specific queries. We present a shor...
Video scene segmentation is an important research topic in computer vision because it supports efficient storage, indexing, and retrieval of videos. Achieving this kind of scene segmentation ...
Video hyperlinking offers a way to explore a video collection, making use of links that connect segments having related content. Hyperlinking systems thus seek to automatically create links by connect...
Video hyperlinking is a task aiming to enhance the accessibility of large archives, by establishing links between fragments of videos. The links model the aboutness between fragments for efficient tra...
Video event recognition plays an important role in various research fields, particularly in surveillance detection systems. In the existing system, it is performed by a deep hierarchical context model which...
Video description is the automatic generation of natural language sentences that describe the contents of a given video. It has applications in human-robot interaction, helping the visually impaired a...
Video data is highly expressive and has traditionally been very difficult for a machine to interpret. Querying event patterns from video streams is challenging due to their unstructured representation. ...
Video analytics frameworks often rely on Neural Networks to perform their tasks. For example, a “You Only Look Once” object detection algorithm applies a single neural network to each image, divides t...