University of Applied Sciences Mittweida and Chemnitz University of Technology at TRECVID 2018

Analyzing video footage to identify persons at defined locations or to detect complex activities remains a challenging task. Nowadays, various (semi-)automated systems can address different parts of these challenges. Object detection and classification reach ever higher detection rates when making use of the latest cutting-edge convolutional neural network frameworks. In our contribution to the Instance Search task, we discuss the design of a heterogeneous system that improves the identification performance for the detection and localization of persons at predefined places by heuristically combining multiple state-of-the-art object detection and place classification frameworks. In our first participation in the Activities in Extended Video (ActEV) task, which addresses the detection of more complex activities of persons or objects, we also employ state-of-the-art neural network object detection and classification frameworks to extract bounding boxes of salient regions or objects for further processing. However, tracking objects detected as bounding boxes requires additional algorithmic or feature-driven treatment to exploit the statistical correlations between individual frames. Our approach describes a simple yet powerful method for tracking objects across video frames.
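The abstract does not spell out the tracking procedure itself. As a minimal sketch of how such a bounding-box-based, frame-to-frame association could look, the snippet below uses greedy intersection-over-union (IoU) matching between consecutive detections; all names (`iou`, `link_detections`, `iou_threshold`) are illustrative assumptions and not taken from the paper.

```python
# Minimal sketch: link per-frame detections into tracks via greedy IoU matching.
# Boxes are tuples (x1, y1, x2, y2); a track is a list of boxes, one per frame.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(tracks, detections, iou_threshold=0.5):
    """Extend each existing track with the best-overlapping detection of the
    current frame; unmatched detections start new tracks."""
    unmatched = list(detections)
    for track in tracks:
        best, best_iou = None, iou_threshold
        for det in unmatched:
            overlap = iou(track[-1], det)
            if overlap > best_iou:
                best, best_iou = det, overlap
        if best is not None:
            track.append(best)
            unmatched.remove(best)
    tracks.extend([[det] for det in unmatched])
    return tracks
```

Feeding the detector output of each frame through `link_detections` in temporal order yields object tracks without any appearance features; more robust variants would additionally compare visual descriptors of the matched regions.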
