University of Applied Sciences Mittweida and Chemnitz University of Technology at TRECVID ActEv 2019

Analyzing video footage, for example identifying certain individuals at defined locations in complex indoor or outdoor scenes or classifying a person’s activities, still poses a challenge to any video retrieval system. A variety of (semi-)automated analysis systems can now be applied to solve specific subproblems, and the accuracy of object detection and classification benefits strongly when recent machine learning methods such as deep learning networks are involved. In this paper we propose our design of a heterogeneous video analysis system and report on our experiences with its application to the Activities in Extended Video (ActEV) analysis task within the TREC Video Retrieval Evaluation (TRECVID) contest. The proposed system improves the performance of person detection, identification and localization at predefined places in video scenes by heuristically combining a variety of state-of-the-art deep learning frameworks for object detection and place classification into one heterogeneous system. The incorporated frameworks address a wide range of subproblems, including stable object boundary extraction of salient regions or identifiable objects as well as person identification and object/place classification. Our approach integrates these processing artifacts in a feature-oriented manner and, together with LSTM-based activity classifiers, assesses statistical correlations across video frames.
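As an illustration of what feature-oriented integration could look like, the following minimal sketch fuses per-frame detector and place-classifier outputs into one feature vector and cuts the resulting sequence into fixed-length windows, the kind of input a sequence model such as an LSTM would consume. It is not the system described in this paper: the class lists, the dictionary layout of a detection, and the function names `frame_features` and `sliding_windows` are all assumptions made for this example.

```python
from typing import Dict, List, Sequence

# Hypothetical label sets, chosen for illustration only.
OBJECT_CLASSES = ("person", "vehicle", "door")
PLACE_CLASSES = ("indoor", "outdoor")


def frame_features(detections: Sequence[Dict], place_scores: Dict[str, float]) -> List[float]:
    """Fuse one frame's detector and place-classifier outputs into a flat vector.

    Each detection is assumed to be a dict with a "label" and a "score".
    The vector holds, per object class, the detection count and the
    maximum score, followed by one score per place class.
    """
    features: List[float] = []
    for cls in OBJECT_CLASSES:
        scores = [d["score"] for d in detections if d["label"] == cls]
        features.append(float(len(scores)))
        features.append(max(scores) if scores else 0.0)
    for place in PLACE_CLASSES:
        features.append(float(place_scores.get(place, 0.0)))
    return features


def sliding_windows(
    sequence: Sequence[List[float]], length: int, stride: int
) -> List[List[List[float]]]:
    """Cut a per-frame feature sequence into fixed-length, possibly overlapping windows."""
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    last_start = len(sequence) - length
    return [list(sequence[i:i + length]) for i in range(0, last_start + 1, stride)]
```

Each window is a list of `length` frame vectors. In a full pipeline it would be passed to a sequence classifier that predicts an activity label for that time span; the window length and stride would have to be tuned to the activities of interest.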