Deep Video Understanding: Representation Learning, Action Recognition, and Language Generation

Analyzing videos has been one of the fundamental problems of computer vision and multimedia analysis for decades. The task is very challenging because video is an information-intensive medium with large variations and complexity. Thanks to recent developments in deep learning, researchers in both the computer vision and multimedia communities have been able to significantly boost the performance of video analysis and to open new research directions for analyzing video content. This talk will cover recent advances under the umbrella of video understanding. It starts from the basic networks widely adopted in state-of-the-art deep learning pipelines, moves on to the fundamental challenges of video representation learning and video classification/recognition, and finally turns to the emerging area of video and language.