Deep Learning for Intelligent Video Analysis

Analyzing videos is one of the fundamental problems of computer vision and multimedia content analysis for decades. The task is very challenging as video is an information-intensive media with large variations and complexities. Thanks to the recent development of deep learning techniques, researchers in both computer vision and multimedia communities are now able to boost the performance of video analysis significantly and initiate new research directions to analyze video content. This tutorial will present recent advances under the umbrella of video understanding, which start from a unified deep learning toolkit--Microsoft Cognitive Toolkit (CNTK) that supports popular model types such as convolutional nets and recurrent networks, to fundamental challenges of video representation learning and video classification, recognition, and finally to an emerging area of video and language.

[1]  Tao Mei,et al.  Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure , 2016, IJCAI.

[2]  Tao Mei,et al.  Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding , 2016, ACM Multimedia.

[3]  Tao Mei,et al.  Learning hierarchical video representation for action recognition , 2017, International Journal of Multimedia Information Retrieval.

[4]  Tao Mei,et al.  Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Tao Mei,et al.  Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Tao Mei,et al.  Video Captioning with Transferred Semantic Attributes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Tao Mei,et al.  MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Tao Mei,et al.  Boosting Image Captioning with Attributes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Tao Mei,et al.  Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation , 2016, ICMR.

[10]  Tao Mei,et al.  Jointly Modeling Embedding and Translation to Bridge Video and Language , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Tao Mei,et al.  Deep Quantization: Encoding Convolutional Activations with Deep Generative Model , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).