LTC-GIF: Attracting More Clicks on Feature-length Sports Videos

This paper proposes a lightweight method that attracts users and increases video views by presenting personalized artistic media, i.e., static thumbnails and animated GIFs. The method analyzes lightweight thumbnail containers (LTC) using the computational resources of the client device to recognize personalized events in full-length sports videos. Instead of processing the entire video, it processes only small video segments to generate artistic media, which makes it more computationally efficient than baseline approaches that create artistic media from the entire video. Because the method retrieves and uses only thumbnail containers and video segments, it also reduces the required transmission bandwidth and the amount of locally stored data used during artistic media generation. In extensive experiments conducted on the Nvidia Jetson TX2, the computational complexity of the proposed method was 3.57 times lower than that of the state-of-the-art (SoA) method. In the qualitative assessment, GIFs generated using the proposed method received overall ratings 1.02 higher than those of the SoA method. To the best of our knowledge, this is the first technique that uses LTC to generate artistic media while providing lightweight, high-performance services even on resource-constrained devices.
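The segment-selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that thumbnails in a container are sampled at a fixed interval, that some client-side classifier (not shown) has already assigned each thumbnail a score for the user's preferred event, and that the video is split into fixed-length segments. The names `Segment`, `select_segments`, and all parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Segment:
    """A time window of the source video, in seconds (hypothetical type)."""
    start_s: float
    end_s: float


def select_segments(
    scores: Sequence[float],   # assumed per-thumbnail event scores in [0, 1]
    interval_s: float,         # assumed spacing between thumbnails, seconds
    segment_s: float,          # assumed fixed segment length, seconds
    threshold: float,          # minimum score for a thumbnail to count
    max_segments: int,         # cap on how many segments to fetch
) -> List[Segment]:
    """Pick the segments that contain the highest-scoring thumbnails."""
    # Rank thumbnail indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen: List[Segment] = []
    for i in ranked:
        if scores[i] < threshold or len(chosen) >= max_segments:
            break
        # Map the thumbnail timestamp to the start of its enclosing segment.
        start = (i * interval_s // segment_s) * segment_s
        seg = Segment(start, start + segment_s)
        if seg not in chosen:
            chosen.append(seg)
    # Return segments in playback order.
    return sorted(chosen, key=lambda s: s.start_s)


def processed_fraction(segments: Sequence[Segment], video_s: float) -> float:
    """Share of the full video duration that the chosen segments cover."""
    return sum(s.end_s - s.start_s for s in segments) / video_s


example_scores = [0.1, 0.9, 0.2, 0.8, 0.85, 0.05]
example_segments = select_segments(example_scores, 5.0, 10.0, 0.5, 2)
```

Under these assumptions, only the selected segments would need to be downloaded and decoded for GIF generation, which is the source of the bandwidth and computation savings the abstract describes; the actual event recognition and GIF encoding steps are outside the scope of this sketch.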
