Intelligent design of multimedia content in Alibaba

Multimedia content is an integral part of Alibaba’s business ecosystem and is in great demand. Producing it usually requires considerable technical skill and cost. With the rapid development of artificial intelligence (AI) in recent years, many AI-assisted tools for multimedia content production have emerged to meet these design requirements, and they are increasingly widely used across Alibaba’s business ecosystem. The main applications are auxiliary design, graphic design, video generation, and page production. This report introduces a general pipeline for these AI-assisted tools and presents four representative tools used in the Alibaba Group for the applications mentioned above. Through these tools, the value of combining multimedia content design with AI has been verified in business practice, which reflects the significant role AI plays in promoting multimedia content production. The report also indicates the application prospects of combining multimedia content design with AI.
