A Solution to Multi-modal Ads Video Tagging Challenge

In this paper, we present our solution to the Multi-modal Ads Video Tagging Challenge of the Tencent Advertising Algorithm Competition, part of the ACM Multimedia 2021 Grand Challenges. We extend the baseline model by redesigning the visual feature extraction procedure and modifying the loss function to cope with sparse positive targets. We also propose Semi-supervised Learning with Negative Masking, which leverages both the labeled data and the unlabeled data from the preliminary contest to enhance training. We further apply Cross-Class Relevance Learning to boost performance. With a model ensemble, we achieve a GAP score of 0.8237 and rank second among all submissions in the challenge.
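For readers unfamiliar with the reported metric, the following is a minimal sketch of how a GAP score can be computed, assuming GAP denotes Global Average Precision as defined for the YouTube-8M benchmark: the top-k predictions of every video are pooled, sorted by confidence, and scored with a single average precision. The function name `global_average_precision`, the default `top_k=20`, and the choice to normalize by all positive labels are assumptions made for illustration, not details taken from the challenge's official evaluation code.

```python
import numpy as np


def global_average_precision(scores, labels, top_k=20):
    """Sketch of Global Average Precision over pooled top-k predictions.

    scores: (n_videos, n_classes) array of predicted confidences.
    labels: (n_videos, n_classes) binary array of ground-truth tags.
    top_k:  predictions kept per video (assumed value, not from the paper).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_videos, n_classes = scores.shape
    k = min(top_k, n_classes)

    # Keep the k highest-scoring classes for each video.
    top_idx = np.argsort(-scores, axis=1)[:, :k]
    rows = np.arange(n_videos)[:, None]
    pooled_scores = scores[rows, top_idx].ravel()
    pooled_labels = labels[rows, top_idx].ravel()

    # Sort the pooled predictions from most to least confident.
    order = np.argsort(-pooled_scores, kind="stable")
    hits = pooled_labels[order]

    # Assumption: recall is normalized by every positive label in the data.
    total_positives = labels.sum()
    if total_positives == 0:
        return 0.0

    # Precision at each rank, accumulated only at ranks that are hits.
    precision_at_rank = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision_at_rank * hits).sum() / total_positives)
```

Under this definition, a model that ranks every true tag above every false one scores 1.0, and a score near 0.82 means that most, but not all, true tags sit near the top of the pooled ranking.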
