Selective Classification of Danmaku Comments Using Distributed Representations

Danmaku commenting has become popular for co-viewing on video-sharing platforms. However, videos usually attract a large number of irrelevant comments that degrade the quality of the information they provide. To address this problem, this paper presents a novel approach to classifying Danmaku comments into video categories. Specifically, we use BERT as the backbone architecture to extract semantic features from comments. We introduce a loss function with an abstention option, which enables the detection of comments that do not fall into any predefined category. Experiments on Nicovideo data demonstrated that our selective classification approach effectively discarded comments that were irrelevant to a video’s content. We also present a method for subdividing the existing video categories based on the results of Danmaku comment classification. This suggests a potential application of our method to hierarchical video clustering.
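The abstract does not give the form of the abstention loss, so the following is only an illustrative sketch. It assumes the gambler's loss from the Deep Gamblers paper (Liu et al., NeurIPS 2019), in which the classifier has one extra output that acts as the abstention option. The function names, the `payoff` parameter, and the 0.5 abstention threshold are assumptions made for illustration, not details taken from this paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def abstention_loss(logits, label, payoff):
    """Gambler's-style loss for one comment (assumed form, not the paper's).

    `logits` holds one score per video category plus a final score for
    the abstention option. `payoff` must be greater than 1; a smaller
    payoff makes abstaining cheaper.
    """
    probs = softmax(logits)
    return -math.log(probs[label] + probs[-1] / payoff)


def predict_or_abstain(logits, threshold=0.5):
    """Return the predicted category index, or None to discard the comment."""
    probs = softmax(logits)
    if probs[-1] >= threshold:  # abstention output dominates
        return None
    class_probs = probs[:-1]
    return max(range(len(class_probs)), key=class_probs.__getitem__)
```

Under these assumptions, a comment whose abstention score dominates is discarded as irrelevant, while any other comment is assigned to its highest-scoring category.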
