Data Augmentation Based on Distributed Expressions in Text Classification Tasks

We propose a data augmentation method for text classification tasks that combines Doc2vec and label spreading. The distinguishing feature of our approach is its use of unlabeled samples, which are easier to obtain than labeled samples, as an aid to the classification model to improve its prediction accuracy. We applied the method to several text data sets, including data from the natural language branch of the AIWolf contest. The experiments confirmed that applying the proposed method improves prediction accuracy.
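
To make the idea concrete, the following is a minimal sketch of the label spreading step, not the authors' implementation. It assumes that each document has already been mapped to a fixed-length vector (for example by Doc2vec); the two-dimensional vectors below are hand-made stand-ins for such embeddings. The function names, the RBF affinity, and the parameters alpha, gamma, and n_iter are illustrative assumptions. Labels are spread from the few labeled vectors to the unlabeled ones, and the resulting pseudo-labeled samples are collected as one plausible way to enlarge the training set.

```python
import math


def rbf_affinity(vectors, gamma=1.0):
    """Dense RBF affinity matrix with a zero diagonal (assumed kernel choice)."""
    n = len(vectors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
                w[i][j] = math.exp(-gamma * sq_dist)
    return w


def label_spreading(vectors, labels, n_classes, alpha=0.8, gamma=1.0, n_iter=50):
    """Spread labels over a similarity graph.

    labels holds a class index for labeled samples and -1 for unlabeled ones.
    Returns one predicted class index per sample.
    """
    n = len(vectors)
    w = rbf_affinity(vectors, gamma)
    degree = [sum(row) for row in w]
    # Symmetrically normalized affinity: S = D^(-1/2) W D^(-1/2)
    s = [[w[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # One-hot rows for labeled samples, all-zero rows for unlabeled samples
    y0 = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
          for i in range(n)]
    f = [row[:] for row in y0]
    for _ in range(n_iter):
        # Update rule: F <- alpha * S * F + (1 - alpha) * Y0
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n))
              + (1 - alpha) * y0[i][c]
              for c in range(n_classes)]
             for i in range(n)]
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Toy stand-ins for document embeddings: two well-separated groups.
vectors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
           (2.0, 2.1), (2.2, 1.9), (1.9, 2.2)]
labels = [0, -1, -1, 1, -1, -1]  # only one labeled sample per class

predicted = label_spreading(vectors, labels, n_classes=2)

# Pseudo-labeled samples that could be added to the labeled training set.
augmented = [(v, p) for v, l, p in zip(vectors, labels, predicted) if l == -1]
print(predicted)
print(augmented)
```

In practice the embeddings would come from a trained Doc2vec model and the spreading step from a library implementation; the pure-Python version above only shows how unlabeled samples can inherit labels from nearby labeled ones in the embedding space.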