Estimating Age on Twitter Using Self-Training Semi-Supervised SVM

Methods for estimating Twitter users' attributes typically require a vast amount of labeled data. An efficient alternative is therefore to label unlabeled data automatically and add it to the training set. We applied a self-training SVM as a semi-supervised method for age estimation and introduced Platt scaling as the criterion for selecting unlabeled data in the self-training process. We show how the performance of the self-training SVM varies as the amount of training data and the value of the selection criterion are changed.
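
The self-training loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear kernel, the threshold of 0.9, the round limit, the function name `self_train_svm`, and the synthetic two-class data are all assumptions made for illustration, and scikit-learn's `CalibratedClassifierCV` with `method="sigmoid"` stands in for Platt scaling. In each round, unlabeled points whose calibrated probability reaches the threshold are pseudo-labeled and moved into the training set, and the classifier is refit.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import SVC


def fit_platt_svm(X, y):
    # method="sigmoid" fits a Platt-style sigmoid on the SVM decision values.
    # The linear kernel and cv=5 are illustrative choices, not from the paper.
    clf = CalibratedClassifierCV(SVC(kernel="linear"), method="sigmoid", cv=5)
    return clf.fit(X, y)


def self_train_svm(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Hypothetical helper: pseudo-label confident unlabeled points and refit."""
    X_train, y_train = np.asarray(X_lab), np.asarray(y_lab)
    pool = np.asarray(X_unlab)
    n_added = 0
    clf = fit_platt_svm(X_train, y_train)
    for _ in range(max_rounds):
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        # Selection criterion: keep points whose top calibrated probability
        # is at least `threshold`.
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        pseudo_labels = clf.classes_[proba.argmax(axis=1)]
        X_train = np.vstack([X_train, pool[confident]])
        y_train = np.concatenate([y_train, pseudo_labels[confident]])
        pool = pool[~confident]
        n_added += int(confident.sum())
        clf = fit_platt_svm(X_train, y_train)
    return clf, n_added


# Synthetic stand-in data: two Gaussian clusters play the role of two age groups.
rng = np.random.default_rng(0)


def sample(n_per_class):
    X = np.vstack([rng.normal(-2.0, 1.0, size=(n_per_class, 2)),
                   rng.normal(2.0, 1.0, size=(n_per_class, 2))])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


X_lab, y_lab = sample(20)      # small labeled set
X_unlab, _ = sample(100)       # labels discarded: treated as unlabeled
X_test, y_test = sample(50)

model, n_added = self_train_svm(X_lab, y_lab, X_unlab, threshold=0.9)
accuracy = float((model.predict(X_test) == y_test).mean())
print(f"pseudo-labeled points added: {n_added}, test accuracy: {accuracy:.3f}")
```

Raising `threshold` admits fewer but more reliable pseudo-labels, while lowering it grows the training set faster at the risk of adding wrong labels. The size of the labeled set and this threshold are the two quantities that would be varied to reproduce the kind of comparison the abstract describes.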