A self-training method for learning to rank with unlabeled data

This paper presents a new algorithm for learning bipartite ranking functions from partially labeled data. The algorithm extends the self-training paradigm, originally developed for classification, to the ranking framework. We further propose an efficient and scalable optimization method for training linear models, though the approach is general in the sense that it can be applied to any class of scoring functions. Empirical results on several common image and text corpora, measured by the Area Under the ROC Curve (AUC) and Average Precision, show that using unlabeled data during training improves the performance of baseline supervised ranking functions.
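
To make the self-training idea concrete, the following is a minimal sketch of a generic self-training loop for bipartite ranking with a linear scoring function. It is not the paper's algorithm or its AUC-based optimization: the base learner (logistic regression as a stand-in linear scorer), the confidence thresholds, and all function and parameter names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def self_train_ranker(X_lab, y_lab, X_unlab, n_rounds=5,
                      pos_thr=0.9, neg_thr=0.1):
    """Iteratively pseudo-label the most confidently scored unlabeled
    examples and retrain a linear scoring function on the enlarged set.
    Generic illustration only; thresholds and base learner are assumptions."""
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    model = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        model.fit(X, y)                       # retrain on labeled + pseudo-labeled data
        if len(pool) == 0:
            break
        scores = model.predict_proba(pool)[:, 1]
        conf_pos = scores >= pos_thr          # confidently relevant
        conf_neg = scores <= neg_thr          # confidently irrelevant
        picked = conf_pos | conf_neg
        if not picked.any():                  # stop when no confident examples remain
            break
        X = np.vstack([X, pool[picked]])
        y = np.concatenate([y, conf_pos[picked].astype(int)])
        pool = pool[~picked]                  # remove pseudo-labeled points from the pool
    return model


# Hypothetical usage on synthetic data
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(100, 20))
y_lab = (X_lab[:, 0] > 0).astype(int)
X_unlab = rng.normal(size=(500, 20))
ranker = self_train_ranker(X_lab, y_lab, X_unlab)
```

The design choice illustrated here is the one the abstract attributes to self-training: the current model's own scores select which unlabeled examples enter the training set at each round, so the ranker gradually exploits the unlabeled pool without any additional supervision.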