Video annotation by graph-based learning with neighborhood similarity

Graph-based semi-supervised learning methods have been proven effective in tackling the difficulty of training data insufficiency in many practical applications such as video annotation. These methods are all based on an assumption that the labels of similar samples are close. However, as a crucial factor of these algorithms, the estimation of pairwise similarity has not been sufficiently studied. Usually, the similarity of two samples is estimated based on the Euclidean distance between them. But we will show that similarities are not merely related to distances but also related to the structures around the samples. It is shown that distance-based similarity measure may lead to high classification error rates even on several simple datasets. In this paper we propose a novel neighborhood similarity measure, which simultaneously takes into account both thse distance between samples and the difference between the structures around the corresponding samples. Experiments on synthetic dataset and TRECVID benchmark demonstrate that the neighborhood similarity is superior to existing distance based similarity.