Localizing Web Videos from Heterogeneous Images

While the geo-localization of web images has been widely studied, limited effort has been devoted to that of web videos. Nevertheless, an accurate location inference approach designed specifically for web videos is of fundamental importance, as videos account for an increasing proportion of the web corpus. The key challenge comes from the lack of sufficient labels for model training. In this paper, we tackle this problem from a novel perspective, by "transferring" large-scale web images with geographical tags to web videos through carefully designed associations based on visual content similarity. Experiments are conducted on a collected web image and video data set, where superior performance is reported over several alternatives.