Learning boundaries of vague places from noisy annotations

What ordinary people mean by a place may differ dramatically from how experts define it. This is especially evident in how people talk about places in social media, where 'Los Angeles', for instance, could include areas well outside the city or even in another county. To make the best use of the information in social media, we need to understand what people mean when they refer to a place. Social annotations provide valuable evidence for harvesting knowledge about places, e.g., learning their boundaries and relations to other places. However, social annotations are noisy, and this noise can severely distort the learned boundaries. In this paper we propose a method that exploits a distinctive property of social annotations, namely that they are created by many people, to filter out noise. Using a large data set extracted from Flickr, we show that our crowd-based noise filtering method can learn accurate boundaries of places, including vague places.