Geolocation with Subsampled Microblog Social Media

We propose a data-driven geolocation method on microblog text. Key idea underlying our approach is sparse coding, an unsupervised learning algorithm. Unlike conventional positioning algorithms, we geolocate a user by identifying features extracted from her social media text. We also present an enhancement robust to a random erasure of words in the text and report our experimental results with uniformly or randomly subsampled microblog text. Our solution features a novel two-step procedure consisting of upconversion and iterative refinement by joint sparse coding. As a result, we can reduce the computational cost of geolocation while preserving accuracy. In the light of information preservation and privacy, we remark potential applications of this paper.