Statistical Translation Language Model for Twitter Search

With the prevalence of social media applications, an increasing number of Internet users are actively publishing text online. This influx provides a wealth of textual information about those users. Ranking in social media poses different challenges from Web search ranking, one of which is that microblog messages are very short. As a result, the vocabulary mismatch problem is exacerbated in social media search. In this paper, we first study the standard translation model for this problem and show that the translation language model not only helps to bridge the vocabulary gap but also improves the estimate of term frequency. We further propose two ways to improve the translation language model: leveraging hashtag information and adaptively setting the self-translation parameter. Experimental results on a Twitter data set show that the proposed methods are effective.
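
As a rough illustration of the idea summarized above, a translation language model is commonly written as p(w | d) = sum over u of p_t(w | u) * p_ml(u | d), where p_ml is the maximum-likelihood estimate of word u in document d and p_t(w | u) is the probability of "translating" u into w. The Python sketch below assumes this common formulation and a fixed self-translation probability; the function name `translated_prob`, the `trans` table layout, and the default value 0.5 are hypothetical choices made for illustration, not the estimation procedure or parameter setting used in this paper.

```python
from collections import Counter


def translated_prob(word, doc_tokens, trans, self_prob=0.5):
    """Sketch of p(word | doc) = sum_u p_t(word | u) * p_ml(u | doc).

    trans[u][w] is an assumed table holding the probability of translating
    u into a *different* word w; for each u these values should sum to 1.
    self_prob is the assumed probability that a word translates to itself.
    """
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0

    prob = 0.0
    for u, count in counts.items():
        p_ml = count / total  # maximum-likelihood estimate of u in the doc
        if u == word:
            p_t = self_prob  # self-translation
        else:
            # The remaining mass (1 - self_prob) is spread over other words.
            p_t = (1.0 - self_prob) * trans.get(u, {}).get(word, 0.0)
        prob += p_t * p_ml
    return prob


# Toy tweet and a toy translation table (both invented for illustration).
tweet = ["nyc", "marathon"]
toy_trans = {"nyc": {"newyork": 1.0}}

# "newyork" never occurs in the tweet but still receives probability mass.
p_newyork = translated_prob("newyork", tweet, toy_trans)
p_nyc = translated_prob("nyc", tweet, toy_trans)
p_unrelated = translated_prob("weather", tweet, toy_trans)
```

In this toy setting a query term that is absent from the tweet can still match it through a related term, which is the vocabulary-gap effect the abstract describes. Raising or lowering `self_prob` shifts weight between exact matches and translated matches, which is why how that parameter is set matters.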