Exploiting entities in social media

Over the past few years, microblogging platforms such as Twitter have become extremely popular for generating and disseminating information. Hundreds of millions of tweets are published each day, containing fresh and trending information that is highly valuable to online users. However, discovering relevant information in such sources is becoming harder, both because of their rapid growth and because social fragments are often short and noisy. Aggregation techniques such as clustering are often used to extract this relevant information, since interesting signals begin to emerge when these fragments are grouped together. Clustering large numbers of short tweets with limited features, however, is a challenging task in itself. In this paper, we propose to aggregate tweets by pivoting on the entities they mention and mapping those entities to topics already defined in knowledge sources such as Wikipedia and Freebase. This makes aggregation more reliable and feasible, while also providing interesting aggregated information about the entities present in these fragments. Our analysis of a large collection of tweets shows that this approach works well. We present encouraging results and several interesting applications centered on entities.
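To make the idea of pivoting on entities concrete, the following is a minimal sketch, not the linking method used in this work. It assumes a hypothetical alias table (`ALIASES`) that maps lowercased surface forms to canonical Wikipedia-style titles, and it uses a simple greedy longest-match lookup over word n-grams as a stand-in for a real entity linker. Tweets that mention the same canonical entity end up in the same group, and one tweet can join several groups. This replaces clustering over sparse tweet features with direct bucketing by entity.

```python
import re
from collections import defaultdict

# Hypothetical alias table: lowercased surface form -> canonical topic title.
# A real system might mine this from Wikipedia anchor texts or Freebase aliases.
ALIASES = {
    "obama": "Barack_Obama",
    "barack obama": "Barack_Obama",
    "apple": "Apple_Inc.",
    "iphone": "IPhone",
}
MAX_NGRAM = 3  # longest surface form (in words) we try to match


def link_entities(text, aliases=ALIASES, max_n=MAX_NGRAM):
    """Greedy longest-match lookup of word n-grams against the alias table."""
    tokens = re.findall(r"\w+", text.lower())
    entities = set()
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in aliases:
                entities.add(aliases[phrase])
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no alias starts here; move to the next token
    return entities


def aggregate_by_entity(tweets):
    """Group tweet ids under every canonical entity they mention."""
    groups = defaultdict(list)
    for tweet_id, text in tweets:
        for entity in link_entities(text):
            groups[entity].append(tweet_id)
    return dict(groups)


# Illustrative (made-up) tweets.
tweets = [
    (1, "Barack Obama speaks at the summit"),
    (2, "Obama and Apple CEO meet"),
    (3, "New iPhone released by Apple"),
    (4, "Nice weather today"),
]
groups = aggregate_by_entity(tweets)
print(groups)
```

Here "Barack Obama" and "Obama" collapse onto the same topic, so tweets 1 and 2 are grouped together even though they share little surface text. Tweet 4 mentions no known entity and is simply left out of the groups.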