Paper information - Exploiting Topical Perceptions over Multi-Lingual Text for Hashtag Suggestion on Twitter

Exploiting Topical Perceptions over Multi-Lingual Text for Hashtag Suggestion on Twitter

Microblogging websites, such as Twitter, provide a seemingly endless amount of textual information on a wide variety of topics generated by a large number of users. Microblog posts, or tweets in Twitter, are often written in an informal manner using multi-lingual styles. Ignoring informal styles or multiple languages can hamper the usefulness of microblogging mining applications. In this paper, we present a statistical method for processing tweets according to users' perceptions of topics and hashtags. Based on the non-classical notion of relatedness of vocabulary terms to topics in a corpus, which is quantified by discriminative term weights, our method builds a ranked list of terms related to hashtags. Subsequently, given a new tweet, our method can suggest a ranked list of hashtags. Our method allows enhanced understanding and normalization of users' perceptions for improved information retrieval applications. We evaluate our method on a dataset of 14 million tweets collected over a period of 52 days. Results demonstrate that the method actually learns useful relationships between vocabulary terms and topics, and that the performance is better than a Naive Bayes suggestion system.

Hassan Foroosh | Fernando Gomez | Asim Karim | Amara Tariq

[1] Efthimis N. Efthimiadis, et al. Conversational tagging in twitter, 2010, HT '10.

[2] Yuefeng Li, et al. Microblog Retrieval Using Topical Features and Query Expansion, 2011, TREC.

[3] Graeme Hirst, et al. Non-Classical Lexical Semantic Relations, 2004, Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics - CLS '04.

[4] M. de Rijke, et al. Incorporating Query Expansion and Quality Indicators in Searching Microblog Posts, 2011, ECIR.

[5] A. Mazzia. Suggesting Hashtags on Twitter, 2011.

[6] Michael Halliday, et al. Cohesion in English, 1976.

[7] Asim Karim, et al. A Robust Discriminative Term Weighting Based Linear Discriminant Method for Text Classification, 2008, 2008 Eighth IEEE International Conference on Data Mining.

[8] Asim Karim, et al. Fast supervised feature extraction by term discrimination information pooling, 2011, CIKM '11.

[9] Philip Resnik, et al. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language, 1999, J. Artif. Intell. Res.

[10] Miles Efron, et al. Information search and retrieval in microblogs, 2011, J. Assoc. Inf. Sci. Technol.

[11] C. J. van Rijsbergen, et al. Learning semantic relatedness from term discrimination information, 2009, Expert Syst. Appl.