Microblogging Hash Tag Recommendation System Based on Semantic TF-IDF: Twitter Use Case

The character limit in microblogging systems such as Twitter forces users to express the same meaning, object, or concept with various terms. Sometimes the same term appears in a tweet in a shorter form (e.g. #friend and #frnd). This makes it harder to find similarities between such tags and their corresponding tweets, and classical text mining methods do not work efficiently on short tweets. Tweet similarity, and consequently tag recommendation, which is one of the open problems in microblogging social networks, therefore needs a new and more efficient method. In this paper we define a new semantic method for finding similarities among short messages. We model each short message as a semantic vector that can be used with any similarity measure, such as cosine similarity. We then evaluate the accuracy of the resulting semantic-similarity-based tag recommendation system using various semantic algorithms and compare their results. The semantic algorithms used are Shortest Path, Wu & Palmer, Lin, Jiang-Conrath, Resnik, Lesk, Leacock-Chodorow, and Hirst-St-Onge. Results, evaluated on 8,396,744 real English tweets, show an improvement in accuracy of around six times over standard TF-IDF.
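The abstract does not spell out how the semantic vector is built, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes tweets are already tokenized, uses TF-IDF with a smoothed idf of log(N/df) + 1, and builds a "semantic" vector by spreading each term's weight onto related vocabulary terms, taking for each vocabulary term the maximum of similarity × weight. The `TOY_SIMILARITY` table and `word_sim` function are hypothetical stand-ins for a WordNet measure such as Wu & Palmer or Lin, with made-up values. The example shows why plain TF-IDF fails on short texts: two tweets that share no words (#friend vs #frnd) get a cosine similarity of zero, while the semantically expanded vectors do not.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for a WordNet-based word similarity (e.g. Wu & Palmer).
# Values are invented for illustration only.
TOY_SIMILARITY = {
    frozenset({"friend", "frnd"}): 0.9,
    frozenset({"friend", "buddy"}): 0.8,
    frozenset({"movie", "film"}): 0.9,
}


def word_sim(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical words, table value or 0.0 otherwise."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Classic TF-IDF: tf = count / doc length, smoothed idf = log(N / df) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * (math.log(n / df[t]) + 1.0) for t, c in counts.items()}
        )
    return vectors


def semantic_vector(vec: dict[str, float], vocab: set[str]) -> dict[str, float]:
    """ASSUMED expansion scheme: weight(v) = max over terms t of sim(t, v) * vec[t]."""
    out = {}
    for v in vocab:
        w = max((word_sim(t, v) * wt for t, wt in vec.items()), default=0.0)
        if w > 0:
            out[v] = w
    return out


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    docs = [["best", "friend", "forever"], ["miss", "my", "frnd"], ["new", "film", "tonight"]]
    plain = tfidf(docs)
    vocab = {t for d in docs for t in d}
    semantic = [semantic_vector(v, vocab) for v in plain]
    print(round(cosine(plain[0], plain[1]), 2))        # 0.0  (no shared words)
    print(round(cosine(semantic[0], semantic[1]), 2))  # 0.47 (friend ~ frnd)
```

In a real system, `word_sim` would be backed by one of the WordNet measures listed above; those measures compare synsets rather than raw words, so a word-level wrapper (and some handling for abbreviations such as "frnd" that WordNet does not contain) would be needed. Both points are assumptions of this sketch.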
