Semantically Enriched Clustered User Interest Profile Built from Users' Tweets

Existing work in user profiling suffers from two well-known problems in information retrieval (IR): polysemy and synonymy. Semantically enriching the terms that represent user interests disambiguates their context, which helps resolve polysemous topics and synonyms. One way to enrich terms semantically is to group related terms into clusters. This work exploits users' tweets to build a Contextualized User Interest Profile (CUIP) that consists of clusters of semantically related terms and their term weights. We propose two approaches to building the CUIP: svdCUIP, based on Singular Value Decomposition (SVD), and modsvdCUIP, based on modded SVD (modSVD). We run experiments to determine appropriate values for the parameters required to build the CUIP, and to compare the two proposed approaches in terms of clustering accuracy and clustering tendency. Results show that the cluster structure of modsvdCUIP is superior to that of svdCUIP in both clustering tendency and accuracy.
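To make the idea of SVD-based term clustering concrete, the following is a minimal sketch using plain SVD only; it does not reproduce svdCUIP or modSVD as defined in this work. The function names, the whitespace tokenizer, the rule of assigning each term to its dominant latent dimension, and the weighting by scaled loadings are all illustrative assumptions.

```python
import numpy as np

def build_term_doc_matrix(tweets):
    """Return (sorted vocabulary, term-by-tweet count matrix)."""
    vocab = sorted({tok for t in tweets for tok in t.lower().split()})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(tweets)))
    for j, tweet in enumerate(tweets):
        for tok in tweet.lower().split():
            matrix[index[tok], j] += 1.0
    return vocab, matrix

def cluster_terms_by_svd(tweets, k):
    """Group terms by their dominant latent dimension in a rank-k SVD.

    Assumption: a term belongs to the latent dimension on which it has
    the largest absolute loading; that scaled loading is used as an
    illustrative term weight. k must not exceed min(#terms, #tweets).
    """
    vocab, matrix = build_term_doc_matrix(tweets)
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    # Absolute values sidestep the sign ambiguity of singular vectors.
    loadings = np.abs(u[:, :k]) * s[:k]
    clusters = [dict() for _ in range(k)]
    for i, term in enumerate(vocab):
        dim = int(np.argmax(loadings[i]))
        clusters[dim][term] = round(float(loadings[i, dim]), 3)
    return clusters

# Hypothetical toy tweets covering two unrelated interests.
tweets = [
    "apple iphone launch",
    "iphone apple update",
    "football goal match",
    "match goal football",
]
for cluster in cluster_terms_by_svd(tweets, k=2):
    print(cluster)
```

On this toy input the technology terms and the sports terms fall into separate clusters, each term carrying a weight. Real tweets would need proper tokenization, stop-word removal, and a principled choice of k, which is the kind of parameter the experiments in this work are meant to determine.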