Extending FolkRank with content data

Real-world tagging datasets contain a large proportion of new, untagged documents. Few approaches to recommending tags to a user for a document address this new item problem. Most concentrate instead on artificially created post-core datasets, in which the user and the document of each test post are guaranteed to be known to the system and to already have some tags assigned. Recommending tags for new documents requires approaches that model documents not only by the tags assigned to them in the past (if any) but also by their content. In this paper we present a novel adaptation of the widely recognised FolkRank tag recommendation algorithm that includes content data. We adapt the FolkRank graph to use word nodes instead of document nodes, enabling it to recommend tags for new documents based on their textual content. Our adaptations make FolkRank applicable to post-core level 1, i.e. the full real-world tagging datasets, and address the new item problem in tag recommendation. For comparison, we also apply the same methodology of including content to a simpler tag recommendation algorithm and evaluate it. The result is a less expensive recommender that suggests a combination of user-related and document-content-related tags. Including content data in FolkRank improves on plain FolkRank on full tagging datasets. However, we also observe that our simpler content-aware tag recommender outperforms FolkRank with content data. Our results suggest that the weighting method of FolkRank needs to be optimised to achieve better results.
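The idea of replacing document nodes with word nodes can be illustrated with a small sketch. The Python below is a minimal, hypothetical illustration, not the implementation evaluated in this paper. It assumes that documents are already tokenised into word lists, that each post links its user, tags and words pairwise with co-occurrence counts as edge weights, and that the damping factor is d = 0.7 with a fixed preference boost. All three choices are assumptions. As in FolkRank, tags are ranked by the difference between a personalised weight-spreading run and a baseline run with uniform preference. Here the personalised run places its preference on the user and on the words of the new document. The names `build_graph`, `spread` and `folkrank_tags` are illustrative only.

```python
from collections import defaultdict


def build_graph(posts):
    """Build a symmetric weighted graph of user, tag and word nodes.

    posts: iterable of (user, words, tags). Each post links its user, its tags
    and the distinct words of its document pairwise. Edge weights count the
    posts in which two nodes co-occur (an assumed weighting scheme).
    """
    adj = defaultdict(lambda: defaultdict(float))

    def link(a, b):
        adj[a][b] += 1.0
        adj[b][a] += 1.0

    for user, words, tags in posts:
        u = ("user", user)
        ws = [("word", w) for w in set(words)]
        ts = [("tag", t) for t in set(tags)]
        for t in ts:
            link(u, t)
            for w in ws:
                link(t, w)  # word nodes take the place of document nodes
        for w in ws:
            link(u, w)
    return adj


def spread(adj, pref, d=0.7, iters=50):
    """Iterate w <- d * A w + (1 - d) * pref.

    A is the column-normalised adjacency: each node passes its weight to its
    neighbours in proportion to the edge weights. pref must sum to 1.
    """
    nodes = list(adj)
    w = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - d) * pref.get(n, 0.0) for n in nodes}
        for n in nodes:
            total = sum(adj[n].values())
            for m, wt in adj[n].items():
                new[m] += d * w[n] * wt / total
        w = new
    return w


def folkrank_tags(adj, user, words, k=3, boost=5.0, d=0.7):
    """Rank tags for a (possibly new) document given only its words."""
    nodes = list(adj)
    baseline = {n: 1.0 / len(nodes) for n in nodes}
    pref = {n: 1.0 for n in nodes}
    # Boost the user and the document's words; unseen ones are skipped.
    for n in [("user", user)] + [("word", w) for w in set(words)]:
        if n in pref:
            pref[n] += boost
    z = sum(pref.values())
    pref = {n: v / z for n, v in pref.items()}
    w0 = spread(adj, baseline, d)
    w1 = spread(adj, pref, d)
    # FolkRank score: personalised weight minus baseline weight, tags only.
    scores = {n[1]: w1[n] - w0[n] for n in nodes if n[0] == "tag"}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    posts = [
        ("alice", ["neural", "network", "training"], ["ml"]),
        ("carol", ["pasta", "tomato", "sauce"], ["cooking"]),
    ]
    adj = build_graph(posts)
    # A new, untagged document posted by a known user.
    print(folkrank_tags(adj, "alice", ["neural", "training"], k=2))  # ['ml', 'cooking']
```

Because the new document is represented only by its words, it needs no node of its own in the graph. This is what lets the adapted graph handle untagged documents. Users and words that were never seen during training are simply left out of the preference vector.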