Improving Free Text Recommendation Time by Means of Clustering Algorithms

In this paper, we study the effects of applying clustering algorithms to free text recommendation systems. Recommendation systems usually do not scale well as the recommendation space grows. Clustering is one of the main techniques for scaling them; however, it usually hurts accuracy when applied without taking the recommended items into consideration. We construct a simple recommendation system for documents and propose partitioning its search space with k-means. We vary the number of clusters and analyze how this affects performance in terms of recommendation time and accuracy. We apply a word-embedding-based technique to represent each document's bag of words, which lets us compare how clustering algorithms perform at partitioning these documents. One of the main findings of this work is that clustering makes recommendation almost 4 times faster without losing much of its initial accuracy. Another interesting finding is that increasing the number of clusters does not translate directly into a linear performance improvement.
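The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea under stated assumptions: each document is represented as the mean of its word vectors, k-means (Lloyd's algorithm, seeded here by deterministic farthest-first traversal) partitions the document vectors offline, and at query time only the documents in the query's nearest cluster are ranked by Euclidean distance. The averaging scheme, the distance measure, the seeding, and every name (`doc_vector`, `build_index`, `recommend`, the toy `EMB` vectors) are illustrative assumptions, not the authors' method.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def doc_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Represent a bag of words as the mean of its known word vectors (assumed scheme)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("document has no in-vocabulary tokens")
    return [sum(col) / len(known) for col in zip(*known)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Lloyd's k-means, seeded deterministically by farthest-first traversal."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point whose distance to its closest chosen seed is largest.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]  # assignment step
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        for c in range(k):  # update step: move each centroid to its members' mean
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def build_index(docs: List[Vector], k: int) -> Tuple[List[Vector], Dict[int, List[int]]]:
    """Offline step: cluster the documents and bucket their ids by cluster."""
    centroids, labels = kmeans(docs, k)
    buckets: Dict[int, List[int]] = {c: [] for c in range(k)}
    for doc_id, c in enumerate(labels):
        buckets[c].append(doc_id)
    return centroids, buckets


def recommend(query: Vector, docs: List[Vector], centroids: List[Vector],
              buckets: Dict[int, List[int]], top_n: int = 3) -> List[int]:
    """Online step: rank only the documents in the query's nearest cluster."""
    cluster = nearest(query, centroids)
    ranked = sorted(buckets[cluster], key=lambda i: sq_dist(query, docs[i]))
    return ranked[:top_n]


# Hypothetical 2-d "embeddings" and a six-document toy corpus, for illustration only.
EMB = {
    "goal": [1.0, 0.0], "match": [0.9, 0.1], "team": [1.0, 0.1],
    "stock": [0.0, 1.0], "market": [0.1, 0.9], "price": [0.1, 1.0],
}
CORPUS = [["goal", "match"], ["team", "goal"], ["match", "team"],
          ["stock", "market"], ["price", "stock"], ["market", "price"]]

if __name__ == "__main__":
    docs = [doc_vector(tokens, EMB) for tokens in CORPUS]
    centroids, buckets = build_index(docs, k=2)
    query = doc_vector(["goal", "team"], EMB)
    print(recommend(query, docs, centroids, buckets, top_n=2))  # -> [1, 0]
```

Restricting the search to one cluster replaces an exhaustive scan of N documents with roughly k + N/k distance computations when clusters are balanced. That cost is smallest near k ≈ √N, and documents lying just across a cluster boundary are never considered. This is one plausible way to see why adding clusters need not speed recommendation up linearly and why accuracy can drop somewhat, though the paper's own explanation may differ.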

Akira Fukuda | Kazuaki Murakami | Antoine Trouve | Israel Mendonca dos Santos