Leveraging collaborative tagging for web item design

The popularity of collaborative tagging sites has created new challenges and opportunities for designers of web items such as electronics products, travel itineraries, and popular blogs. An increasing number of people turn to online reviews and user-specified tags to choose among competing items. This creates an opportunity for designers to build items that are likely to attract desirable tags when published. In this paper, we consider a novel optimization problem: given a training dataset of existing items with their user-submitted tags, and a query set of desirable tags, design the k best new items expected to attract the maximum number of desirable tags. We show that this problem is NP-complete, even if simple Naive Bayes classifiers are used for tag prediction. We present two principled algorithms for solving this problem: (a) an exact "two-tier" algorithm (based on top-k querying techniques), which performs much better than the naive brute-force algorithm and works well for moderate problem instances, and (b) a novel polynomial-time approximation algorithm with a provable error bound for larger problem instances. We conduct detailed experiments on synthetic data and on real data crawled from the web to evaluate the efficiency and quality of our proposed algorithms.
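To make the problem statement concrete, the following is a minimal sketch of the naive brute-force baseline mentioned above, not the two-tier or approximation algorithm. It rests on several assumptions that the abstract does not state: items are vectors of Boolean attributes, each desirable tag gets its own Bernoulli Naive Bayes classifier with Laplace smoothing, and a candidate item is scored by the sum of its predicted tag probabilities (the expected number of desirable tags). The function names `train`, `tag_probability`, and `design_top_k` are hypothetical.

```python
from itertools import product


def train(items, item_tags, tag):
    """Fit a Bernoulli Naive Bayes model for one tag (Laplace smoothing)."""
    n_attrs = len(items[0])
    pos = [x for x, tags in zip(items, item_tags) if tag in tags]
    neg = [x for x, tags in zip(items, item_tags) if tag not in tags]
    prior = (len(pos) + 1) / (len(items) + 2)

    def cond(rows):
        # Smoothed estimate of P(attribute j = 1 | class) for every attribute j.
        return [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(n_attrs)]

    return prior, cond(pos), cond(neg)


def tag_probability(model, item):
    """P(tag | item) under the Naive Bayes independence assumption."""
    prior, p_pos, p_neg = model
    like_pos, like_neg = prior, 1 - prior
    for a, pp, pn in zip(item, p_pos, p_neg):
        like_pos *= pp if a else 1 - pp
        like_neg *= pn if a else 1 - pn
    return like_pos / (like_pos + like_neg)


def design_top_k(items, item_tags, desirable, k):
    """Enumerate all 2^m attribute vectors and return the k best (score, item) pairs.

    Assumption: score = expected number of desirable tags, i.e. the sum of
    per-tag probabilities. The enumeration is exponential in m.
    """
    models = [train(items, item_tags, t) for t in desirable]
    n_attrs = len(items[0])
    scored = [
        (sum(tag_probability(m, cand) for m in models), cand)
        for cand in product((0, 1), repeat=n_attrs)
    ]
    scored.sort(key=lambda s: -s[0])  # highest expected tag count first
    return scored[:k]


# Toy training data (made up for illustration): 4 items, 3 Boolean attributes.
items = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 1)]
item_tags = [{"lightweight", "fast"}, {"fast"}, {"lightweight"}, set()]
best = design_top_k(items, item_tags, ["lightweight", "fast"], k=2)
```

The exhaustive enumeration over all 2^m candidates is what makes this baseline impractical beyond a small number of attributes, which is the motivation for the exact two-tier algorithm and the polynomial-time approximation algorithm.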
