Human-competitive tagging using automatic keyphrase extraction

This paper connects two research areas: automatic tagging on the web and statistical keyphrase extraction. First, we analyze the quality of tags in a collaboratively created folksonomy using traditional evaluation techniques. Next, we demonstrate how documents can be tagged automatically with a state-of-the-art keyphrase extraction algorithm, and then improve performance in this new domain with a new algorithm, "Maui", which uses semantic information extracted from Wikipedia. Maui outperforms existing approaches and extracts tags that are competitive with those assigned by the best-performing human taggers.
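The abstract does not name the "traditional evaluation techniques" it applies. A widely used measure from the indexing literature is Rolling's inter-indexer consistency, 2C / (A + B), where A and B are the numbers of terms two indexers assign and C is the number they share. The sketch below assumes that measure. It shows how tag sets from different taggers (human or automatic) can be compared and ranked. The function names, the example tags, and the lowercase-only normalization are illustrative assumptions, not the paper's own code.

```python
from itertools import combinations


def normalize(tag: str) -> str:
    # Simplifying assumption: case-fold and collapse whitespace only.
    # A fuller evaluation might also stem each word before comparing tags.
    return " ".join(tag.lower().split())


def consistency(tags_a, tags_b) -> float:
    """Rolling's inter-indexer consistency, 2C / (A + B).

    A and B are the numbers of distinct (normalized) tags each tagger
    assigned; C is the number of tags the two sets have in common.
    """
    a = {normalize(t) for t in tags_a}
    b = {normalize(t) for t in tags_b}
    if not a and not b:
        return 0.0  # no tags on either side: treat as no agreement
    c = len(a & b)
    return 2 * c / (len(a) + len(b))


def mean_consistency(tagger, others) -> float:
    """Average consistency of one tagger with each of the other taggers."""
    if not others:
        return 0.0
    return sum(consistency(tagger, o) for o in others) / len(others)


def group_consistency(taggers) -> float:
    """Average consistency over every pair of taggers in a group."""
    pairs = list(combinations(taggers, 2))
    if not pairs:
        return 0.0
    return sum(consistency(x, y) for x, y in pairs) / len(pairs)


# Hypothetical tag sets for a single document.
human_1 = ["Machine learning", "tagging", "Wikipedia"]
human_2 = ["tagging", "folksonomy"]
automatic = ["tagging", "wikipedia", "keyphrase extraction"]

print(round(consistency(human_1, human_2), 3))                    # 0.4
print(round(mean_consistency(automatic, [human_1, human_2]), 3))  # 0.533
```

Scoring each tagger by its mean consistency with the others gives one plausible way to rank taggers and to place an automatic tagger among the human ones.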
