Automatic keyphrase extraction and ontology mining for content-based tag recommendation

Collaborative tagging offers the Web a potential way to organize and share information and to extend the capabilities of existing search engines. However, because automatic methods for generating tags and supporting the tagging activity are lacking, many Web resources have little or no tag information, and recommending appropriate tags remains both an open problem and an exciting challenge. This paper addresses the problem by combining several techniques and tools (tags, domain ontologies, and keyphrase extraction methods) to generate tags automatically. The proposed approach is implemented in the PIRATES (Personalized Intelligent tag Recommender and Annotator TEStbed) framework, a prototype system for personalized content retrieval, annotation, and classification. A case study application is developed using a domain ontology for software engineering. © 2010 Wiley Periodicals, Inc.
