Concept extraction and e-commerce applications

Abstract: Concept extraction is the technique of mining the most important topic of a document. In the e-commerce context, concept extraction can be used to identify what a shopping-related Web page is about, which is useful in applications such as search relevance and product matching. In this paper, we investigate two concept extraction methods: Automatic Concept Extractor (ACE) and Automatic Keyphrase Extraction (KEA). ACE is an unsupervised method that looks at both text and HTML tags; we upgrade it into the Improved Concept Extractor (ICE). KEA is a supervised learning system. We evaluate the methods by comparing automatically generated concepts to a gold standard. The experimental results demonstrate that ICE significantly outperforms ACE and also outperforms KEA in concept extraction. To demonstrate the practical use of concept extraction in the e-commerce context, we use ICE and KEA to showcase two e-commerce applications: product matching and topic-based opinion mining.
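The gold-standard comparison mentioned above can be illustrated with a minimal sketch. The abstract does not say which metric or matching rule is used, so the exact-match precision, recall, and F1 below are assumptions, and the names `normalize` and `score_concepts` are hypothetical helpers introduced only for this example.

```python
from typing import Iterable, Tuple


def normalize(phrase: str) -> str:
    """Lowercase a phrase and collapse whitespace so trivial variants match."""
    return " ".join(phrase.lower().split())


def score_concepts(extracted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of extracted concepts against a gold set.

    Assumption: a concept counts as correct only if it exactly matches a
    gold concept after normalization. The paper may use a different rule.
    """
    extracted_set = {normalize(p) for p in extracted}
    gold_set = {normalize(p) for p in gold}
    hits = len(extracted_set & gold_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up concepts for one hypothetical product page.
    system_output = ["Digital Camera", "free shipping"]
    gold_standard = ["digital camera", "canon eos"]
    print(score_concepts(system_output, gold_standard))
```

Exact matching is the strictest choice; a looser rule, such as matching on stemmed phrases, would likely raise the scores of every method being compared.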
