A Comparative Study on Key Phrase Extraction

Web site summarization is the process of automatically generating a concise and informative summary for a given Web site. It has gained more and more attention in recent years, as effective summarization could lead to enhanced Web information retrieval systems, such as searching for Web sites. Extraction-based approaches to Web site summarization rely on the extraction of the most significant sentences from the target Web site based on the density of a list of key phrases that best describe the entire Web site. In this work, we benchmark five alternative key phrase extraction methods, TFIDF, KEA, Keyword, Keyterm, and Mixture, in an automatic Web site summarization framework we previously developed. We investigate the performance of these underlying methods via a formal user study and demonstrate that Keyterm is the best choice for key phrase extraction while Mixture should be used to obtain key sentences. We also discuss why one method performs better than another and what could be done to further improve the summarization system.

Categories and Subject Descriptors: H.3.1 [Content Analysis and Indexing]: Linguistic processing; I.2.7 [Natural Language Processing]

General Terms: Web content analysis, Web page summarization, Web information retrieval, Web content aggregation
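To illustrate the extraction-based idea described in the abstract, the following is a minimal sketch of ranking sentences by key phrase density. It is not the framework evaluated in this work: the function names (`split_sentences`, `key_phrase_density`, `top_sentences`), the naive sentence splitter, and the definition of density as the fraction of a sentence's words covered by key phrase matches are all assumptions made for illustration. The key phrase list is taken as given; in practice it would come from one of the benchmarked methods such as TFIDF or KEA.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def key_phrase_density(sentence, key_phrases):
    """Fraction of the sentence's words covered by key phrase matches.

    Illustrative definition only. Overlapping key phrases are counted
    separately, so the value is not capped at 1.0.
    """
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    if not words:
        return 0.0
    normalized = " ".join(words)
    covered = 0
    for phrase in key_phrases:
        phrase_words = re.findall(r"[a-z0-9]+", phrase.lower())
        if not phrase_words:
            continue
        # Match the phrase as a whole-word sequence in the normalized sentence.
        pattern = r"\b" + " ".join(map(re.escape, phrase_words)) + r"\b"
        covered += len(re.findall(pattern, normalized)) * len(phrase_words)
    return covered / len(words)


def top_sentences(text, key_phrases, k=2):
    """Return the k sentences with the highest key phrase density."""
    sentences = split_sentences(text)
    ranked = sorted(
        sentences,
        key=lambda s: key_phrase_density(s, key_phrases),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical site text and key phrases, for demonstration only.
    site_text = (
        "Our lab studies machine learning. Contact us by phone. "
        "Machine learning drives text summarization research."
    )
    phrases = ["machine learning", "text summarization"]
    for sentence in top_sentences(site_text, phrases, k=2):
        print(sentence)
```

In this sketch the quality of the selected sentences depends entirely on the supplied key phrase list, which is why the choice of key phrase extraction method matters for the resulting summary.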
