Paper Information - Keyphrase Extraction and Grouping Based on Association Rules

Keyphrases capture the content of a document and are therefore useful for many natural language processing tasks, such as information retrieval, document classification, and text summarization. Keyphrase extraction aims to identify multi-word sequences in a collection of documents that approximately correspond to keyphrases. In this paper, we propose a new method for keyphrase extraction based on association rule mining. Redundant multi-word sequences and synonymous phrases inevitably make up a large part of the extracted keyphrases. Association rules also let us reduce this redundancy by grouping related keyphrases that co-occur frequently. We further apply our keyphrase extraction and grouping solution to information retrieval, where both distinguishing and grouping keyphrases improve retrieval performance.
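The abstract does not spell out the grouping procedure, so the following is only a minimal sketch of the general idea of grouping keyphrases by association rules, not the authors' algorithm. It assumes each document is already reduced to a list of candidate keyphrases, counts support for single phrases and pairs, and merges two phrases when the rules in both directions meet a confidence threshold. The function name `group_keyphrases`, the thresholds `min_support` and `min_confidence`, the bidirectional requirement, and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def group_keyphrases(docs, min_support=2, min_confidence=0.6):
    """Group phrases linked by strong pairwise association rules.

    Illustrative assumption: two phrases are merged when both rules
    a -> b and b -> a reach min_confidence and the pair occurs in at
    least min_support documents.
    """
    single = Counter()  # number of documents containing each phrase
    pair = Counter()    # number of documents containing each phrase pair
    for phrases in docs:
        unique = sorted(set(phrases))
        single.update(unique)
        pair.update(combinations(unique, 2))

    # Union-find structure so that merged pairs form transitive groups.
    parent = {p: p for p in single}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), n_ab in pair.items():
        if n_ab < min_support:
            continue
        # confidence(a -> b) = support(a, b) / support(a)
        conf_ab = n_ab / single[a]
        conf_ba = n_ab / single[b]
        if conf_ab >= min_confidence and conf_ba >= min_confidence:
            parent[find(a)] = find(b)

    groups = {}
    for p in single:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


# Toy corpus: each inner list holds the candidate keyphrases of one document.
docs = [
    ["information retrieval", "ir system", "query expansion"],
    ["information retrieval", "ir system"],
    ["information retrieval", "ir system", "association rule"],
    ["association rule", "frequent itemset"],
    ["association rule", "frequent itemset", "query expansion"],
]
groups = group_keyphrases(docs)
for group in groups:
    print(group)
```

On this toy corpus, "information retrieval" and "ir system" always co-occur and end up in one group, while "query expansion" stays on its own because none of its pairs reaches the support threshold. Requiring confidence in both directions is a conservative choice that avoids attaching a frequent phrase to every rarer phrase that happens to appear alongside it.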

Xin Li | Fei Song
