A novel keyphrase extraction method by combining FP-growth and LDA

Fast-growing technologies such as cloud computing, big data, mobile Internet, and artificial intelligence have driven the emergence of many new phrases. In this paper, we propose a novel two-step keyphrase extraction method that combines the FP-growth algorithm with Latent Dirichlet Allocation (LDA) topic modeling. In the first step, we apply the FP-growth algorithm to obtain neighboring words that frequently co-occur as candidate phrases. In the second step, we extract significant keyphrases from these candidates using LDA models. Our experiments on two datasets, CVE-2015 and 20-newsgroups, show that the proposed approach can extract significant keyphrases and that these phrases can help improve text classification accuracy.
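The two-step flow described above can be illustrated with a minimal sketch. It is not the paper's implementation: step 1 uses plain support counting over adjacent word pairs as a simplified stand-in for FP-growth, and step 2 uses a small hand-written topic-word table (`topic_word`, hypothetical values) in place of a fitted LDA model. The scoring rule (product of word probabilities under the best topic), the function names, and the sample sentences are all assumptions made for illustration.

```python
from collections import Counter


def candidate_phrases(docs, min_support=2):
    """Step 1 (stand-in for FP-growth): keep adjacent word pairs that
    appear in at least `min_support` documents."""
    support = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        # Count each pair once per document, like itemset support.
        support.update(set(zip(tokens, tokens[1:])))
    return {pair: n for pair, n in support.items() if n >= min_support}


def rank_by_topics(candidates, topic_word, floor=1e-6):
    """Step 2 (assumed scoring rule): score a phrase by the product of
    its word probabilities under its best-matching topic."""
    scores = {}
    for pair in candidates:
        best = 0.0
        for probs in topic_word.values():
            p = 1.0
            for word in pair:
                # Words missing from a topic get a small floor probability.
                p *= probs.get(word, floor)
            best = max(best, p)
        scores[" ".join(pair)] = best
    # Highest-scoring phrases first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus written for this example (not taken from CVE-2015).
docs = [
    "a buffer overflow in the parser allows remote attackers to run code",
    "the buffer overflow allows remote attackers to crash the service",
    "remote attackers exploit a buffer overflow in the driver",
]

# Hypothetical topic-word probabilities, not the output of a fitted model.
topic_word = {
    "memory": {"buffer": 0.30, "overflow": 0.25, "parser": 0.05},
    "actor": {"remote": 0.20, "attackers": 0.20, "allows": 0.02},
}

candidates = candidate_phrases(docs, min_support=3)
ranked = rank_by_topics(candidates, topic_word)
for phrase, score in ranked:
    print(f"{phrase}: {score:.4f}")
```

In a fuller pipeline, the counting step would be replaced by an actual FP-growth miner and the topic table by the word distributions of a trained LDA model; the separation into candidate generation followed by topic-based filtering is the part that mirrors the abstract.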
