A co-occurrence based approach of automatic keyword expansion using mass diffusion

Prior keyword expansion methods often improve their performance by drawing on external knowledge. Given a set of initial keywords, this paper proposes a novel method that expands semantically or conceptually related keywords from a domain corpus by means of mass diffusion. First, a bipartite word network is constructed from the co-occurrence relations between the initial keywords and candidate words. The expanded keywords are then identified by a two-step mass diffusion carried out on this bipartite network. Experimental results show that the proposed method outperforms both a typical statistics-based approach and a graph-based approach. Our research is expected to complement the theoretical framework of keyword expansion and is applicable to query expansion, thesaurus construction, and text clustering.
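The abstract does not give the construction rules or diffusion formulas, so the following is only a minimal sketch of the general idea. It follows the mass-diffusion scheme commonly used on bipartite networks, in which every node splits its current resource equally among its neighbours. The following choices are assumptions, not details from the paper: sentence-level co-occurrence, whitespace tokenization with no stop-word filtering, unweighted edges, and one unit of initial resource on every candidate word. The helper names `build_bipartite_edges` and `mass_diffusion` are hypothetical.

```python
from collections import defaultdict


def build_bipartite_edges(sentences, keywords):
    """Link each initial keyword to every non-keyword token in the same sentence.

    Assumption: co-occurrence means "appears in the same sentence"; tokens are
    lowercased whitespace splits with no stop-word filtering.
    """
    keywords = {k.lower() for k in keywords}
    edges = set()
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        for k in tokens & keywords:          # initial keywords in this sentence
            for c in tokens - keywords:      # candidate words in this sentence
                edges.add((k, c))
    return edges


def mass_diffusion(edges):
    """Two-step mass diffusion on an unweighted keyword-candidate bipartite graph.

    Returns a dict mapping each candidate word to its final resource.
    """
    kw_nbrs = defaultdict(set)    # keyword -> neighbouring candidate words
    cand_nbrs = defaultdict(set)  # candidate word -> neighbouring keywords
    for k, c in edges:
        kw_nbrs[k].add(c)
        cand_nbrs[c].add(k)

    # Assumption: every candidate word starts with one unit of resource.
    initial = {c: 1.0 for c in cand_nbrs}

    # Step 1: each candidate splits its resource equally among its keywords.
    on_keywords = defaultdict(float)
    for c, r in initial.items():
        share = r / len(cand_nbrs[c])
        for k in cand_nbrs[c]:
            on_keywords[k] += share

    # Step 2: each keyword splits what it received equally among its candidates.
    final = defaultdict(float)
    for k, r in on_keywords.items():
        share = r / len(kw_nbrs[k])
        for c in kw_nbrs[k]:
            final[c] += share
    return dict(final)
```

For example, with the toy corpus `["diffusion network", "diffusion graph", "keyword graph", "keyword term"]` and initial keywords `["diffusion", "keyword"]`, the scores are `graph: 1.5`, `network: 0.75`, `term: 0.75`. Here `graph` ranks first as an expansion candidate because it co-occurs with both initial keywords. Because each node only redistributes what it holds, the total resource, 3.0 here, is conserved across both steps.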
