Improving keyword extraction in multilingual texts

The accuracy of keyword extraction is a key factor in information retrieval systems and in marketing. In the real world, text is produced in a variety of languages, and drawing on information from several languages can improve the accuracy of keyword extraction. In this paper, the information available in all languages of a multilingual text is used to improve a traditional keyword extraction algorithm. The proposed keyword extraction procedure is an unsupervised algorithm: a word is selected as a keyword of a given text if it ranks highly under the keyword criteria not only in the language of that text but in the other languages as well. To this end, the average TF-IDF of each candidate word was calculated over its own language and the other languages. The words with the highest average TF-IDF were then chosen as the extracted keywords. The results indicate that, on multilingual texts, the accuracies of the term frequency-inverse document frequency (TF-IDF) algorithm, the graph-based algorithm, and the proposed improved algorithm are 80%, 60.65%, and 91.3%, respectively.
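
The averaging step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each document is available as a token list in every language, that a word alignment between languages is given (the hypothetical `alignment` mapping), and that a smoothed IDF is acceptable; the abstract specifies none of these details, and the function names are illustrative.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """Return {term: TF-IDF} for one tokenized document against a tokenized corpus."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        tf = count / len(doc_tokens)
        df = sum(1 for other in corpus if term in other)
        # Smoothed IDF (an assumption; the abstract does not state the variant used).
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = tf * idf
    return scores


def multilingual_keywords(docs, corpora, alignment, top_k=3):
    """Rank aligned words by their TF-IDF averaged over all languages.

    docs:      {lang: tokens of the target document in that language}
    corpora:   {lang: list of tokenized documents in that language}
    alignment: {concept: {lang: word}}  (hypothetical cross-language word mapping)
    """
    per_lang = {lang: tf_idf(docs[lang], corpora[lang]) for lang in docs}
    averaged = {}
    for concept, words in alignment.items():
        # A word missing from a language contributes a score of 0 there.
        values = [per_lang[lang].get(words.get(lang), 0.0) for lang in docs]
        averaged[concept] = sum(values) / len(values)
    return sorted(averaged, key=averaged.get, reverse=True)[:top_k]


if __name__ == "__main__":
    docs = {
        "en": ["cat", "sat", "cat", "mat"],
        "fr": ["chat", "assis", "chat", "tapis"],
    }
    corpora = {
        "en": [docs["en"], ["dog", "sat", "mat"]],
        "fr": [docs["fr"], ["chien", "assis", "tapis"]],
    }
    alignment = {
        "cat": {"en": "cat", "fr": "chat"},
        "sat": {"en": "sat", "fr": "assis"},
        "mat": {"en": "mat", "fr": "tapis"},
    }
    print(multilingual_keywords(docs, corpora, alignment, top_k=1))
```

In this toy example, the word aligned as "cat"/"chat" is frequent in the target document and rare in the rest of each corpus, so it has the highest averaged score. A word that scores well in only one language would be pulled down by the average, which is the effect the abstract describes.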
