Keyword Extraction From Chinese Text Based On Multidimensional Weighted Features

This paper addresses the incomplete coverage and low accuracy of keyword extraction from Chinese text. It proposes an extraction method that draws on intrinsic features of the Chinese language and on weighted eigenvalues computed from multidimensional information. The method combines theoretical analysis and experimental calculation to study parts of speech, word position, word length, semantic similarity, and word co-occurrence frequency in Chinese texts. By combining multidimensional data on word frequency, word feature values, word similarity, and word co-occurrence probability, we calculate a weighted eigenvalue for each word. A comparison of precision, recall, and F-measure indicates that the proposed method extracts keywords more accurately than methods that use word frequency or the basic eigenvalues alone. The conclusions of this study provide a reference for keyword extraction and text mining.

Subject Categories and Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing - Text analysis; H.2.8 [Database Applications]: Data mining

General Terms: Chinese text mining, sentiment analysis
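To make the weighting and evaluation described in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each candidate word already carries four scores normalized to [0, 1] (word frequency, word feature value, word similarity, co-occurrence probability) and that the weighted eigenvalue is a simple weighted sum of them. The `Candidate` class, the `WEIGHTS` values, and the function names are hypothetical; the abstract does not state the actual weights or the combination formula. Precision, recall, and F-measure follow their standard definitions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate word with four pre-computed scores, each assumed to lie in [0, 1]."""
    word: str
    frequency: float      # normalized word frequency
    feature: float        # word feature value (part of speech, position, length)
    similarity: float     # semantic similarity score
    cooccurrence: float   # word co-occurrence probability


# Hypothetical weights chosen for illustration only; the paper's values are not given here.
WEIGHTS = {"frequency": 0.4, "feature": 0.3, "similarity": 0.2, "cooccurrence": 0.1}


def weighted_score(c: Candidate, weights: dict = WEIGHTS) -> float:
    """Combine the four dimensions into one score (assumed to be a weighted sum)."""
    return (
        weights["frequency"] * c.frequency
        + weights["feature"] * c.feature
        + weights["similarity"] * c.similarity
        + weights["cooccurrence"] * c.cooccurrence
    )


def top_keywords(candidates: list, k: int) -> list:
    """Return the k highest-scoring words, best first."""
    ranked = sorted(candidates, key=weighted_score, reverse=True)
    return [c.word for c in ranked[:k]]


def precision_recall_f(extracted: list, gold: list) -> tuple:
    """Standard precision, recall, and F-measure (F1) against a reference keyword set."""
    extracted_set, gold_set = set(extracted), set(gold)
    hits = len(extracted_set & gold_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Made-up scores for four candidate words, purely to exercise the functions.
    candidates = [
        Candidate("文本", 0.9, 0.8, 0.7, 0.6),
        Candidate("关键词", 0.8, 0.9, 0.9, 0.8),
        Candidate("的", 1.0, 0.0, 0.1, 0.2),
        Candidate("提取", 0.5, 0.7, 0.6, 0.5),
    ]
    keywords = top_keywords(candidates, k=2)
    print(keywords)
    print(precision_recall_f(keywords, ["关键词", "提取"]))
```

In this toy data the function word "的" has the highest raw frequency but a low combined score, which illustrates why a frequency-only ranking can be less accurate than one that also weighs the other dimensions.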
