Paper Information - A Method on Chinese Thesauri

A Method on Chinese Thesauri

In recent years, text analysis has become an increasingly active topic in many fields. Most current text analysis methods use Word2vec, Naive Bayes, or similar techniques to classify large numbers of texts. However, not all collected samples are useful for research with demanding requirements, and retrieving related samples with a single keyword is not enough. In this paper, we propose a novel model for secondary text filtering based on Chinese thesauri. It consists of roughly five steps: sample collection, thesaurus construction, word segmentation, word-frequency statistics, and calculation of text relevance. Its main purpose is to make the sample texts match the user-supplied keywords more accurately and to avoid needless waste of time and storage space.
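The last three steps can be illustrated with a minimal sketch. The abstract does not specify the segmentation algorithm or the relevance formula, so everything below is an assumption made for illustration: segmentation is approximated by forward maximum matching against the thesaurus, relevance is taken to be the share of tokens that are thesaurus terms, and the names `segment`, `relevance`, `filter_samples`, and the `threshold` value are hypothetical.

```python
from collections import Counter


def segment(text, vocabulary, max_len):
    """Forward maximum matching: at each position take the longest
    vocabulary word, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def relevance(text, thesaurus_terms):
    """Assumed relevance score: fraction of tokens that are thesaurus terms."""
    max_len = max((len(term) for term in thesaurus_terms), default=1)
    tokens = segment(text, thesaurus_terms, max_len)
    if not tokens:
        return 0.0
    counts = Counter(tokens)  # word-frequency statistics
    hits = sum(counts[term] for term in thesaurus_terms)
    return hits / len(tokens)


def filter_samples(samples, thesaurus_terms, threshold=0.1):
    """Keep only samples whose relevance reaches the (assumed) threshold."""
    return [s for s in samples if relevance(s, thesaurus_terms) >= threshold]


# Hypothetical thesaurus entry: terms related to the user keyword "经济".
terms = {"经济", "金融", "市场"}
samples = ["金融市场波动", "今天天气很好"]
kept = filter_samples(samples, terms)
```

With this toy thesaurus, the first sample contains two related terms although it never mentions the keyword itself, so it is kept, while the second sample is filtered out. A real system would likely use a dedicated Chinese word segmenter and a weighted relevance measure.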
