Basic techniques in text mining using open-source tools
Many text mining tools are available, both commercial and non-commercial. However, elementary text-based analysis can be done with basic Unix commands, shell scripts, and small programs in scripting languages, without resorting to such extensive software. This paper introduces basic techniques for text mining that combine a set of standard commands, short code, and generic tools provided as open-source software. The target of the analysis is a set of sixty-seven articles written by one author for a relay column since 1998. Several text-based analyses reveal how the author's interests shifted over about fifteen years. In addition, at the end of the paper, the results of the text-based analysis are compared with those of a non-text-based analysis, and the efficiency of non-parametric analysis is discussed.
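As an illustration of the kind of small script the abstract has in mind, the following is a minimal sketch and not the paper's own code. It counts word frequencies, much as the classic Unix pipeline of `tr`, `sort`, and `uniq -c` would, and tallies how often a term appears per year. The function names `word_frequencies` and `term_trend`, the `(year, text)` input layout, and the simple English-only tokenizer are all assumptions made for this example; the original articles may need a different tokenizer.

```python
import re
from collections import Counter

# Assumption: words are runs of ASCII letters and apostrophes.
# Text in other languages would need a different tokenizer.
WORD_PATTERN = re.compile(r"[a-z']+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return WORD_PATTERN.findall(text.lower())


def word_frequencies(text, top_n=10):
    """Return the top_n most frequent words as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(top_n)


def term_trend(articles, term):
    """Count occurrences of term per year.

    articles is a hypothetical list of (year, text) pairs; the result
    maps each year to the total count of the term in that year's texts.
    """
    target = term.lower()
    counts = Counter()
    for year, text in articles:
        counts[year] += tokenize(text).count(target)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [
        (1998, "Unix tools and Unix pipes"),
        (2005, "Text mining with small scripts"),
    ]
    print(word_frequencies(sample[0][1], top_n=2))
    print(term_trend(sample, "unix"))
```

Plotting the per-year counts for several terms side by side is one simple way to see a shift of interest over time, which is the kind of trend the abstract describes.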