A new method for constructing field association terms using co-occurrence words and declinable words information
Readers can identify the subject of many document fields by reading only a few specific words, called field association (FA) terms. Constructing these FA terms is important for correctly deciding a document's field from the few words available in part of the file. The field can be decided efficiently when FA terms are numerous and occur with high frequency. When level 1 FA words (words that connect directly to terminal fields) are few, earlier methods cannot determine the document field easily or quickly, especially when the corpus contains few documents. This paper proposes a new method for deciding FA terms using the weights of co-occurrence words and declinable words related to a narrow association category, while eliminating the ambiguity of FA terms. Moreover, efficient FA terms are difficult to extract from frequency information alone. The proposed co-occurrence word weighting yields higher precision and recall than using the degree of frequency alone.
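As a rough illustration of the general idea of combining FA terms with co-occurrence weights, the following minimal sketch scores candidate fields for a tokenized text. It is not the paper's algorithm: the dictionaries `FA_TERMS` and `CO_OCCURRENCE`, their weights, the `window` parameter, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical FA-term dictionary: term -> (field, base weight).
# The entries and weights are illustrative assumptions, not data from the paper.
FA_TERMS = {
    "pitcher": ("baseball", 1.0),
    "inning": ("baseball", 1.0),
    "goalkeeper": ("soccer", 1.0),
    "penalty": ("soccer", 0.5),  # ambiguous on its own, so a lower base weight
}

# Hypothetical co-occurrence bonus: (FA term, nearby word) -> extra weight.
CO_OCCURRENCE = {
    ("penalty", "kick"): 1.0,
}


def score_fields(tokens, window=2):
    """Sum FA-term weights per field, adding a bonus for co-occurring words."""
    scores = defaultdict(float)
    for i, tok in enumerate(tokens):
        if tok not in FA_TERMS:
            continue
        field, weight = FA_TERMS[tok]
        scores[field] += weight
        # Words within `window` positions on either side of the FA term.
        start = max(0, i - window)
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        for word in neighbours:
            scores[field] += CO_OCCURRENCE.get((tok, word), 0.0)
    return dict(scores)


def decide_field(tokens, window=2):
    """Return the highest-scoring field, or None if no FA term was found."""
    scores = score_fields(tokens, window)
    return max(scores, key=scores.get) if scores else None
```

For example, `decide_field("the goalkeeper saved the penalty kick".split())` selects the `soccer` field, with the nearby word "kick" reinforcing the otherwise weak term "penalty".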