Paper information - Using boosting mechanism to refine the threshold of VSM-based similarity in text classification

The vector space model (VSM)-based similarity classifier is the simplest text categorization method. It classifies quickly but with low accuracy, mainly because its similarity threshold is chosen empirically rather than derived mathematically. This paper introduces a boosting-based mechanism that adaptively computes a relatively accurate similarity threshold for a specific dataset. The method constructs better similarity-based classification rules by combining the similarity thresholds generated by the constituent classifiers of boosting. Because it greedily minimizes the error rate on the training documents, the similarity classifier using the resulting threshold should also have a low error rate.
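The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes each document is reduced to a cosine similarity against a class prototype, uses a generic AdaBoost-style loop whose weak learners are single similarity thresholds, and all function names (`cosine`, `best_stump`, `boost_thresholds`, `predict`) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts (VSM vectors)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def best_stump(sims, labels, weights):
    """Pick the threshold with the lowest weighted error for the rule
    'predict +1 if similarity >= threshold, else -1'."""
    candidates = sorted(set(sims)) + [float("inf")]
    best = None
    for t in candidates:
        err = sum(w for s, y, w in zip(sims, labels, weights)
                  if (1 if s >= t else -1) != y)
        if best is None or err < best[1]:
            best = (t, err)
    return best

def boost_thresholds(sims, labels, rounds=10):
    """AdaBoost-style loop (an assumption, not the paper's exact algorithm).
    Returns a list of (alpha, threshold) pairs."""
    n = len(sims)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        t, err = best_stump(sims, labels, weights)
        if err >= 0.5:          # weak learner no better than chance: stop
            break
        perfect = err <= 1e-10
        err = max(err, 1e-10)   # avoid division by zero in alpha
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t))
        if perfect:             # training data already separated
            break
        # Increase the weight of misclassified documents, then renormalize.
        weights = [w * math.exp(-alpha * y * (1 if s >= t else -1))
                   for s, y, w in zip(sims, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, sim):
    """Weighted vote of all threshold rules on one similarity value."""
    score = sum(a * (1 if sim >= t else -1) for a, t in ensemble)
    return 1 if score >= 0 else -1
```

Because every weak rule is a threshold on the same similarity score, the weighted vote is a non-decreasing step function of the score, so the combined classifier behaves like a single refined threshold: the smallest similarity at which the vote becomes non-negative.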

Lili Diao | Yuchang Lu | Chunyi Shi | Keyun Hu

[1] Richard M. Dudley, et al. Some special Vapnik-Chervonenkis classes, 1981, Discret. Math.

[2] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[3] Gerard Salton, et al. A vector space model for automatic indexing, 1975, CACM.