A new weighting algorithm for linear classifiers

In text categorization (TC), the TF*IDF (term frequency * inverse document frequency) and TF*IWF*IWF (term frequency * inverse word frequency * inverse word frequency) weighting algorithms are widely used. However, both algorithms are overly biased toward term frequency and neglect the imbalance between classes. In this paper, we propose a new weighting algorithm, named TF*IWF*IWF*VE, where VE stands for variance and expectation. The new algorithm improves on the TF*IWF*IWF weighting algorithm in both the TF and VE factors. We compare the new algorithm with the TF*IWF*IWF algorithm both theoretically and experimentally. In a preliminary experiment, the F1-measure improves by 11.78%.
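To make the two baseline schemes concrete, the following is a minimal Python sketch of TF*IDF and TF*IWF*IWF term weighting. The definitions used here are assumptions, since the abstract does not spell them out: TF is the raw count of a term in a document, IDF is log(N / df) with N the number of documents and df the number of documents containing the term, and IWF is log(W / wf) with W the total number of word occurrences in the corpus and wf the number of occurrences of the term. The function names are hypothetical, and the VE factor is deliberately not sketched because its formula is not given in this section.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight terms by TF * IDF (assumed: raw TF, IDF = log(N / df)).

    docs is a list of token lists; returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def tf_iwf_iwf(docs):
    """Weight terms by TF * IWF * IWF (assumed: IWF = log(W / wf)).

    W is the total number of word occurrences in the corpus and
    wf is the number of occurrences of the term across the corpus.
    """
    total_words = sum(len(doc) for doc in docs)
    # Word frequency: total occurrences of each term over all documents.
    wf = Counter(term for doc in docs for term in doc)
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: tf[t] * math.log(total_words / wf[t]) ** 2 for t in tf}
        )
    return weights


if __name__ == "__main__":
    corpus = [["a", "b", "a"], ["b", "c"]]
    print(tf_idf(corpus))
    print(tf_iwf_iwf(corpus))
```

In both functions the weight scales linearly with the raw term count, and neither function looks at class labels, which is consistent with the two limitations the abstract attributes to these schemes.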