A Term Weighting Scheme Approach for Vietnamese Text Classification

Term weighting, which converts documents into vectors in the term space, is a vital step in automatic text categorization. Previous studies have shown that the choice of term weighting scheme has a dominant effect on classification performance. Term weighting has been studied extensively for English text classification, but few studies have addressed Vietnamese text classification. In this paper, we propose a term weighting scheme called normalizetf.rfmax, which is based on tf.rf, one of the most effective term weighting schemes to date. We conducted experiments comparing the proposed normalizetf.rfmax scheme with tf.rf and tf.idf on a Vietnamese text classification benchmark. The results show that the proposed scheme achieves about 3% to 5% higher accuracy than the other term weighting schemes.
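For readers unfamiliar with the two baselines named above, the following is a minimal sketch of tf.idf and tf.rf as they are commonly formulated in the literature, assuming tf is the raw term count, idf = ln(N / df), and rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and other-category documents containing the term. The function names, the toy corpus, and its tokens are illustrative assumptions. The proposed normalizetf.rfmax scheme is not reproduced here, because its definition is not given in this abstract.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of word-segmented tokens.
docs = [
    ["bong_da", "the_thao", "bong_da"],
    ["bong_da", "cau_thu"],
    ["kinh_te", "thi_truong"],
]
labels = ["sport", "sport", "economy"]


def tf_idf(docs):
    """Weight each term by raw term frequency times ln(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


def rf(docs, labels, positive):
    """Relevance frequency of each term with respect to one positive category."""
    a = Counter()  # positive-category documents containing the term
    c = Counter()  # documents from all other categories containing the term
    for doc, label in zip(docs, labels):
        (a if label == positive else c).update(set(doc))
    return {t: math.log2(2 + a[t] / max(1, c[t])) for t in set(a) | set(c)}


def tf_rf(docs, labels, positive):
    """Weight each term by raw term frequency times its relevance frequency."""
    weights = rf(docs, labels, positive)
    return [
        {t: tf * weights[t] for t, tf in Counter(doc).items()}
        for doc in docs
    ]


sport_rf = rf(docs, labels, "sport")
sport_vectors = tf_rf(docs, labels, "sport")
idf_vectors = tf_idf(docs)
```

In this toy corpus, "bong_da" appears in both sport documents and in no other document, so its rf with respect to the sport category is log2(2 + 2/1) = 2, whereas "kinh_te" never appears in a sport document and receives the minimum rf of log2(2) = 1. Unlike idf, rf uses the category labels, which is why tf.rf is described as a supervised weighting scheme.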