A Novel Term Weighting Scheme for Automated Text Categorization

Term weighting is an important task in text classification. Inverse document frequency (IDF) is one of the most popular methods for this task. However, in some situations, such as supervised learning for text categorization, it does not weight terms properly: it neglects category information and assumes that a term occurring in a smaller set of documents should receive a higher weight. Several term weighting schemes that take category information into account have been proposed. In this paper, we present a new term weighting scheme that exploits more of the information provided by the distribution of a term across different categories. Our experiments show that this method is more effective than three other popular schemes.
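The limitation of IDF described above can be seen on a toy example. The following minimal Python sketch computes the standard unsupervised weight idf(t) = log(N / df(t)) on a hypothetical six-document corpus. The corpus, its labels, and the helper names `idf` and `category_counts` are illustrative assumptions, and the sketch does not implement the scheme proposed in this paper. Two terms with the same document frequency receive identical IDF weights, even though one occurs in only a single category and the other is spread across both. IDF alone therefore cannot tell which term is the better category indicator.

```python
import math

def idf(docs, term):
    """Standard unsupervised IDF, log(N / df); category labels are never consulted."""
    n = len(docs)
    df = sum(1 for tokens, _label in docs if term in tokens)
    return math.log(n / df)

def category_counts(docs, term):
    """Count, per category label, how many documents contain the term."""
    counts = {}
    for tokens, label in docs:
        if term in tokens:
            counts[label] = counts.get(label, 0) + 1
    return counts

# Hypothetical toy corpus: (set of tokens, category label).
docs = [
    ({"goal", "match"}, "sports"),
    ({"goal", "team"}, "sports"),
    ({"stock", "market"}, "finance"),
    ({"stock", "report"}, "finance"),
    ({"report", "match"}, "sports"),
    ({"report", "team"}, "finance"),
]

for term in ("goal", "team"):
    print(term, round(idf(docs, term), 3), category_counts(docs, term))
# goal 1.099 {'sports': 2}
# team 1.099 {'sports': 1, 'finance': 1}
```

Here "goal" appears only in sports documents, so it is a strong category cue, while "team" is evenly split between categories. Both still receive the same weight log(6/2) ≈ 1.099, which is exactly the category information that supervised weighting schemes try to recover.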