Text classification of minimal risk with three-way decisions

Abstract In recent years, text classification based on K-Nearest Neighbor (KNN) has been an active research topic, and many improved KNN text classifiers have been proposed. Many scholars have combined rough sets with text classification, but few have studied KNN text classification based on three-way decisions. Because the distribution of a class can be ambiguous, some articles in some categories are difficult to categorize accurately. To solve this problem of ambiguous label determination, this paper proposes a text classification algorithm based on Three-Way Decisions with KNN (TWDKNN). The minimum risk cost model of three-way decisions theory is used to set the thresholds, the three-way decisions are transformed into two-way decisions, and the membership function is redefined. These definitions narrow the search range of the K nearest neighbors and resolve the fuzzy label judgment. Experimental results show that TWDKNN improves classification accuracy, recall, and F value over the traditional KNN text classification algorithm.
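As an illustration of how a minimum-risk cost model can yield decision thresholds, the sketch below uses the standard decision-theoretic rough set formulas for the pair (alpha, beta) and applies them to a simple KNN vote fraction. It is not the paper's TWDKNN algorithm: the loss values, the function names, and the use of the neighbor vote fraction as the membership value are all assumptions made for this example, and the paper's redefined membership function is not reproduced here.

```python
from collections import Counter


def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Return (alpha, beta) from six loss values (hypothetical parameter names).

    l_pp, l_bp, l_np: cost of accepting, deferring, rejecting a document
                      that DOES belong to the class.
    l_pn, l_bn, l_nn: cost of accepting, deferring, rejecting a document
                      that does NOT belong to the class.
    Standard decision-theoretic rough set thresholds; assumes
    l_pp <= l_bp < l_np and l_nn <= l_bn < l_pn.
    """
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


def knn_membership(neighbor_labels, label):
    """Fraction of the k nearest neighbors carrying `label` (assumed membership)."""
    return Counter(neighbor_labels)[label] / len(neighbor_labels)


def three_way_decide(p, alpha, beta):
    """Accept if p >= alpha, reject if p <= beta, otherwise defer."""
    if p >= alpha:
        return "accept"
    if p <= beta:
        return "reject"
    return "defer"


# Illustrative loss values only; they are not taken from the paper.
alpha, beta = thresholds(l_pp=0, l_bp=2, l_np=6, l_pn=8, l_bn=2, l_nn=0)
p = knn_membership(["sports", "sports", "finance", "sports"], "sports")
decision = three_way_decide(p, alpha, beta)
```

Documents that land in the "defer" region are the ambiguous ones; a two-way scheme then has to resolve them with a further rule, which is the step the abstract describes as transforming three-way decisions into two-way decisions.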
