The maximal operator classifier

K-nearest neighbors (KNN) is a classic text classification algorithm. In this paper, we propose a new text classification algorithm based on KNN. We set a text similarity threshold to optimize the value of K, which avoids misclassifications caused by imbalanced sample sizes across classes. We also use the maximal operator, rather than cosine similarity, to calculate text similarity. Our experimental data show that this approach yields better classification results.
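The abstract does not define the maximal operator or say how the threshold sets K, so the following Python sketch is only one possible reading, not the paper's method. It assumes that documents are sparse term-weight dictionaries, that similarity is the max-min overlap max_i min(a_i, b_i) borrowed from fuzzy set theory, and that every training document whose similarity reaches the threshold gets one vote. The names `max_min_similarity`, `threshold_knn`, and the toy data are hypothetical.

```python
from collections import Counter


def max_min_similarity(a, b):
    """Assumed similarity: the largest per-term overlap, max_i min(a_i, b_i).

    `a` and `b` map terms to non-negative weights. Returns 0.0 when the
    two documents share no terms.
    """
    shared_terms = set(a) & set(b)
    return max((min(a[t], b[t]) for t in shared_terms), default=0.0)


def threshold_knn(query, training, threshold, similarity=max_min_similarity):
    """Classify `query` by majority vote among sufficiently similar documents.

    The number of voters is not fixed in advance: it is however many
    training documents have similarity >= `threshold` (an assumed reading
    of "a threshold to optimize K"). Returns None if no document qualifies.
    """
    votes = Counter()
    for vector, label in training:
        if similarity(query, vector) >= threshold:
            votes[label] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy, imbalanced training set: one "sports" document, three "finance" ones.
training = [
    ({"goal": 0.9, "team": 0.8}, "sports"),
    ({"stock": 0.9, "team": 0.2}, "finance"),
    ({"stock": 0.8, "market": 0.7}, "finance"),
    ({"bank": 0.9, "team": 0.1}, "finance"),
]
query = {"goal": 0.7, "team": 0.6}

predicted = threshold_knn(query, training, threshold=0.5)
print(predicted)
```

In this toy set, a fixed K of 3 would force two "finance" documents into the vote even though they barely resemble the query, whereas a threshold of 0.5 admits only the single "sports" document. This illustrates how a threshold could counter class imbalance under the assumptions stated above.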
