Research on the Application of the kNN Algorithm to SMS Client Classification

This paper studied and implemented an SMS client classification system based on the kNN algorithm. Two feature vector sets, one for normal SMS and one for spam SMS, were extracted from a self-built SMS corpus. Preprocessing, dimensionality reduction, and removal of low-frequency feature items were applied so that the feature vector sets captured the feature items of the SMS as fully as possible. The study showed that classification performed best when n was set to 600: the SMS recognition rate dropped when n was too small, and the time complexity of classification rose when n was too large. The optimal number of neighbors k was 25. In addition, the best results were obtained when the probability difference among the k SMS was between 1% and 2% and their number difference was between 5 and 15. With the final system parameters n = 600, k = 25, a probability difference of 1.5%, and a number difference of 9, chosen to secure both a good normal SMS pass rate and a good spam SMS recognition rate, the recognition rates for normal and spam SMS reached 97.3% and 89%, respectively.
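The classification scheme summarized above can be illustrated with a minimal Python sketch. It is not the system described in this paper. It assumes that n is the number of feature items kept after low-frequency items are removed, that messages are tokenized on whitespace, and that similarity is measured by cosine similarity; none of these choices is stated in the abstract. The function names (build_vocabulary, vectorize, cosine, classify) and the min_margin parameter are hypothetical. The min_margin rule is only a loose stand-in for the number difference mentioned above, and the probability difference threshold is omitted because its exact definition is not given here.

```python
from collections import Counter
import math


def build_vocabulary(messages, n):
    """Keep the n most frequent tokens as feature items (assumed meaning of n)."""
    counts = Counter(tok for msg in messages for tok in msg.split())
    return [tok for tok, _ in counts.most_common(n)]


def vectorize(message, vocabulary):
    """Term-frequency vector of one message over the chosen vocabulary."""
    counts = Counter(message.split())
    return [counts[tok] for tok in vocabulary]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(message, training, vocabulary, k, min_margin):
    """Label a message 'spam' only if, among its k nearest neighbors,
    spam votes exceed normal votes by at least min_margin (hypothetical
    rule); otherwise label it 'normal' so normal SMS pass by default."""
    query = vectorize(message, vocabulary)
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return "spam" if votes["spam"] - votes["normal"] >= min_margin else "normal"


# Toy corpus for illustration only; it is not data from the paper.
corpus = [
    ("win free prize now", "spam"),
    ("free prize claim now", "spam"),
    ("claim your free cash prize", "spam"),
    ("are we meeting for lunch today", "normal"),
    ("see you at lunch today", "normal"),
    ("call me after the meeting", "normal"),
]
vocab = build_vocabulary([text for text, _ in corpus], n=600)
training = [(vectorize(text, vocab), label) for text, label in corpus]

label_spam = classify("free prize now", training, vocab, k=3, min_margin=1)
label_normal = classify("lunch meeting today", training, vocab, k=3, min_margin=1)
print(label_spam, label_normal)
```

Defaulting to "normal" unless the spam vote wins by a margin reflects the stated goal of keeping the normal SMS pass rate high. With a realistic corpus, k and the margin would be tuned against held-out messages rather than fixed in advance.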