Text Classification by Combining Different Distance Functions with Weights

As the volume of data in networks grows rapidly, machine classification of text data becomes difficult under such computing circumstances. Although the k-nearest neighbor (kNN) classifier is a simple and effective classification approach, improving its performance remains attractive for high-accuracy processing. In this paper, kNN is improved by applying different distance functions with weights, so that the data are measured from multiple viewpoints. The weights are then optimized by a genetic algorithm. After learning on the training data, unknown data are classified by combining the multiple distance functions through ensemble kNN computations. We present a new approach that combines multiple kNN classifiers based on different distance functions, which improves the performance of the k-nearest neighbor method. The proposed combining algorithm shows higher generalization accuracy than other conventional learning algorithms.
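The core idea above — measuring data with several distance functions and combining them with weights before running kNN — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the distance functions (Euclidean and Manhattan), the `combined_distance` and `knn_classify` helpers, and the fixed example weights are all assumptions; in the paper the weights would instead be found by a genetic algorithm.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def combined_distance(a, b, weights, metrics=(euclidean, manhattan)):
    # Weighted sum of the individual distance functions.
    # In the paper, these weights are optimized by a genetic algorithm;
    # here they are supplied directly for illustration.
    return sum(w * m(a, b) for w, m in zip(weights, metrics))

def knn_classify(train, query, k, weights):
    # train: list of (feature_vector, label) pairs.
    # Rank training points by the combined distance and take a
    # majority vote among the k nearest neighbors.
    neighbors = sorted(train, key=lambda t: combined_distance(t[0], query, weights))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_classify(train, (1, 1), 3, (0.5, 0.5)))  # -> a
```

With equal weights this reduces to an average of the two metrics; shifting the weights lets one viewpoint dominate, which is what the genetic-algorithm search exploits when tuning the combination on the training data.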
