K-nearest neighbor and C4.5 algorithms as data mining methods: advantages and difficulties

Summary form only given. Data mining is considered a fast-growing technology that results from combining existing technologies such as machine learning, database systems, statistics and visualization. Some data mining algorithms have been used to solve classification problems in databases. To illustrate this task, the k-nearest neighbor (K-NN) and C4.5 algorithms are compared in terms of their performance as classifiers. While K-NN is a supervised learning algorithm, C4.5 is an inductive learning algorithm. It is shown that the K-NN algorithm offers options for weight setting, normalization and editing the data, and that it can be used to develop hybrid systems for data mining. It is also shown that the C4.5 algorithm can generate rules from a single tree, can transform multiple decision trees into a set of classification rules, and can be used to better scale up rule generation in terms of the size and number of rules and learning time.
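The K-NN options named in the abstract, normalization and weight setting, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `min_max_normalize` and `knn_predict`, the choice of min-max scaling, the per-feature weights inside a Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
import math


def min_max_normalize(rows):
    """Rescale each feature column to [0, 1] (assumed normalization scheme)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant column has zero range; use 1.0 to avoid division by zero.
    spans = [(max(c) - lo) or 1.0 for c, lo in zip(cols, lows)]
    scaled = [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]
    return scaled, lows, spans


def knn_predict(train_x, train_y, query, k=3, weights=None):
    """Majority vote among the k nearest training points.

    `weights` holds one non-negative weight per feature (hypothetical
    parameter); it scales each squared difference in the distance.
    """
    weights = weights or [1.0] * len(query)
    dists = []
    for x, y in zip(train_x, train_y):
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, query)))
        dists.append((d, y))
    dists.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy data (invented): two features on very different scales.
train_raw = [[1.0, 1000.0], [1.2, 1100.0], [0.9, 5000.0],
             [5.0, 1200.0], [5.2, 5100.0], [4.8, 4900.0]]
labels = ["a", "a", "a", "b", "b", "b"]

scaled, lows, spans = min_max_normalize(train_raw)
# The query is rescaled with the statistics of the training data.
query = [(v - lo) / s for v, lo, s in zip([4.9, 1050.0], lows, spans)]

unweighted = knn_predict(scaled, labels, query, k=3)
first_feature_only = knn_predict(scaled, labels, query, k=3, weights=[1.0, 0.0])
```

In this toy setting the two calls can disagree, which shows why the choice of feature weights matters: setting a weight to zero removes that feature from the distance entirely.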
