Survey of Classification Techniques in Data Mining

Classification is a data mining (machine learning) technique used to predict group membership for data instances. In this paper, we present the basic classification techniques. We cover several major kinds of classification methods, including decision tree induction, Bayesian networks, the k-nearest neighbor classifier, case-based reasoning, genetic algorithms and fuzzy logic techniques. The goal of this survey is to provide a comprehensive review of different classification techniques in data mining. Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms and machine learning methods. Consequently, data mining consists of more than collecting and managing data; it also includes analysis and prediction. Classification techniques are capable of processing a wider variety of data than regression and are growing in popularity. There are several applications for Machine Learning (ML), the most significant of which is data mining. People are prone to making mistakes during analyses, particularly when trying to establish relationships between multiple features, which makes it difficult for them to find solutions to certain problems. Machine learning can often be successfully applied to these problems, improving the efficiency of systems and the designs of machines.
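As an illustration of predicting group membership for a data instance, the following is a minimal sketch of a k-nearest neighbor classifier, one of the methods named above. It is not taken from the surveyed work: the function name knn_predict, the choice of Euclidean distance, the value k=3 and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to `query`.

    `train` is a list of (feature_tuple, label) pairs. Euclidean distance is
    an assumption of this sketch; other distance measures could be used.
    """
    # Sort the training instances by distance to the query and keep the k nearest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of those neighbors and return the most frequent one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up two-dimensional instances belonging to two groups, "A" and "B".
train = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((4.0, 4.2), "B"),
    ((4.1, 3.9), "B"),
    ((3.8, 4.0), "B"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (4.0, 4.0)))
```

The classifier stores the labeled instances and defers all computation to prediction time, which is why reducing the stored set matters for larger data sets.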
