Paper information - Comparison of Classification Algorithms using WEKA on Various Datasets

Data mining is a step in the knowledge discovery process in which algorithms are used to find patterns or models in data. It can also be defined as an analytic process designed to explore large amounts of data in search of consistent patterns and systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. Classification is the most commonly applied data mining technique; it uses a set of pre-classified examples to develop a model that can classify the population of records at large. In classification, a model is built from training data and then applied to test data. WEKA is an open-source data mining tool that includes implementations of data mining algorithms. Using WEKA, we compared the ADTree, Bayes Network, Decision Table, J48, Logistic, Naive Bayes, NBTree, PART, RBFNetwork, and SMO algorithms on five datasets.
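The train-then-test workflow described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' WEKA setup: the toy rows, the fixed train/test split, and the function names (`train_zero_r`, `train_naive_bayes`, `accuracy`, `compare`) are all hypothetical, and a majority-class baseline stands in next to a simple categorical Naive Bayes.

```python
import math
from collections import Counter, defaultdict

# Toy rows for illustration only (not one of the paper's five datasets).
# Each row is (attribute values, class label);
# attributes are outlook, temperature, humidity, windy.
TRAIN = [
    (("sunny", "hot", "high", "false"), "no"),
    (("sunny", "hot", "high", "true"), "no"),
    (("overcast", "hot", "high", "false"), "yes"),
    (("rainy", "mild", "high", "false"), "yes"),
    (("rainy", "cool", "normal", "false"), "yes"),
    (("rainy", "cool", "normal", "true"), "no"),
    (("overcast", "cool", "normal", "true"), "yes"),
    (("sunny", "mild", "high", "false"), "no"),
    (("sunny", "cool", "normal", "false"), "yes"),
    (("rainy", "mild", "normal", "false"), "yes"),
]
TEST = [
    (("sunny", "mild", "normal", "true"), "yes"),
    (("overcast", "mild", "high", "true"), "yes"),
    (("overcast", "hot", "normal", "false"), "yes"),
    (("rainy", "mild", "high", "true"), "no"),
]


def train_zero_r(rows):
    """Baseline model: always predict the most frequent training label."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda features: majority


def train_naive_bayes(rows):
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    total = len(rows)

    def predict(features):
        best_label, best_score = None, -math.inf
        for label, n in class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / total)
            for i, value in enumerate(features):
                count = value_counts[(i, label)][value]
                score += math.log((count + 1) / (n + len(values_seen[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(features) == label for features, label in rows) / len(rows)


def compare(train_rows, test_rows):
    """Build each model on the training rows and score it on the test rows."""
    trainers = {"ZeroR": train_zero_r, "NaiveBayes": train_naive_bayes}
    return {name: accuracy(fit(train_rows), test_rows) for name, fit in trainers.items()}


if __name__ == "__main__":
    for name, score in compare(TRAIN, TEST).items():
        print(f"{name}: {score:.2f}")
```

A single fixed split is used here only to keep the sketch short; with so few rows the resulting accuracies say nothing about how the algorithms listed in the abstract actually rank.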

B. V. Pawar | Ajay S. Patil | Bharat Deshmukh
