A Generalization of the Naive Bayes Classifier for Use in Databases with Dependent Variables

Abstract. The Naive Bayes (NB) classifier assumes that the variables are independent given the class. Despite this assumption, it is widely used in Data Mining and Machine Learning. Its popularity comes mainly from its relative simplicity and from the robustness it has shown across a large number of problems. Because of the independence assumption, however, NB yields an unrepresentative model when the database contains dependent variables. Several approaches have been proposed to address this. They improve on NB's performance, but they require more resources and are complicated to implement. Here we propose a new approach that can be used when the variables are dependent while keeping the implementation simple. We also propose a metric for deciding a priori whether to use the simplest version of the NB classifier. Results on four UCI databases showed that the proposed model improves on NB's performance when the variables are dependent.
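The problem the abstract describes can be seen in a small example. Below is a minimal sketch of the *standard* categorical NB classifier, the baseline that the abstract says breaks down. It is not the generalization proposed in this paper, and it is not the proposed a priori metric. Several choices here are illustrative assumptions rather than details from the source: the class name `CategoricalNB`, the Laplace smoothing constant `alpha`, and the toy data. The demo duplicates an attribute, which produces a perfectly dependent pair. NB then multiplies the same likelihood twice, so its posterior changes even though the copy adds no information.

```python
import math
from collections import Counter


class CategoricalNB:
    """Baseline naive Bayes for discrete attributes (illustrative sketch).

    Assumes P(x_1..x_n | c) = prod_j P(x_j | c), i.e. attributes are
    independent given the class. Laplace smoothing (alpha) is an assumption.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.n = len(y)
        self.class_counts = Counter(y)
        n_attrs = len(X[0])
        # Distinct values observed for each attribute j.
        self.values = [set(row[j] for row in X) for j in range(n_attrs)]
        # counts[c][j][v]: training rows of class c whose attribute j equals v.
        self.counts = {c: [Counter() for _ in range(n_attrs)] for c in self.classes}
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
        return self

    def log_joint(self, row):
        """Return log P(c) + sum_j log P(x_j | c) for every class c."""
        scores = {}
        for c in self.classes:
            nc = self.class_counts[c]
            s = math.log(nc / self.n)
            for j, v in enumerate(row):
                k = len(self.values[j])
                s += math.log((self.counts[c][j][v] + self.alpha) / (nc + self.alpha * k))
            scores[c] = s
        return scores

    def predict_proba(self, row):
        scores = self.log_joint(row)
        m = max(scores.values())  # subtract the max for numerical stability
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, row):
        scores = self.log_joint(row)
        return max(scores, key=scores.get)


# Hypothetical toy data: class 1 mostly has "a", class 0 mostly has "b".
single = [("a",)] * 8 + [("b",)] * 2 + [("a",)] * 2 + [("b",)] * 8
labels = [1] * 10 + [0] * 10
# Perfectly dependent copy of the same attribute.
duplicated = [(r[0], r[0]) for r in single]

nb1 = CategoricalNB().fit(single, labels)
nb2 = CategoricalNB().fit(duplicated, labels)
p_single = nb1.predict_proba(("a",))[1]
p_dup = nb2.predict_proba(("a", "a"))[1]
print(round(p_single, 3), round(p_dup, 3))  # 0.75 0.9
```

With one attribute, P(class 1 | a) comes out as 0.75, which is close to the empirical 8/10. With the duplicated attribute it rises to 0.9. The extra confidence comes purely from counting the same evidence twice, which is the kind of unrepresentative model the abstract attributes to NB under variable dependence.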
