A Research Using Correlation Coefficient to Make Bayesian Classification Data Mining
Traditional Bayesian classification data mining methods can produce unreliable predictions when the selected predictors are weakly related or unrelated to the target factor. This paper uses the correlation coefficient to analyze the correlation between predictors and the target factor within a Bayesian classification model, and combines this with the Hadoop Distributed File System and parallel programming models to explore an improved algorithm. The experiments show that this method not only makes predictions more reliable but also saves resources and greatly improves the efficiency of the algorithm. It is also suitable for processing massive data.
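The idea of filtering predictors by correlation before Bayesian classification can be illustrated with a minimal single-machine sketch. It is not the paper's algorithm: the abstract does not say which correlation coefficient, threshold, or Bayesian variant is used, so Pearson's coefficient, the `threshold` value, the Laplace-smoothed categorical naive Bayes model, and all function names below are assumptions. The Hadoop and parallel parts are omitted.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (dev_x * dev_y)


def select_predictors(rows, labels, threshold=0.3):
    """Keep feature indices whose |r| with the target reaches the threshold.

    The threshold of 0.3 is a hypothetical choice, not taken from the paper.
    """
    return [
        j for j in range(len(rows[0]))
        if abs(pearson([row[j] for row in rows], labels)) >= threshold
    ]


def train_naive_bayes(rows, labels, features):
    """Count class and (class, feature, value) frequencies for the kept features."""
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)
    values = defaultdict(set)
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for j in features:
            value_counts[(label, j, row[j])] += 1
            values[j].add(row[j])
    return class_counts, value_counts, values


def predict(model, row, features):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)
        for j in features:
            count = value_counts.get((label, j, row[j]), 0)
            score += math.log((count + 1) / (n_label + len(values[j])))
        if score > best_score:
            best_class, best_score = label, score
    return best_class


# Toy data: feature 0 tracks the label, feature 1 is unrelated noise.
rows = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1)]
labels = [1, 1, 0, 0, 1, 1, 0, 0]

kept = select_predictors(rows, labels)
model = train_naive_bayes(rows, labels, kept)
print(kept, predict(model, (1, 1), kept))
```

In this toy data the noise feature is dropped before training, so the classifier stores and multiplies fewer conditional probabilities. That is one plausible reading of the resource savings the abstract reports. In a distributed setting, the sums inside `pearson` and the frequency counts in `train_naive_bayes` are additive and could be accumulated per data split and then merged.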