A Research Using Correlation Coefficient to Make Bayesian Classification Data Mining

In traditional Bayesian classification data mining methods, predictions may be unreliable because the selected predictors are only weakly related, or not related at all, to the target factor. Based on the Bayesian classification model, this paper uses the correlation coefficient to analyze the correlation between the predictors and the target factor, and combines this analysis with the Hadoop distributed file system and parallel programming models to explore an improved algorithm. The experiments show that this method not only makes predictions more reliable but also saves resources and greatly improves the efficiency of the algorithm. In addition, it is suitable for massive data processing.
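
As an illustration of the predictor-screening idea summarized above, the following minimal sketch keeps only those predictors whose correlation with the target factor reaches a cutoff before any Bayesian classifier is trained. It rests on assumptions not stated in the abstract: the Pearson coefficient is used as the correlation measure, the 0.3 threshold is arbitrary, and the names `pearson`, `select_predictors`, and the toy data are hypothetical. The Hadoop-based parallel step is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no linear information about the target.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_predictors(predictors, target, threshold=0.3):
    """Return names of predictors with |r| >= threshold (threshold is an assumed cutoff)."""
    return [
        name
        for name, values in predictors.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Hypothetical toy data: a binary target factor and three candidate predictors.
target = [0, 0, 1, 1, 0, 1]
predictors = {
    "income": [1, 2, 8, 9, 2, 7],    # tracks the target closely
    "noise": [5, 1, 5, 1, 3, 3],     # uncorrelated with the target
    "constant": [4, 4, 4, 4, 4, 4],  # no variance at all
}

selected = select_predictors(predictors, target)
print(selected)
```

Only the retained predictors would then be passed to the Bayesian classifier, which is where the savings in computation would be expected to come from.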