An Iterative Hadoop-Based Ensemble Data Classification Model on Distributed Medical Databases

As the size and complexity of online biomedical databases grow day by day, finding essential structures or unstructured patterns in distributed biomedical applications has become more difficult. Traditional Hadoop-based distributed decision tree models, such as the Probability-based Decision Tree (PDT), the Classification And Regression Tree (CART) and the Multiclass Classification Decision Tree, fail to discover relational, user-specific and feature-based patterns because of the large number of feature sets. These models depend on the selection of relevant attributes and on a uniform data distribution, and they suffer from three major issues: data imbalance, indexing and sparsity. In the proposed model, an enhanced attribute selection ranking model and a Hadoop-based decision tree model are implemented to extract user-specific interesting patterns from online biomedical databases. Experimental results show that the proposed model achieves a higher true positive rate, higher precision and a lower error rate than traditional distributed decision tree models.
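
For reference, the three evaluation measures named above can be computed from a binary confusion matrix as in the following minimal sketch. It uses the standard textbook definitions, which the abstract does not spell out; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the reported experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return true positive rate, precision and error rate.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives and true negatives of a binary classifier.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # True positive rate (recall): share of actual positives that were found.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of predicted positives that are actually positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Error rate: share of all instances that were misclassified.
    error_rate = (fp + fn) / total

    return {"tpr": tpr, "precision": precision, "error_rate": error_rate}


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=80, fp=10, fn=20, tn=90)
```

For a multiclass problem, the same quantities would typically be computed per class in a one-vs-rest fashion and then averaged; which averaging scheme the experiments use is not stated here.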