Research on the System of Data Mining Based on Hadoop

Hadoop is becoming a necessary part of large-scale data mining systems. This paper is therefore a practical study of data mining tasks on the Hadoop distributed system. The main task is to build a distributed cluster computing environment using Hadoop and to implement a data mining task in that environment. We select data clustering as a representative task and study the K-means clustering algorithm in depth.
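
To illustrate why K-means suits a map/reduce style of computation, the following is a minimal single-process sketch in Python. It is not the implementation described in this paper and does not use any Hadoop API; the function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans`) and the `max_iter` parameter are hypothetical and chosen only for illustration. The map step assigns each point to its nearest centroid, and the reduce step averages the points assigned to each centroid.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map step: emit (centroid index, point) for every input point."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reduce step: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    # Assumption: a centroid that receives no points keeps its old position.
    new_centroids = list(centroids)
    for idx, members in groups.items():
        dim = len(members[0])
        new_centroids[idx] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return new_centroids


def kmeans(points, centroids, max_iter=20):
    """Repeat map and reduce until the centroids stop changing or max_iter is reached."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

In a distributed setting, each iteration of this loop would typically correspond to one job: the map step runs in parallel over partitions of the data, and the current centroids are shared with every mapper. How the centroids are distributed and how convergence is tested are design choices that this sketch leaves open.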