The Research of Large Scale Data Processing Platform Based on the Spark

With the development of cloud computing and distributed cluster technologies, the concept of big data has expanded widely and deeply in both volume and value, and data mining, which plays an important role in exploring big data, has attracted unprecedented attention in recent years. Traditional data mining algorithms are incapable of dealing with massive datasets. MapReduce has been successfully applied to many big data problems; however, it lacks the ability to efficiently support parallelized, iterative learning. To address these problems, we present an integrated solution based on the Spark framework that not only processes massive data efficiently but also scales well, and that can satisfy the demands of many kinds of data mining tasks. Furthermore, we propose a framework applied to the traffic domain.