Data parallel processing system based on Cassandra

The invention discloses a data parallel processing system based on Cassandra. The system comprises a Hadoop main node, a plurality of Hadoop auxiliary nodes, and a Cassandra storage end deployed on the Hadoop auxiliary nodes. The main node comprises a user interface module, a Cassandra query module, a job scheduling module, a job queue module, and a job tracker. Each auxiliary node comprises a task tracker, an input module, an output module, and a MapReduce module. The user interface module receives a user request and determines whether it is a data query request, a data processing job submission request, or a job information query request. If the request is a data query request, the user interface module sends it to the Cassandra query module; if the request is a data processing job submission request or a job information query request, the user interface module sends it to the job scheduling module. The system offers high reliability, good scalability, and high throughput. It can answer simple data queries with a fast response while also providing complex processing capability for massive data.
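The routing rule applied by the user interface module can be illustrated with a minimal Python sketch. The abstract names the modules and the three request types but specifies no interface, so every class, method, and enum member below is a hypothetical illustration, and the two handler classes are stand-ins that only record which module received the request.

```python
from enum import Enum, auto


class RequestType(Enum):
    """The three request types named in the abstract (member names are hypothetical)."""
    DATA_QUERY = auto()
    SUBMIT_JOB = auto()
    JOB_INFO_QUERY = auto()


class CassandraQueryModule:
    """Stand-in for the module that answers simple data queries."""

    def handle(self, request_type, payload):
        return ("cassandra_query_module", request_type, payload)


class JobSchedulingModule:
    """Stand-in for the module that receives job submissions and job information queries."""

    def handle(self, request_type, payload):
        return ("job_scheduling_module", request_type, payload)


class UserInterfaceModule:
    """Classifies a user request and forwards it to the matching main-node module."""

    def __init__(self, query_module, scheduling_module):
        self.query_module = query_module
        self.scheduling_module = scheduling_module

    def dispatch(self, request_type, payload):
        # Data query requests go to the Cassandra query module.
        if request_type is RequestType.DATA_QUERY:
            return self.query_module.handle(request_type, payload)
        # Job submissions and job information queries go to the job scheduling module.
        if request_type in (RequestType.SUBMIT_JOB, RequestType.JOB_INFO_QUERY):
            return self.scheduling_module.handle(request_type, payload)
        # Anything else is outside the three types the abstract describes.
        raise ValueError(f"unsupported request type: {request_type!r}")


ui = UserInterfaceModule(CassandraQueryModule(), JobSchedulingModule())
routed_query = ui.dispatch(RequestType.DATA_QUERY, {"key": "user:42"})
routed_job = ui.dispatch(RequestType.SUBMIT_JOB, {"job": "word_count"})
```

In this sketch the simple query path bypasses the job scheduling module entirely, which mirrors the separation the abstract draws between fast simple queries and heavier data processing jobs.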
