Real-time data stream processing technologies play an important role in enabling time-critical decision making in many applications. This paper evaluates the performance of platforms capable of processing streaming data. Candidate technologies include Storm, Samza, and Spark Streaming. To form a recommendation, a prototype pipeline is designed and implemented on each platform using data collected from sensors that monitor heavy-haul railway systems. Each candidate platform is tested and evaluated using both quantitative and qualitative metrics, and the paper reports the findings, in which Storm is found to be the most appropriate candidate.