FastMR: fast processing for large distributed data streams

FastMR is a graph-style framework that lets stream-oriented applications process streaming data records in near real time and, more importantly, coordinate with one another in complex ways. We introduce two components, compressed buffer trees (CBTs) and shared reducer trees (SRTs), to support this. CBTs address the problem of keeping a large amount of application-specific "accumulator" state in memory so that stream processing can combine current data with historical data; they do so with a novel, batch-oriented approach to updating the accumulator state. SRTs are P2P-based reducer trees that allow fine-grained queries, both one-shot and continual, to be rolled up efficiently and concurrently. The intermediate results of the CBTs are aggregated to the root of an SRT via network aggregation. In the graph-style computation model, the roots of the SRTs are analogous to vertices, and anycast/multicast message transmission between those roots is analogous to edges.
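
The two ideas above, batch-oriented accumulator updates and tree-structured roll-up of partial results, can be illustrated with a minimal Python sketch. It is not FastMR's implementation: the names `BatchedAccumulator` and `roll_up` are hypothetical, the accumulator is assumed to be a simple per-key sum, the compression and buffer-tree layout of a real CBT are omitted, and the reducer tree is simulated in-process as a binary tree rather than over a P2P network.

```python
from collections import defaultdict


class BatchedAccumulator:
    """Hypothetical stand-in for a CBT: buffers updates and applies them in batches."""

    def __init__(self, batch_size=4):
        self.batch_size = batch_size
        self.buffer = []                 # pending (key, value) updates
        self.state = defaultdict(int)    # accumulator state (assumed: per-key sum)

    def insert(self, key, value):
        # Appending to the buffer is cheap; the state is only touched on flush.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Apply all buffered updates to the accumulator state in one pass.
        for key, value in self.buffer:
            self.state[key] += value
        self.buffer.clear()

    def snapshot(self):
        # Intermediate result that would be handed to a reducer tree.
        self.flush()
        return dict(self.state)


def roll_up(partials):
    """Merge partial results pairwise, level by level, until one root remains."""
    level = [dict(p) for p in partials]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            merged = dict(level[i])
            if i + 1 < len(level):
                for key, value in level[i + 1].items():
                    merged[key] = merged.get(key, 0) + value
            next_level.append(merged)
        level = next_level
    return level[0] if level else {}


if __name__ == "__main__":
    workers = [BatchedAccumulator(batch_size=2) for _ in range(3)]
    for i, word in enumerate(["a", "b", "a", "c", "b", "a"]):
        workers[i % 3].insert(word, 1)
    print(roll_up(w.snapshot() for w in workers))
```

In this sketch each worker's snapshot plays the role of a CBT's intermediate result, and the single dictionary returned by `roll_up` plays the role of the value held at an SRT root.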