Stateful Scalable Stream Processing at LinkedIn

Distributed stream processing systems need to support stateful processing, recover quickly from failures to resume such processing, and reprocess an entire data stream quickly. We present Apache Samza, a distributed system for stateful and fault-tolerant stream processing. Samza utilizes a partitioned local state along with a low-overhead background changelog mechanism, allowing it to scale to massive state sizes (hundreds of TB) per application. Recovery from failures is sped up by re-scheduling based on Host Affinity. In addition to processing infinite streams of events, Samza supports processing a finite dataset as a stream, from either a streaming source (e.g., Kafka), a database snapshot (e.g., Databus), or a file system (e.g. HDFS), without having to change the application code (unlike the popular Lambda-based architectures which necessitate maintenance of separate code bases for batch and stream path processing). Samza is currently in use at LinkedIn by hundreds of production applications with more than 10, 000 containers. Samza is an open-source Apache project adopted by many top-tier companies (e.g., LinkedIn, Uber, Netflix, TripAdvisor, etc.). Our experiments show that Samza: a) handles state efficiently, improving latency and throughput by more than 100X compared to using a remote storage; b) provides recovery time independent of state size; c) scales performance linearly with number of containers; and d) supports reprocessing of the data stream quickly and with minimal interference on real-time traffic.

[1]  Craig Chambers,et al.  The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing , 2015, Proc. VLDB Endow..

[2]  Raul Castro Fernandez,et al.  Integrating scale out and fault tolerance in stream processing using operator state management , 2013, SIGMOD '13.

[3]  Jun Rao,et al.  Liquid: Unifying Nearline and Offline Big Data Integration , 2015, CIDR.

[4]  Indranil Gupta,et al.  Ambry: LinkedIn's Scalable Geo-Distributed Object Store , 2016, SIGMOD Conference.

[5]  Prashant Malik,et al.  Cassandra: a decentralized structured storage system , 2010, OPSR.

[6]  Daniel Mills,et al.  MillWheel: Fault-Tolerant Stream Processing at Internet Scale , 2013, Proc. VLDB Endow..

[7]  Ying Xing,et al.  The Design of the Borealis Stream Processing Engine , 2005, CIDR.

[8]  Michael Stonebraker,et al.  The 8 requirements of real-time stream processing , 2005, SGMD.

[9]  Luke M. Leslie,et al.  Zorro: zero-cost reactive failure recovery in distributed graph processing , 2015, SoCC.

[10]  Lei Gao,et al.  Data Infrastructure at LinkedIn , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[11]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[12]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[13]  Yoonho Park,et al.  SPC: a distributed, scalable platform for data mining , 2006, DMSSP '06.

[14]  Raul Castro Fernandez,et al.  Making State Explicit for Imperative Big Data Processing , 2014, USENIX Annual Technical Conference.

[15]  Leslie Lamport,et al.  Distributed snapshots: determining global states of distributed systems , 1985, TOCS.

[16]  Elke A. Rundensteiner,et al.  Run-time operator state spilling for memory intensive long-running queries , 2006, SIGMOD Conference.

[17]  Ravi Kumar,et al.  Pig latin: a not-so-foreign language for data processing , 2008, SIGMOD Conference.

[18]  Kostas Magoutis,et al.  CEC: Continuous eventual checkpointing for data stream processing operators , 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).

[19]  Joseph M. Hellerstein,et al.  MapReduce Online , 2010, NSDI.

[20]  Tilmann Rabl,et al.  Solving Big Data Challenges for Enterprise Application Performance Management , 2012, Proc. VLDB Endow..

[21]  David Zhang,et al.  On brewing fresh espresso: LinkedIn's distributed data serving platform , 2013, SIGMOD '13.

[22]  Nathan Marz,et al.  Big Data: Principles and best practices of scalable realtime data systems , 2015 .

[23]  Claudio Soriente,et al.  StreamCloud: An Elastic and Scalable Data Streaming System , 2012, IEEE Transactions on Parallel and Distributed Systems.

[24]  Pieter Hintjens,et al.  ZeroMQ: Messaging for Many Applications , 2013 .

[25]  Kirsten Hildrum,et al.  Distributed middleware reliability and fault tolerance support in system S , 2011, DEBS '11.

[26]  Michael Stonebraker,et al.  High-availability algorithms for distributed stream processing , 2005, 21st International Conference on Data Engineering (ICDE'05).

[27]  Carlo Curino,et al.  Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.

[28]  Jay Kreps,et al.  Kafka : a Distributed Messaging System for Log Processing , 2011 .

[29]  Wei Lin,et al.  StreamScope: Continuous Reliable Distributed Processing of Big Data Streams , 2016, NSDI.

[30]  Scott Shenker,et al.  Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters , 2012, HotCloud.

[31]  Pete Wyckoff,et al.  Hive - A Warehousing Solution Over a Map-Reduce Framework , 2009, Proc. VLDB Endow..

[32]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[33]  Michael Stonebraker,et al.  S-Store: Streaming Meets Transaction Processing , 2015, Proc. VLDB Endow..

[34]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[35]  Jignesh M. Patel,et al.  Twitter Heron: Stream Processing at Scale , 2015, SIGMOD Conference.

[36]  Leonardo Neumeyer,et al.  S4: Distributed Stream Computing Platform , 2010, 2010 IEEE International Conference on Data Mining Workshops.

[37]  Jignesh M. Patel,et al.  Storm@twitter , 2014, SIGMOD Conference.

[38]  Fan Ye,et al.  An empirical study of high availability in stream processing systems , 2009, Middleware.

[39]  Indranil Gupta,et al.  Stela: Enabling Stream Processing Systems to Scale-in and Scale-out On-demand , 2016, 2016 IEEE International Conference on Cloud Engineering (IC2E).

[40]  Kun-Lung Wu,et al.  Consistent Regions: Guaranteed Tuple Processing in IBM Streams , 2016, Proc. VLDB Endow..