Typhoon: An SDN Enhanced Real-Time Big Data Streaming Framework

Stream processing pipelines operated by current big data streaming frameworks present two problems. First, the pipelines are not flexible, controllable, or programmable enough to accommodate dynamic streaming application needs. Second, application-level data routing over the pipelines does not perform optimally for increasingly common one-to-many communication. To address these problems, we propose Typhoon, an SDN-based real-time big data streaming framework that tightly integrates SDN functionality into a real-time stream framework. By partially offloading application-layer data routing and control to the network layer via SDN interfaces and protocols, Typhoon provides on-the-fly programmability of both the application and network layers and achieves high-performance data routing. In addition, the Typhoon SDN controller exposes cross-layer information, from both the application and the network, to SDN control plane applications to extend the framework's functionality. We introduce several SDN control plane applications to illustrate these benefits.
