DNA: An SDN framework for distributed network analytics

Analytics of network telemetry data helps address many important operational problems. Traditional Big Data approaches run into limitations even as they push the scale boundaries of data processing further. One reason is that, in many cases, the bottleneck for analytics is not the analytics processing itself but the generation and export of the data on which analytics depends. The amount of data that can reasonably be collected from the network is inherently limited by bandwidth and processing constraints in the network itself. In addition, the management tasks involved in determining which data to generate and configuring its generation lead to significant deployment challenges. To address these issues, we propose a novel distributed solution to network analytics. Analytics processing is performed at the source of the data by specialized agents embedded within network devices, which also dynamically set up and reconfigure telemetry data sources as required by an analytics task. An SDN controller application orchestrates network analytics tasks across the network, allowing users to interact with the network as a whole rather than with individual devices one at a time. The solution has been implemented as a proof of concept called DNA (Distributed Network Analytics).
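The division of work described above, with analytics performed at the data source and a controller orchestrating one task across many devices, can be illustrated with a minimal Python sketch. All names here (FlowRecord, AnalyticsTask, DeviceAgent, Controller) are hypothetical and do not reflect DNA's actual interfaces. Telemetry is simulated as in-memory flow records, and the controller calls agents directly instead of deploying tasks over a management protocol. The point is only that each agent exports a compact partial aggregate rather than its raw records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical telemetry record observed locally on a device."""
    src_prefix: str
    dst_port: int
    num_bytes: int


@dataclass(frozen=True)
class AnalyticsTask:
    """Hypothetical task: a filter plus a grouping key; the aggregate is a byte sum."""
    predicate: Callable[[FlowRecord], bool]
    group_key: Callable[[FlowRecord], str]


class DeviceAgent:
    """Hypothetical analytics agent embedded in a network device."""

    def __init__(self, name: str, records: List[FlowRecord]):
        self.name = name
        self._records = records  # stands in for locally generated telemetry

    def run(self, task: AnalyticsTask) -> Counter:
        # Process data at the source: filter and aggregate locally so that
        # only a small summary leaves the device, not every raw record.
        partial = Counter()
        for rec in self._records:
            if task.predicate(rec):
                partial[task.group_key(rec)] += rec.num_bytes
        return partial


class Controller:
    """Hypothetical SDN controller application running one task network-wide."""

    def __init__(self, agents: List[DeviceAgent]):
        self.agents = agents

    def run_network_wide(self, task: AnalyticsTask) -> Dict[str, int]:
        total = Counter()
        for agent in self.agents:
            # Counter.update adds the per-key partial sums from each device.
            total.update(agent.run(task))
        return dict(total)


if __name__ == "__main__":
    agents = [
        DeviceAgent("r1", [FlowRecord("10.0.0.0/24", 443, 1200),
                           FlowRecord("10.0.1.0/24", 22, 300)]),
        DeviceAgent("r2", [FlowRecord("10.0.0.0/24", 443, 800)]),
    ]
    https_by_prefix = AnalyticsTask(predicate=lambda r: r.dst_port == 443,
                                    group_key=lambda r: r.src_prefix)
    print(Controller(agents).run_network_wide(https_by_prefix))
    # prints {'10.0.0.0/24': 2000}
```

The user addresses the network as a whole through a single `run_network_wide` call, while per-device work stays on the devices; a real system would additionally have agents configure their telemetry sources on demand, which this sketch does not model.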
