A Probabilistic Counting Framework for Distributed Measurements

The technological maturity attained by general-purpose processors and network interface cards makes today’s commodity PCs viable, high-performance alternatives to specialized hardware for deploying network devices such as switches, routers, and generic middleboxes. In addition, the flexibility of a software solution is well aligned with the emerging trend towards data-plane programming abstractions brought by recent proposals such as OpenFlow and the P4 language. However, while programming abstractions define how elementary instructions (primitives) are combined, the development of the processing primitives themselves is left to the network programmer. Although such functions are strongly domain specific, we can safely assume that a counting primitive is required in a great many practical contexts. This paper presents a counting framework based on probabilistic sketches and LogLog counters for estimating the cardinality of large multisets of data. The proposed data structure is designed to be fast and compact, ready for use in the online processing chain of network devices running at multi-gigabit speeds. The complete implementation is provided in the probabilistic data structures (pds) library, which has been designed, developed, experimentally assessed, and released as open source for free download. Although the paper specifically presents two possible use cases, the pds library can be used in rather general scenarios, even outside the networking domain.
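
As a rough illustration of the LogLog counting idea mentioned above, the following Python sketch estimates the number of distinct items in a stream and merges counters built at different measurement points. It is not the pds library's API: the class name `LogLogCounter`, its methods, the choice of BLAKE2b as hash function, and the use of the asymptotic correction constant 0.39701 are all assumptions made for this example.

```python
import hashlib

# Asymptotic bias-correction constant of the LogLog estimator
# (assumption: the small-m correction is ignored in this sketch).
ALPHA_INF = 0.39701


class LogLogCounter:
    """Hypothetical LogLog cardinality estimator with m = 2**k registers."""

    def __init__(self, k=10):
        self.k = k
        self.m = 1 << k
        self.buckets = [0] * self.m

    @staticmethod
    def _hash64(item: bytes) -> int:
        # 64-bit hash of the item; any well-mixing hash would do here.
        return int.from_bytes(hashlib.blake2b(item, digest_size=8).digest(), "big")

    def add(self, item: bytes) -> None:
        h = self._hash64(item)
        width = 64 - self.k
        idx = h >> width                   # first k bits select the register
        rest = h & ((1 << width) - 1)      # remaining bits feed the rank
        # Rank = position of the leftmost 1-bit in `rest` (1-based);
        # an all-zero remainder gets rank width + 1.
        rank = width - rest.bit_length() + 1
        if rank > self.buckets[idx]:
            self.buckets[idx] = rank

    def merge(self, other: "LogLogCounter") -> None:
        # Counters with the same k combine by an element-wise maximum,
        # which is what makes them suitable for distributed measurements.
        if other.k != self.k:
            raise ValueError("cannot merge counters of different size")
        self.buckets = [max(a, b) for a, b in zip(self.buckets, other.buckets)]

    def estimate(self) -> float:
        mean_rank = sum(self.buckets) / self.m
        return ALPHA_INF * self.m * 2.0 ** mean_rank


if __name__ == "__main__":
    counter = LogLogCounter(k=10)
    for i in range(100_000):
        counter.add(f"flow-{i}".encode())
    print(f"estimated distinct items: {counter.estimate():.0f}")
```

With m = 1024 registers the standard error of the LogLog estimator is roughly 1.30/sqrt(m), i.e. about 4%, while each register only needs a few bits. Repeated items leave the registers unchanged, so the structure counts distinct elements of a multiset rather than total occurrences.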
