Clustreams: Data Plane Clustering

Clustering is a basic machine learning task. In this task, a stream of input items needs to be grouped into clusters, such that all items assigned to the same cluster are closer to each other than to items assigned to other clusters. Each cluster is centered around a centroid point, which may either be given as a parameter or must be learned during the process, as in unsupervised online learning. This work studies the ability to perform clustering, e.g., for classifying network traffic, in programmable switches. Performing such classification in the switches through which the traffic flows is potentially the most efficient approach. To that end, we develop Clustreams, a novel in-network clustering system designed to handle clustering in the data path. At the core of Clustreams is a novel clustering algorithm that relies heavily on TCAM (Ternary Content Addressable Memory) match-action capabilities. This algorithm is realized for the Nvidia Spectrum-3 switch, and is limited to classification when the centroid points are known a priori. The work includes accuracy measurements for the algorithms, as well as run-time performance measurements and analysis of the clustering algorithm on a Spectrum-3 switch. As shown in the measurements, Clustreams obtains very high accuracy without any noticeable run-time impact on the switch's performance.
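To make the idea of classifying against known centroids with ternary match rules concrete, the following is a minimal sketch and not the Clustreams algorithm itself, whose details are not given here. It assumes two integer features of 4 bits each and uses a quadtree-style subdivision to turn a set of fixed centroids into disjoint ternary rules (`*` is a wildcard bit). All names (`BITS`, `CENTROIDS`, `build_rules`, `classify`) are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

BITS = 4  # bits per feature (an assumption made for this illustration)
Point = Tuple[int, int]
Rule = Tuple[str, str, int]  # (x pattern, y pattern, cluster id)

CENTROIDS: List[Point] = [(3, 3), (12, 4), (7, 13)]  # example centroids, known a priori


def nearest(p: Point, centroids: List[Point]) -> int:
    """Index of the closest centroid (squared Euclidean; ties go to the lowest index)."""
    return min(
        range(len(centroids)),
        key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
    )


def pattern(value: int, depth: int) -> str:
    """Keep the top `depth` bits of `value` and wildcard the remaining bits."""
    bits = format(value, f"0{BITS}b")
    return bits[:depth] + "*" * (BITS - depth)


def build_rules(centroids: List[Point], x0: int = 0, y0: int = 0, depth: int = 0) -> List[Rule]:
    """Split the feature space into aligned squares that each map to one centroid.

    Nearest-centroid regions are convex, so a square whose four corners share
    a label lies entirely inside that region and can become a single rule.
    """
    size = 1 << (BITS - depth)
    far = size - 1
    corners = [(x0, y0), (x0 + far, y0), (x0, y0 + far), (x0 + far, y0 + far)]
    labels = {nearest(c, centroids) for c in corners}
    if len(labels) == 1:
        return [(pattern(x0, depth), pattern(y0, depth), labels.pop())]
    half = size // 2
    rules: List[Rule] = []
    for dx in (0, half):
        for dy in (0, half):
            rules += build_rules(centroids, x0 + dx, y0 + dy, depth + 1)
    return rules


def matches(pat: str, value: int) -> bool:
    """Ternary match: every pattern bit is a wildcard or equals the key bit."""
    bits = format(value, f"0{BITS}b")
    return all(p in ("*", b) for p, b in zip(pat, bits))


def classify(rules: List[Rule], p: Point) -> int:
    """Software stand-in for a TCAM lookup; rules are disjoint, so order is irrelevant."""
    for x_pat, y_pat, label in rules:
        if matches(x_pat, p[0]) and matches(y_pat, p[1]):
            return label
    raise LookupError("no rule matched")


rules = build_rules(CENTROIDS)
print(len(rules), "rules cover", (1 << BITS) ** 2, "possible inputs")
```

In this sketch all distance computation happens offline when the rules are built; a lookup needs only ternary matching, which is the kind of operation a match-action table provides. The rule count depends on the centroid layout and the bit width, so a real deployment would have to weigh table size against feature resolution.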
