On Runtime and Classification Performance of the Discretize-Optimize (DISCO) Classification Approach

Using machine learning in high-speed networks for tasks such as flow classification typically requires highly resource-efficient classification approaches, large amounts of computational resources, or specialized hardware. Here we sketch the discretize-optimize (DISCO) approach, which can construct an extremely efficient classifier for low-dimensional problems by combining feature selection, efficient discretization, novel bin placement, and lookup. Because the choice of features and discretization parameters is crucial, appropriate combinatorial optimization is an important aspect of the approach. We evaluate the approach on a YouTube classification task using a cellular traffic data set. The initial results show that DISCO can move the Pareto boundary in the trade-off between classification performance and runtime by up to an order of magnitude compared to runtime-optimized random forest and decision tree classifiers.
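
The discretize-and-lookup idea can be illustrated with a minimal sketch. It is not the DISCO implementation: it assumes plain equal-width bins as a stand-in for the bin placement the abstract calls novel, and it omits feature selection and the combinatorial optimization of parameters entirely. All function names (`equal_width_edges`, `cell_index`, `fit_lookup_table`, `predict`) are hypothetical and chosen for illustration only.

```python
from bisect import bisect_right
from collections import Counter


def equal_width_edges(values, n_bins):
    """Interior edges that split [min, max] into n_bins equal-width bins.

    Assumption: equal-width binning is only a placeholder here.
    """
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


def cell_index(row, edges_per_feature):
    """Map a feature vector to one flat index into the lookup table."""
    index = 0
    for value, edges in zip(row, edges_per_feature):
        # bisect_right returns the bin number of value for this feature.
        index = index * (len(edges) + 1) + bisect_right(edges, value)
    return index


def fit_lookup_table(rows, labels, n_bins):
    """Discretize each feature and store the majority label per cell."""
    columns = list(zip(*rows))
    edges_per_feature = [equal_width_edges(col, n_bins) for col in columns]

    votes = {}
    for row, label in zip(rows, labels):
        idx = cell_index(row, edges_per_feature)
        votes.setdefault(idx, Counter())[label] += 1

    # Cells that saw no training data fall back to the overall majority label.
    default = Counter(labels).most_common(1)[0][0]
    table = [default] * (n_bins ** len(columns))
    for idx, counter in votes.items():
        table[idx] = counter.most_common(1)[0][0]
    return edges_per_feature, table


def predict(row, edges_per_feature, table):
    """Classify with one binary search per feature and one table read."""
    return table[cell_index(row, edges_per_feature)]


# Toy data: two made-up features per flow, two made-up labels.
rows = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
labels = ["other", "other", "video", "video"]
edges, table = fit_lookup_table(rows, labels, n_bins=2)
print(predict((8.0, 7.0), edges, table))
```

The table grows as `n_bins ** n_features`, which is one reason a lookup scheme of this kind only suits low-dimensional problems and why the selection of features and bin counts matters.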
