Reliable Machine Learning for Networking: Key Issues and Approaches

Machine learning has become one of the go-to methods for solving problems in networking. This development is driven by the availability of data in large-scale networks and the commodification of machine learning frameworks. While this makes it easier for researchers to implement and deploy machine learning solutions on networks quickly, a number of vital factors must be accounted for when applying machine learning to a networking problem and when translating test performance into successful deployments on real networks. This paper, rather than presenting a particular technical result, discusses the considerations necessary to obtain good results when using machine learning to analyze network-related data.
