DeviceMien: network device behavior modeling for identifying unknown IoT devices

With the explosion of IoT device use, networks are becoming more vulnerable to attack. Network administrators need better tools to verify and discover these devices in order to minimize attack risk. Existing tools provide rule-based assessment capabilities that cannot keep pace with the proliferation of devices. Current techniques demonstrate that given a rich set of labeled packet traces, one could design a pipeline that identifies all the devices in that trace with over 99% accuracy [30, 32]. However, it has also been observed [25], that such techniques are brittle when no labels are available. More perniciously, they provide false confidence scores about the label they do ascribe to a sample. This paper introduces a probabilistic framework for providing meaningful feedback in device identification, particularly when the device has not been previously observed. In our work, we use stacked autoencoders for automatically learning features from device traffic, learn the classes of traffic observed, and probabilistically model each device as a distribution of traffic classes. Our experiments show that we are able to identify previously seen devices after only 18.9 TCP-flow samples with 100% accuracy for devices where at least 50 samples are observed. We also show that we can distinguish between two broad classes of devices - IoT and Non-IoT - by examining the average number of flow classes observed over a set of samples. Our experiments show that we can infer the correct class of unseen devices with an over 82% average F1 score and 70% accuracy.

[1]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[2]  Vijay Sivaraman,et al.  Characterizing and classifying IoT traffic in smart cities and campuses , 2017, 2017 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS).

[3]  F. Massey The Kolmogorov-Smirnov Test for Goodness of Fit , 1951 .

[4]  Andrew W. Moore,et al.  Internet traffic classification using bayesian analysis techniques , 2005, SIGMETRICS '05.

[5]  Jürgen Schmidhuber,et al.  Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[6]  L. Bottou Stochastic Gradient Learning in Neural Networks , 1991 .

[7]  Jaime Lloret,et al.  Network Traffic Classifier With Convolutional and Recurrent Neural Networks for Internet of Things , 2017, IEEE Access.

[8]  Srinivasan Seshan,et al.  Handling a trillion (unfixable) flaws on a billion devices: Rethinking network security for the Internet-of-Things , 2015, HotNets.

[9]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[10]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[11]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[12]  Vijay Sivaraman,et al.  Classifying IoT Devices in Smart Environments Using Network Traffic Characteristics , 2019, IEEE Transactions on Mobile Computing.

[13]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[14]  Jasper Snoek,et al.  Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.

[15]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[16]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[17]  Jürgen Schmidhuber,et al.  Learning Precise Timing with LSTM Recurrent Networks , 2003, J. Mach. Learn. Res..

[18]  Nick Feamster,et al.  A Smart Home is No Castle: Privacy Vulnerabilities of Encrypted IoT Traffic , 2017, ArXiv.

[19]  Vijay Sivaraman,et al.  Low-cost flow-based security solutions for smart-home IoT devices , 2016, International Workshop on Ant Colony Optimization and Swarm Intelligence.

[20]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[21]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[22]  Sasu Tarkoma,et al.  IoT Sentinel Demo: Automated Device-Type Identification for Security Enforcement in IoT , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).

[23]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[24]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.