Amazon Echo Security: Machine Learning to Classify Encrypted Traffic

As smart speakers such as the Amazon Echo become more popular, they have raised widespread concerns about user privacy. This work investigates machine learning techniques for extracting ostensibly private information from the TCP traffic between an Echo device and Amazon's servers, even though all such traffic is encrypted. Specifically, we study a supervised classification problem using six machine learning algorithms and three feature vectors. Our "request type classification" problem seeks to determine which type of user request the Echo is answering, without decrypting the traffic. With six classes, random forests achieve 97% accuracy on this task.
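To illustrate why encrypted traffic can still be classified, the following is a minimal sketch of turning one exchange into a feature vector using only metadata an observer can see (packet sizes, directions, and timing). The packet record layout, the function name `flow_features`, and the particular features chosen are assumptions made for illustration; the abstract does not specify which three feature vectors the work actually uses.

```python
from statistics import mean, pstdev

# Hypothetical packet record: (timestamp_seconds, payload_bytes, direction).
# direction is +1 for Echo -> server and -1 for server -> Echo.
# Payload contents are never inspected, since they are encrypted.


def flow_features(packets):
    """Summarize one exchange using only metadata visible on the wire."""
    if not packets:
        raise ValueError("need at least one packet")
    times = [t for t, _, _ in packets]
    sizes = [s for _, s, _ in packets]
    up = [s for _, s, d in packets if d > 0]
    down = [s for _, s, d in packets if d < 0]
    # Inter-arrival gaps between consecutive packets.
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "n_packets": len(packets),
        "duration": max(times) - min(times),
        "bytes_up": sum(up),
        "bytes_down": sum(down),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
    }


# Made-up example exchange, not measured data.
example = [(0.00, 120, +1), (0.05, 1400, -1), (0.10, 1400, -1), (0.30, 80, +1)]
features = flow_features(example)
```

Feature vectors of this kind, each labeled with a request type, could then be used to train a supervised classifier such as a random forest (for example, scikit-learn's `RandomForestClassifier`).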
