Demographics inference through Wi-Fi network traffic analysis

Although privacy leakage through content analysis of Wi-Fi traffic has received increased attention, privacy inference through meta-data (e.g., IP address, Host) analysis of Wi-Fi traffic represents a potentially more serious threat to user privacy. First, it is a more efficient and scalable way to infer users' sensitive information, because it does not require inspecting the content of Wi-Fi traffic. Second, meta-data based demographics inference works on both unencrypted and encrypted traffic (e.g., HTTPS traffic). In this study, we present a novel approach to infer user demographic information by exploiting the meta-data of Wi-Fi traffic. We develop a proof-of-concept prototype, the Demographic Information Predictor (DIP) system, and evaluate its performance on a real-world dataset that covers the Wi-Fi access of 28,158 users over 5 months. DIP extracts four kinds of features from real-world Wi-Fi traffic and applies a novel machine learning based inference technique to predict user demographics. Our analytical results show that, for unencrypted traffic, DIP can predict the gender and education level of users with an accuracy of 78% and 74%, respectively. Surprisingly, even for HTTPS traffic, user demographics can still be predicted at a precision of 67% and 72%, respectively, which demonstrates the practicality of the proposed privacy inference scheme.
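The general idea of inferring a user attribute from traffic meta-data alone can be illustrated with a minimal sketch. This is not the DIP system: the abstract does not specify its four feature types or its inference technique, so the sketch assumes a single hypothetical feature (per-user relative visit frequency of each host) and swaps in a simple nearest-centroid classifier with cosine similarity. The host names, user IDs, and group labels are made up for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def host_features(records):
    """Turn (user, host) meta-data records into per-user host frequency vectors."""
    counts = defaultdict(Counter)
    for user, host in records:
        counts[user][host] += 1
    features = {}
    for user, host_counts in counts.items():
        total = sum(host_counts.values())
        # Normalize so that heavy and light users are comparable.
        features[user] = {h: n / total for h, n in host_counts.items()}
    return features


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(h, 0.0) for h, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(features, labels):
    """Average the feature vectors of the labeled users in each group."""
    sums = defaultdict(Counter)
    group_sizes = Counter()
    for user, label in labels.items():
        group_sizes[label] += 1
        for host, value in features[user].items():
            sums[label][host] += value
    return {
        label: {h: v / group_sizes[label] for h, v in vec.items()}
        for label, vec in sums.items()
    }


def predict(centroids, vec):
    """Assign the group whose centroid is most similar to the user's vector."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Hypothetical meta-data: only (user, host) pairs, no payload content.
records = [
    ("u1", "a.example"), ("u1", "a.example"), ("u1", "b.example"),
    ("u2", "c.example"), ("u2", "c.example"), ("u2", "b.example"),
    ("u3", "a.example"), ("u3", "b.example"),
]
labels = {"u1": "group_x", "u2": "group_y"}  # hypothetical known attributes

features = host_features(records)
centroids = train_centroids(features, labels)
prediction = predict(centroids, features["u3"])
print(prediction)
```

In this toy data the unlabeled user `u3` shares most of its hosts with `u1`, so it is assigned to `group_x`. Because the input consists only of which hosts a user contacted, the same kind of inference remains possible when the payload is encrypted, which is the concern the abstract raises.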
