Context-aware Website Fingerprinting over Encrypted Proxies

Website fingerprinting (WFP) can infer which websites a user is accessing via an encrypted proxy by passively inspecting the traffic between the user and the proxy. The key to WFP is designing a classifier capable of distinguishing the traffic characteristics of visits to different websites. However, when deployed in real-life networks, a well-trained classifier may face a significant obstacle, training-testing asymmetry, which fundamentally limits its practicality. Specifically, although pure traffic samples can be collected in a controlled (clean) testbed for training, at test time the classifier may be unable to extract such pure samples from raw, complicated traffic to use as its input. In this paper, we focus on encrypted proxies that relay connections between the user and the proxy individually (e.g., Shadowsocks), and design a context-aware system that uses built-in spatial-temporal flow correlation to address this obstacle. Extensive experiments demonstrate that our system not only makes WFP against a popular type of encrypted proxy practical, but also achieves better performance than the ideal setting of training and testing on pure samples.