Trace Oddity

Methodologies for Data-Driven Traffic Analysis on Tor

by Vera Rimmer, Theodor Schnitzler, Tom Van Goethem, Abel Rodríguez Romero, Wouter Joosen, and Katharina Kohls

On this page, you will find information about our study on end-to-end traffic correlation attacks against Tor, published at the 22nd Privacy Enhancing Technologies Symposium. In our research, we design and evaluate a multi-proxy data collection setup that simulates the attack more realistically. Here you can find our paper, the code, and the dataset.

Paper Abstract

Traffic analysis attacks against encrypted web traffic are a persisting problem. However, there is a large gap between the scientific estimate of attack threats and the real-world situation. As traffic analysis attacks depend on very specific metadata information, they are sensitive to artificial changes in the transmission characteristics. While the advent of deep learning greatly improves the performance rates of traffic analysis attacks on Tor in research settings, deep neural networks are known for being implicitly vulnerable to artifacts in data. Removing artifacts from our experimental setups is essential to minimizing the risk of evaluation bias.

In this work, we study a state-of-the-art end-to-end traffic correlation attack on Tor and propose a novel data collection setup. Our design addresses the key constraint of prior work: instead of using a single proxy node for collecting exit traffic, we deploy multiple proxies. Our extensive analysis shows that in the multi-proxy design (i) end-to-end round-trip times are more realistic than in the original design; and that (ii) traffic correlation attack performance degrades significantly on realistic timings. For a reliable and informative evaluation, we develop a general scientific methodology for replication and comparison of machine and deep-learning attacks on Tor. Our evaluation indicates high relevance of the multi-proxy data collection setup and the novel dataset.

Implementation

To access the source code and the data exactly as used in our research paper, visit our repository with the paper artifacts:

GitHub

Dataset

Our dataset is available in two formats: original PCAPs and parsed PCAPs.

  • To obtain the parsed PCAPs directly (preprocessed traces in CSV format), fill in this form.
  • If you are interested in the original PCAPs (the entire MongoDB database), reach out to us by email.
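
As a rough illustration of how preprocessed traces of this kind might be compared, the sketch below parses two CSV traces, bins their packet sizes into fixed time windows, and computes a Pearson correlation between the resulting series. Everything in it is an assumption made for illustration: the column names `timestamp` and `size`, the helper names `load_trace`, `bin_bytes` and `pearson`, and the sample values are hypothetical and do not describe the actual layout of our CSV files. The Pearson correlation is a toy stand-in and not the deep-learning attack evaluated in the paper.

```python
import csv
import io
import math


def load_trace(csv_text):
    """Parse a trace with HYPOTHETICAL columns: timestamp (seconds), size (bytes)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row["timestamp"]), int(row["size"])) for row in reader]


def bin_bytes(trace, window, n_bins):
    """Sum packet sizes per fixed-length time window, starting at t = 0."""
    bins = [0] * n_bins
    for ts, size in trace:
        idx = int(ts // window)
        if 0 <= idx < n_bins:
            bins[idx] += size
    return bins


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Made-up sample data: the exit flow is the entry flow delayed by 0.1 s,
# the unrelated flow has a different timing pattern.
ENTRY_CSV = "timestamp,size\n0.05,500\n0.30,1500\n0.35,1500\n1.10,600\n1.60,1500\n"
EXIT_CSV = "timestamp,size\n0.15,500\n0.40,1500\n0.45,1500\n1.20,600\n1.70,1500\n"
OTHER_CSV = "timestamp,size\n0.60,1500\n0.70,1500\n0.90,1000\n1.80,200\n"

WINDOW, N_BINS = 0.5, 4
entry = bin_bytes(load_trace(ENTRY_CSV), WINDOW, N_BINS)
exit_ = bin_bytes(load_trace(EXIT_CSV), WINDOW, N_BINS)
other = bin_bytes(load_trace(OTHER_CSV), WINDOW, N_BINS)

score_match = pearson(entry, exit_)
score_other = pearson(entry, other)
print(score_match, score_other)
```

In this toy setting the matching entry/exit pair scores higher than the unrelated pair. Because such a score depends directly on packet timing, it also hints at why the realism of end-to-end timings in the collected data matters for evaluating correlation attacks.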

Contact Us

Feel free to reach out to us about our research, the multi-proxy setup, or the dataset.
