Towards systematic traffic annotation

Keeping Internet network resources available and secure remains an unmet challenge. Hence, traffic classification and anomaly detection have received much attention in the last few years, and several algorithms have been proposed for backbone traffic. However, the evaluation of these methods usually lacks rigor, leading to hasty conclusions. Since synthetic data is often criticized and no common labeled database (like the data sets from the DARPA Intrusion Detection Evaluation Program [6]) is available for backbone traffic, researchers analyze real data and validate their methods by manually inspecting their results or by comparing them with the results of other methods. Our final goal is to label the MAWI database [2], a publicly available archive of real backbone traffic traces. Since manual labeling of backbone traffic is impractical, we build this database by cross-validating results from several methods with different theoretical backgrounds. This systematic approach makes it possible to maintain an up-to-date database to which recent traffic traces are regularly added and whose labels are improved as new algorithms appear. In this paper we discuss the difficulties faced in comparing events reported by distinct algorithms, and propose a methodology to achieve it.

This work will also help researchers understand the results of their algorithms. For instance, while developing an anomaly detector, researchers commonly face the problem of tuning its parameter set. The correlation between the analyzed traffic and the parameter set is complicated. Therefore, researchers usually run their application with numerous parameter settings and select the parameter set that yields the highest detection rate. Although this process is commonly accepted by the community, a crucial issue remains. Suppose a parameter set A gives a detection rate similar to that of a parameter set B, but a deeper analysis of the reported events shows that B is more effective for a certain kind of anomaly not detectable with parameter set A (and vice versa). This case is important and should not be ignored; however, it cannot be observed with a simple comparison of detection rates.
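
The parameter-tuning issue can be illustrated with a minimal sketch. It is not the methodology proposed in this paper. The anomaly identifiers, the anomaly kinds, and the two sets of reported events are hypothetical values chosen only to show how two parameter sets can share the same overall detection rate while detecting different kinds of anomalies.

```python
from collections import defaultdict

# Hypothetical labeled anomalies: identifier -> kind (illustrative data only).
TRUTH = {
    "a1": "port_scan", "a2": "port_scan", "a3": "port_scan",
    "a4": "flood", "a5": "flood", "a6": "flood",
}

# Hypothetical events reported with parameter sets A and B.
detected_a = {"a1", "a2", "a3", "a4"}
detected_b = {"a3", "a4", "a5", "a6"}


def detection_rate(detected, truth):
    """Fraction of all labeled anomalies that were reported."""
    return len(detected & set(truth)) / len(truth)


def rate_by_kind(detected, truth):
    """Detection rate computed separately for each kind of anomaly."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for anomaly, kind in truth.items():
        totals[kind] += 1
        if anomaly in detected:
            hits[kind] += 1
    return {kind: hits[kind] / totals[kind] for kind in totals}


if __name__ == "__main__":
    # The overall rates are identical, so they hide the difference.
    print("rate A:", detection_rate(detected_a, TRUTH))
    print("rate B:", detection_rate(detected_b, TRUTH))
    # The per-kind rates and the set differences expose it.
    print("A by kind:", rate_by_kind(detected_a, TRUTH))
    print("B by kind:", rate_by_kind(detected_b, TRUTH))
    print("only A:", sorted(detected_a - detected_b))
    print("only B:", sorted(detected_b - detected_a))
```

In this toy setting both parameter sets report four of the six anomalies, yet A finds every port scan and B finds every flood. Only the per-kind rates and the set differences reveal this, which is why the comparison has to be made on the reported events themselves and not on a single aggregate rate.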