Study of Information Network Traffic Identification Based on C4.5 Algorithm

Current network traffic identification techniques, which rely on transport-layer port numbers and application-layer protocol signatures, have shortcomings that are difficult to overcome. This paper proposes applying the C4.5 algorithm, combined with engineering practice, to traffic identification at the transport layer in order to address these problems. The correlation-based feature selection (CFS) algorithm and a genetic algorithm (GA) are used to select the attribute feature subset. A method that combines N-fold cross-validation with a separate test set is proposed and adopted to assess the classification results on current broadband network traffic. The experimental results show that the network traffic was successfully identified and analyzed. Average accuracy rates of over 88.67% and 88.89% were achieved when the subset and the full set, respectively, were used as the training set.
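For readers unfamiliar with C4.5, the following minimal sketch illustrates the gain ratio criterion that C4.5 uses to choose the attribute to split on. It is not the implementation used in this study: the function names `entropy` and `gain_ratio` and the toy flow records are hypothetical, and the sketch assumes discrete attribute values only (C4.5 additionally handles continuous attributes by searching for thresholds, which is omitted here).

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one discrete attribute with respect to the class labels.

    values: attribute value of each flow (hypothetical toy data)
    labels: traffic class of each flow
    """
    total = len(labels)

    # Partition the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)

    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder

    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy example: the attribute separates the two classes perfectly.
ports = ["80", "80", "53", "53"]
classes = ["web", "web", "dns", "dns"]
print(gain_ratio(ports, classes))
```

At each node, C4.5 would evaluate such a score for every candidate attribute and split on the one with the highest value; an attribute that takes a single value for all flows carries no split information and scores zero in this sketch.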