Method for separating unknown single protocol data stream into different types of data frames
The invention discloses a method for separating an unknown single-protocol data stream into different types of data frames. The data frames are first split into n-grams; following the Zipf distribution, the value of n whose curve is closest to a straight line is selected. Infrequent byte sequences are then filtered out using a Jaccard coefficient, with the threshold varied to find the best value, which yields the set of n-grams whose frequency of occurrence exceeds the threshold. Next, an unsupervised feature selection algorithm extracts a set of feature strings: the candidate feature set from the previous step is taken as input and, under the maximum-relevance, minimum-redundancy selection criterion, a better representation of the features of the protocol's different message types is obtained and used as the feature vector for clustering. Finally, a clustering algorithm identifies the protocol messages by grouping messages with the same format together. The method was evaluated on ICMP, where the accuracy and recall of message identification exceed 90%.
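The first two steps of the abstract (n-gram splitting with a Zipf-based choice of n, then frequency filtering) can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the function names are hypothetical, the Zipf check is taken to be a least-squares fit on a log-log rank-frequency plot, "frequency" is taken to be the fraction of frames that contain an n-gram, and the Jaccard coefficient is shown as a plain set similarity between the n-gram sets of two frames. Feature selection and clustering are not shown.

```python
import math
from collections import Counter


def ngrams(frame: bytes, n: int) -> list:
    """Return every contiguous n-byte substring of a frame."""
    return [frame[i:i + n] for i in range(len(frame) - n + 1)]


def zipf_linearity(frames, n: int) -> float:
    """R^2 of a straight-line fit to log(rank) vs log(frequency).

    Assumption: "closest to a straight line" is measured on a log-log
    rank-frequency plot; values nearer 1.0 are more Zipf-like.
    """
    counts = Counter(g for f in frames for g in ngrams(f, n))
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        return 0.0
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # flat curve: no meaningful fit
    return (sxy * sxy) / (sxx * syy)


def frequent_ngrams(frames, n: int, threshold: float) -> set:
    """Keep n-grams found in more than `threshold` (a fraction) of frames.

    Assumption: frequency is counted once per frame (document frequency).
    """
    doc_freq = Counter()
    for frame in frames:
        doc_freq.update(set(ngrams(frame, n)))
    return {g for g, c in doc_freq.items() if c / len(frames) > threshold}


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy frames: two start with 0x08 0x00 and two with 0x00 0x00 (made-up data).
frames = [b"\x08\x00AAAA", b"\x08\x00BBBB", b"\x00\x00CCCC", b"\x00\x00DDDD"]
best_n = max((1, 2, 3), key=lambda n: zipf_linearity(frames, n))
candidates = frequent_ngrams(frames, 2, threshold=0.4)
```

With these toy frames, only the two 2-byte prefixes appear in more than 40% of the frames, so they are the n-grams that survive filtering; in practice the threshold would be varied, as the abstract describes, rather than fixed in advance.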