As network use grows worldwide, the volume of traffic data produced each day keeps increasing, and capturing HTTP traffic quickly and accurately has become an important problem. In this paper, we analyze the Hypertext Transfer Protocol (HTTP) and identify patterns in its structure. We then design a data acquisition system built around an improved trie-based protocol parsing algorithm. Using the trie, the system extracts user-defined fields without mismatches or backtracking. Based on the characteristics of the HTTP message structure, we divide protocol fields into two classes and process each class separately. Rather than copying each field's full string into memory, we store only the starting address and the offset of the required field, which improves efficiency and saves memory at the same time. HTTP traffic data was captured from a commercial Internet service provider using our own data acquisition system. In experiments, a comparison with a traditional HTTP parsing method shows that the proposed algorithm is correct, efficient, and extensible.
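The idea of trie-based field extraction with position-only storage can be illustrated with a minimal sketch. This is not the system described above. It assumes that user-defined fields are HTTP header names, that names match case-insensitively, and that each value is recorded as a (start, length) pair into the original buffer. The names `TrieNode`, `build_trie`, and `extract_fields` are hypothetical, and the two-class field split mentioned above is not modeled.

```python
from typing import Dict, List, Optional, Tuple

COLON, CR, SPACE, TAB = 0x3A, 0x0D, 0x20, 0x09


class TrieNode:
    """One trie node; `field` is set where a wanted header name ends."""
    __slots__ = ("children", "field")

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.field: Optional[str] = None


def build_trie(field_names: List[str]) -> TrieNode:
    """Insert each wanted header name (lowercased) into a byte-keyed trie."""
    root = TrieNode()
    for name in field_names:
        node = root
        for byte in name.lower().encode("ascii"):
            node = node.children.setdefault(byte, TrieNode())
        node.field = name
    return root


def extract_fields(buf: bytes, root: TrieNode) -> Dict[str, Tuple[int, int]]:
    """Return {field name: (start, length)} spans into buf, copying no values."""
    spans: Dict[str, Tuple[int, int]] = {}
    n = len(buf)
    pos = buf.find(b"\r\n")          # skip the request or status line
    if pos < 0:
        return spans
    pos += 2
    while pos < n and not buf.startswith(b"\r\n", pos):
        node: Optional[TrieNode] = root
        # Walk the trie over the header name; the scan position only moves forward.
        while pos < n and node is not None and buf[pos] not in (COLON, CR):
            byte = buf[pos]
            if 0x41 <= byte <= 0x5A:  # fold ASCII upper case to lower case
                byte |= 0x20
            node = node.children.get(byte)
            pos += 1
        matched = (node is not None and node.field is not None
                   and pos < n and buf[pos] == COLON)
        if matched:
            pos += 1                  # step over ':'
            while pos < n and buf[pos] in (SPACE, TAB):
                pos += 1              # skip optional whitespace before the value
        end = buf.find(b"\r\n", pos)
        if end < 0:
            end = n
        if matched:
            spans[node.field] = (pos, end - pos)
        pos = end + 2                 # continue at the next header line
    return spans


if __name__ == "__main__":
    request = (b"GET /index.html HTTP/1.1\r\n"
               b"Host: example.com\r\n"
               b"User-Agent: test/1.0\r\n"
               b"Accept: */*\r\n\r\n")
    trie = build_trie(["Host", "User-Agent"])
    for name, (start, length) in extract_fields(request, trie).items():
        print(name, request[start:start + length])
```

Because a failed trie walk simply skips to the end of the current line, headers that are not of interest cost one forward pass and never cause the scanner to re-read bytes. Values are sliced from the original buffer only when a consumer actually needs them.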