Log analysis in an HTTP proxy server for accurately estimating Web QoE

The quality that users perceive when browsing web pages, so-called “Web QoE”, is becoming an important consideration for mobile network operators. The means by which operators can increase their customer base is shifting from ensuring high network quality of service (QoS) in terms of throughput to improving the quality of experience (QoE) of their users. Operators hence need to estimate Web QoE from the vast number of logs stored on their network equipment, e.g., HTTP proxy servers. Generally, HTTP proxy servers record connection logs not per web access but per HTTP connection, and a single web access typically consists of multiple HTTP connections. Mobile network operators therefore need to estimate web sessions from the many HTTP connection logs in an HTTP proxy server. To estimate web sessions, earlier studies took three approaches: (1) content type based, (2) time based, and (3) mixed. These approaches, however, misestimate web sessions in some cases. When a user accesses multiple web pages in a short time, they may fail to distinguish which HTTP sessions compose a single web page and thus may aggregate multiple web sessions into a single session. As a result, the estimation accuracy of web sessions decreases, and the estimation accuracy of Web QoE decreases correspondingly. In this paper, to estimate web sessions more accurately, we focus on the number of HTTP sessions in misestimated web sessions and propose a method, based on statistical hypothesis testing, for detecting erroneous estimations of web sessions. An experiment conducted on an operational LTE network showed that our method decreases the mean absolute error of web session estimation by 0.09 points relative to the conventional method. Moreover, our method comes within 0.03 points of the estimation accuracy limit.
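As a rough illustration of the time-based approach and its failure mode, the following minimal sketch groups per-connection log records into web sessions whenever the idle gap between connections of the same user stays below a threshold. The record fields (`user_id`, `start`, `end`), the function name, and the 1.0-second threshold are all assumptions made for this example; they are not taken from the paper or from any particular proxy's log format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ConnLog:
    """One HTTP connection record (hypothetical fields, times in seconds)."""
    user_id: str
    start: float
    end: float


def group_by_idle_gap(logs: List[ConnLog],
                      gap_threshold: float = 1.0) -> List[List[ConnLog]]:
    """Group connections into estimated web sessions per user.

    A new session starts when a connection begins more than
    `gap_threshold` seconds after the latest end time seen so far
    in the user's current session.
    """
    sessions: List[List[ConnLog]] = []
    current: Dict[str, List[ConnLog]] = {}   # user -> open session
    last_end: Dict[str, float] = {}          # user -> latest end time

    for log in sorted(logs, key=lambda r: (r.user_id, r.start)):
        session = current.get(log.user_id)
        if session is None or log.start - last_end[log.user_id] > gap_threshold:
            # Idle gap exceeded (or first record): open a new session.
            session = []
            sessions.append(session)
            current[log.user_id] = session
            last_end[log.user_id] = log.end
        else:
            last_end[log.user_id] = max(last_end[log.user_id], log.end)
        session.append(log)
    return sessions


logs = [
    ConnLog("u1", 0.0, 0.5),
    ConnLog("u1", 0.2, 0.9),
    ConnLog("u1", 1.2, 1.6),  # could belong to a second page opened quickly
    ConnLog("u1", 5.0, 5.4),
]
session_sizes = [len(s) for s in group_by_idle_gap(logs)]
print(session_sizes)  # → [3, 1]
```

If the third connection in fact belonged to a second page opened shortly after the first, this grouping would merge two web sessions into one, which is the kind of misestimation the abstract describes. A merged session of this kind contains more HTTP connections than a single page would, which is the quantity the proposed detection method examines.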