The Graduate University for Advanced Studies; National Institute of Informatics

With the advent of eye-gaze tracking technology, eye gaze is increasingly used as a media interaction trigger in a variety of applications, such as eye typing, video content customization, and network video streaming based on region-of-interest (ROI). In practice, however, the reaction time of a gaze-based networked system is lower-bounded by the round-trip time (RTT) of today's networks, which can be large. To improve the efficacy of gaze-based networked systems, in this paper we propose a gaze prediction strategy based on a Hidden Markov Model (HMM). The strategy predicts future gaze locations and thereby lowers the end-to-end reaction delay. We first design an HMM with three states corresponding to the three major types of intrinsic human eye movements. HMM parameters are obtained offline, on a per-video basis, during a training phase. During the testing phase, a window of noisy gaze observations is collected in real time as input to a forward algorithm, which computes the most likely current HMM state. Given the deduced HMM state, linear prediction is used to predict the gaze location RTT seconds into the future.

We demonstrate the applicability of our gaze prediction strategy by focusing on ROI-based bit allocation for network video streaming. To reduce the transmission rate of a video stream without degrading the viewer's perceived visual quality, we allocate more bits to encode the viewer's current spatial ROI and fewer bits to the other spatial regions. The challenge lies in overcoming the delay between the time a viewer's ROI is detected by gaze tracking and the time the affected video is encoded, delivered, and displayed at the viewer's terminal. To this end, we use the proposed gaze prediction strategy to predict future eye-gaze locations, so that optimized bit allocation can be performed for future frames. Our experiments show that the bit rate can be reduced by 21% without noticeable degradation in visual quality when the end-to-end network delay is as high as 200 ms.
Index Terms: Eye-gaze prediction, network streaming
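The prediction pipeline described in the abstract (a three-state HMM, a forward pass over a window of noisy gaze samples, then state-dependent linear extrapolation RTT seconds ahead) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The state names (fixation, saccade, smooth pursuit), the discretization of observations into three gaze-speed bins, the thresholds in the hypothetical `SPEED_EDGES`, and every number in `PI`, `A`, and `B` are assumptions; the paper trains its parameters offline per video. The rule "hold position during fixation, extrapolate a least-squares velocity otherwise" is one plausible reading of "linear prediction given the state", and the helper names `speed_bins`, `forward_state`, and `predict_gaze` are hypothetical.

```python
import numpy as np

# ASSUMPTION: the abstract only says "three major types of intrinsic eye
# movements"; fixation / saccade / smooth pursuit is our labeling.
STATES = ("fixation", "saccade", "pursuit")

# ASSUMPTION: placeholder HMM parameters (the paper trains these offline per video).
PI = np.array([0.6, 0.1, 0.3])              # initial state distribution
A = np.array([[0.90, 0.05, 0.05],           # A[i, j] = P(state j at t+1 | state i at t)
              [0.30, 0.60, 0.10],
              [0.10, 0.05, 0.85]])
# Emission probabilities over three speed symbols: slow / medium / fast.
B = np.array([[0.80, 0.15, 0.05],
              [0.05, 0.15, 0.80],
              [0.20, 0.70, 0.10]])
SPEED_EDGES = (30.0, 300.0)                 # px/s bin boundaries (hypothetical)


def speed_bins(xy, dt):
    """Turn N gaze samples (N x 2, pixels) into N-1 discrete speed symbols 0/1/2."""
    speeds = np.linalg.norm(np.diff(xy, axis=0), axis=1) / dt
    return np.digitize(speeds, SPEED_EDGES)


def forward_state(obs):
    """Scaled forward algorithm; returns the index of the most likely current state."""
    alpha = PI * B[:, obs[0]]
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]       # propagate one step, weight by emission
        alpha /= alpha.sum()                # rescale each step to avoid underflow
    return int(np.argmax(alpha))


def predict_gaze(xy, dt, rtt, frame_wh):
    """Predict the gaze location rtt seconds after the last sample in the window."""
    xy = np.asarray(xy, dtype=float)
    state = STATES[forward_state(speed_bins(xy, dt))]
    last = xy[-1]
    if state == "fixation":
        pred = last                         # assume the eye stays where it is
    else:
        # Fit x(t) and y(t) with straight lines over the window, extrapolate ahead.
        t = np.arange(len(xy)) * dt
        vx = np.polyfit(t, xy[:, 0], 1)[0]
        vy = np.polyfit(t, xy[:, 1], 1)[0]
        pred = last + rtt * np.array([vx, vy])
    w, h = frame_wh
    # Keep the prediction inside the video frame.
    return state, np.clip(pred, [0, 0], [w - 1, h - 1])
```

For example, a 60 Hz window drifting steadily to the right at about 120 px/s falls in the medium-speed bin, is labeled as pursuit, and is extrapolated 0.2 s ahead (the 200 ms delay mentioned above). Normalizing `alpha` at each step is the usual scaling trick for the forward algorithm and does not change which state has the highest probability. The predicted location could then serve as the center of the ROI that receives more bits in the next frames.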