ArUco/Gaze Tracking in Real Environments

The emergence of affordable mobile eye-trackers has made it possible to study gaze behavior in real-world environments. However, mapping gaze from the recorded video to a static reference image remains a complex, open problem. Finding the reference image within the video frames, i.e., image matching, can give satisfactory results, but occluded or overlapping objects are almost impossible to locate with this technique. We suggest using ArUco fiducial markers (and the associated software library available in OpenCV) to map gaze to dynamic Areas of Interest (AOIs) within a reference image. Although such markers have been used previously, technical details of marker detection and mapping have been sparse. The current approach consists of three steps: (1) define an AOI using markers, (2) resolve any conflicts among overlapping AOIs, and (3) map the gaze point to the reference image. A dynamic AOI can be defined using one or more corner markers. When camera rotations are limited and the object plane is roughly perpendicular to the camera's optical axis, it is possible to define an AOI using only one corner marker. When the camera rotates, its pose must be estimated in order to project the corner points onto the camera image plane. An AOI can also be defined with four corner markers, which has the advantage of being robust to camera rotations and requiring no a priori knowledge of the physical dimensions of the object. The two approaches can be combined: for example, when four corner markers are used and one of them is lost (due to occlusion or viewing angle), the basis vectors can be used to interpolate the position of the lost marker. When two or more AOIs overlap and all the markers are tracked, gaze should be assigned to the AOI closer to the camera. The distance to an object can be estimated from the physical length of the object, the number of pixels it spans in the image, and the pre-computed focal length of the camera.
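The distance-based conflict resolution described above can be illustrated with a minimal sketch. It assumes a simple pinhole camera model in which the object is roughly parallel to the image plane, so that distance = focal length (in pixels) × physical length / pixel span. The function names, the AOI dictionary layout, and the numeric values are hypothetical and chosen only for illustration; they are not taken from the original work.

```python
import math


def pixel_span(p0, p1):
    """Euclidean distance in pixels between two image points."""
    return math.hypot(p1[0] - p0[0], p1[1] - p0[1])


def estimate_distance(real_length, span_px, focal_length_px):
    """Pinhole-model distance estimate, in the same unit as real_length.

    Assumes the measured edge is roughly parallel to the image plane.
    """
    if span_px <= 0:
        raise ValueError("pixel span must be positive")
    return focal_length_px * real_length / span_px


def closest_aoi(aois, focal_length_px):
    """Return the name of the AOI nearest to the camera and all distances.

    aois maps a name to (real_length, corner_a, corner_b), where the two
    corners are the image positions of the ends of the known edge.
    """
    distances = {
        name: estimate_distance(length, pixel_span(a, b), focal_length_px)
        for name, (length, a, b) in aois.items()
    }
    return min(distances, key=distances.get), distances


# Hypothetical example: two overlapping AOIs, focal length of 800 px.
aois = {
    "poster": (0.60, (100, 200), (400, 200)),  # 0.60 m edge spans 300 px
    "phone": (0.15, (220, 210), (420, 210)),   # 0.15 m edge spans 200 px
}
winner, distances = closest_aoi(aois, focal_length_px=800.0)
```

In this sketch, gaze falling inside both AOIs would be assigned to `winner`. A strongly tilted object would violate the parallel-plane assumption, in which case a full pose estimate would be the safer basis for the distance.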
Once the AOIs are defined and overlaps among them are resolved, the gaze point can be mapped to the coordinates of the reference image using a homography.
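The homography step can be sketched as follows. This is a NumPy-only illustration that estimates the homography from four AOI corners with the direct linear transform and then applies it to a gaze point; it is not necessarily how the authors implemented the mapping, and in practice OpenCV's `cv2.getPerspectiveTransform` or `cv2.findHomography` could fill the same role. The corner coordinates, the reference image size, and the function names are assumptions made for the example.

```python
import numpy as np


def find_homography(src_pts, dst_pts):
    """Estimate the 3x3 homography mapping src_pts to dst_pts (DLT).

    Both inputs are sequences of four or more (x, y) pairs in matching
    order. Degenerate layouts (e.g. collinear points) are not handled.
    """
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    a = np.asarray(rows, dtype=float)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(a)
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def map_point(h, point):
    """Apply homography h to one (x, y) point using homogeneous coordinates."""
    x, y, w = h @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])


# Hypothetical AOI corners in the scene-camera frame (pixels), listed
# clockwise from the top-left, and the matching reference image corners.
frame_corners = [(100, 50), (300, 60), (310, 260), (90, 250)]
ref_corners = [(0, 0), (400, 0), (400, 300), (0, 300)]

H = find_homography(frame_corners, ref_corners)
gaze_in_frame = (200, 155)
gaze_in_ref = map_point(H, gaze_in_frame)
```

The same `H` can be reused for every gaze sample recorded in that video frame; it has to be re-estimated whenever the marker positions change, i.e., for each new frame.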