Geo-indexed object recognition for mobile vision tasks

This work places attention within the architecture of ambient intelligence, in particular for mobile vision tasks in multimodal interfaces. A major limit on the performance of these services is uncertainty in the visual information, which stems from the need to index into a huge number of reference images. The presented functional component, the Attentive Machine Interface (AMI), enables contextual processing of multi-sensor information in a probabilistic framework, for example by exploiting contextual information from geo-services to cut the visual search space down to a subset of relevant object hypotheses. We present results on geo-indexed object recognition from experimental tracks and image captures in an urban scenario, extracting object hypotheses in the local context from both (i) mobile image-based appearance and (ii) GPS-based positioning. Using Bayesian decision fusion, we verify a gain in recognition accuracy (> 14%), which confirms the advantage of multi-sensor attentive processing in multimodal interfaces.
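
To illustrate the idea of geo-indexed Bayesian decision fusion, the following is a minimal sketch and not the AMI implementation. It assumes that appearance and position evidence are conditionally independent given the object, that the object prior is uniform, and that the geo evidence can be modelled as a Gaussian falloff over the distance between the GPS fix and each object. The function names, the parameters sigma_m and max_range_m, and all numeric values are hypothetical.

```python
import math


def geo_likelihood(distance_m, sigma_m=30.0):
    """Hypothetical geo model: Gaussian falloff over distance to the object."""
    return math.exp(-0.5 * (distance_m / sigma_m) ** 2)


def fuse(appearance_scores, distances_m, sigma_m=30.0, max_range_m=100.0):
    """Fuse appearance scores with geo evidence into a normalized posterior.

    appearance_scores: object name -> appearance-based score in [0, 1]
    distances_m: object name -> distance (metres) from the GPS fix
    Objects farther away than max_range_m are dropped from the search space.
    """
    fused = {}
    for obj, p_app in appearance_scores.items():
        d = distances_m[obj]
        if d > max_range_m:
            continue  # geo-indexing: prune hypotheses outside the local context
        # Naive Bayes product under the independence assumption stated above.
        fused[obj] = p_app * geo_likelihood(d, sigma_m)
    total = sum(fused.values())
    if total == 0.0:
        return {}  # no hypothesis survives pruning
    return {obj: score / total for obj, score in fused.items()}


# Illustrative values only (not taken from the experiments).
appearance = {"church": 0.40, "bank": 0.45, "museum": 0.15}
distances = {"church": 20.0, "bank": 250.0, "museum": 60.0}

posterior = fuse(appearance, distances)
best = max(posterior, key=posterior.get)
```

In this made-up example, appearance alone would favour "bank", but that hypothesis lies outside the assumed local range and is pruned, so the fused decision falls on "church". This is the kind of disambiguation that geo-context is meant to provide.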