Visual tracking via ensemble autoencoder

The authors present a novel online visual tracking algorithm based on an ensemble of autoencoders (AEs). In contrast to existing deep-model-based trackers, the proposed algorithm is motivated by the theory that image resolution influences visual processing. When a deep neural network is used to represent the object, resolution corresponds to network size. The authors apply a small network to represent the object's appearance at a relatively low resolution and to search for the object over a relatively large neighbourhood. After roughly estimating the object's location, the authors apply a large network, which provides more detailed information, to estimate the object's state more accurately. Thus, the small AE is used mainly for position search and the larger one mainly for scale estimation. During tracking, the two networks cooperate within a particle filtering framework. Extensive experiments on the benchmark dataset show that the proposed algorithm performs favourably against several state-of-the-art methods.
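The coarse-to-fine idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no architectures, particle counts, or noise parameters, so every name and constant here (`crop_resize`, `track_step`, the 8x8 and 32x32 patch sizes, the noise scales and sharpness values) is a hypothetical choice. The two AEs are abstracted as error callables `coarse_err` and `fine_err`; in the demo a plain template distance stands in for AE reconstruction error.

```python
import numpy as np

def crop_resize(image, cx, cy, size, out):
    """Sample an out x out patch of side `size` centred at (cx, cy), nearest neighbour."""
    offsets = (np.arange(out) + 0.5) / out - 0.5
    xs = np.clip(np.floor(cx + offsets * size).astype(int), 0, image.shape[1] - 1)
    ys = np.clip(np.floor(cy + offsets * size).astype(int), 0, image.shape[0] - 1)
    return image[np.ix_(ys, xs)]

def weighted_estimate(particles, errors, sharpness):
    """Convert per-particle errors to normalised weights and return the weighted mean."""
    w = np.exp(-sharpness * (errors - errors.min()))
    w /= w.sum()
    return w @ particles

def track_step(image, state, coarse_err, fine_err, rng, n=400):
    """One coarse-to-fine update of state = (cx, cy, size). All constants are assumptions."""
    cx, cy, size = state
    # Stage 1 (small model): low-resolution patches, wide position search, fixed scale.
    p = np.column_stack([cx + rng.normal(0, 8.0, n), cy + rng.normal(0, 8.0, n)])
    e = np.array([coarse_err(crop_resize(image, x, y, size, 8)) for x, y in p])
    cx, cy = weighted_estimate(p, e, 50.0)
    # Stage 2 (large model): high-resolution patches, narrow position jitter, scale search.
    p = np.column_stack([
        cx + rng.normal(0, 1.5, n),
        cy + rng.normal(0, 1.5, n),
        size * np.exp(rng.normal(0, 0.1, n)),
    ])
    e = np.array([fine_err(crop_resize(image, x, y, s, 32)) for x, y, s in p])
    return weighted_estimate(p, e, 300.0)

def make_demo():
    """Synthetic frame with a Gaussian blob; template distance stands in for AE error."""
    yy, xx = np.mgrid[0:120, 0:120]
    image = np.exp(-((xx - 70.0) ** 2 + (yy - 60.0) ** 2) / (2 * 8.0 ** 2))
    coarse_tmpl = crop_resize(image, 70, 60, 32, 8)    # low-resolution appearance
    fine_tmpl = crop_resize(image, 70, 60, 32, 32)     # high-resolution appearance
    coarse_err = lambda patch: np.mean((patch - coarse_tmpl) ** 2)
    fine_err = lambda patch: np.mean((patch - fine_tmpl) ** 2)
    return image, coarse_err, fine_err

if __name__ == "__main__":
    image, coarse_err, fine_err = make_demo()
    rng = np.random.default_rng(0)
    # Start from a deliberately wrong position and scale; the target is (70, 60, 32).
    print(track_step(image, (62.0, 54.0, 28.0), coarse_err, fine_err, rng))
```

Keeping the scale fixed in the first stage means the cheap low-resolution model is only evaluated over position, which is where the wide search is needed. The second stage then spends the costlier high-resolution evaluations on a small neighbourhood and on scale. A full tracker would also resample particles between frames and update the models online, which this sketch omits.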