Sharing Representations for Long Tail Computer Vision Problems

The long tail phenomena appears when a small number of objects/words/classes are very frequent and thus easy to model, while many many more are rare and thus hard to model. This has always been a problem in machine learning. We start by explaining why representation sharing in general, and embedding approaches in particular, can help to represent tail objects. Several embedding approaches are presented, in increasing levels of complexity, to show how to tackle the long tail problem, from rare classes to unseen classes in image classification (the so-called zero-shot setting). Finally, we present our latest results on image captioning, which can be seen as an ultimate rare class problem since each image is attributed to a novel, yet structured, class in the form of a meaningful descriptive sentence.