names and faces.

We show that a large and realistic face dataset can be built from news photographs and their associated captions. Our dataset consists of 44,773 face images, obtained by applying a face finder to approximately half a million captioned news images. This dataset is more realistic than the usual face recognition datasets because it contains faces captured "in the wild": in a variety of configurations with respect to the camera, with a variety of expressions, and under illumination of widely varying color. Faces are extracted from the images and names from the associated captions. Our system uses a clustering procedure to find the correspondence between faces and associated names in news picture-caption pairs. The context in which a name appears in a caption provides powerful cues as to whether the named person is depicted in the associated image. By incorporating simple natural language techniques, we are able to improve our name assignment significantly. Once the procedure is complete, we have an accurately labeled set of faces, an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation.
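To make the idea of caption-constrained clustering concrete, the following is a minimal sketch and not the procedure used in this work. It assumes each face is already represented by a numeric feature vector and that each face may only receive a name that appears in its own caption. The function `assign_names`, its input layout, and the nearest-centroid rule are all illustrative assumptions; the sketch ignores the language cues and any constraint that a name be used at most once per image.

```python
from collections import defaultdict


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_names(faces, max_iters=20):
    """Hypothetical helper: faces is a list of (feature_vector, candidate_names).

    Returns one label per face, chosen only from that face's candidate names
    (or None if no candidate ever acquires a cluster center).
    """
    # Faces whose caption contains exactly one name seed the clusters.
    labels = [names[0] if len(names) == 1 else None for _, names in faces]
    for _ in range(max_iters):
        # Recompute one center per name from the currently labeled faces.
        groups = defaultdict(list)
        for (vec, _), label in zip(faces, labels):
            if label is not None:
                groups[label].append(vec)
        centers = {name: mean(vecs) for name, vecs in groups.items()}
        # Reassign each face to the closest center among its caption's names.
        new_labels = []
        for (vec, names), old in zip(faces, labels):
            known = [n for n in names if n in centers]
            if known:
                new_labels.append(min(known, key=lambda n: sq_dist(vec, centers[n])))
            else:
                new_labels.append(old)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy data: two unambiguous faces and two faces whose captions name both people.
faces = [
    ([0.0, 0.0], ["A"]),
    ([10.0, 10.0], ["B"]),
    ([0.5, 0.2], ["A", "B"]),
    ([9.5, 10.1], ["A", "B"]),
]
print(assign_names(faces))
```

In this toy example the two ambiguous faces are resolved by proximity to the centers seeded from the single-name captions. A fuller treatment would also model the caption context, which the abstract identifies as a strong cue for whether a named person is depicted at all.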