Piction: A System That Uses Captions to Label Human Faces in Newspaper Photographs

Linguistic and pictorial information are often presented together to communicate a message. When the text describes salient aspects of the picture, the text can be used to direct the interpretation of the accompanying picture (i.e., the labelling of objects in it). This paper focuses on the implementation of a multi-stage system, PICTION, that uses captions to identify humans in an accompanying photograph. This approach is a computationally less expensive alternative to traditional methods of face recognition, and it does not require a pre-stored database of face models for all people to be identified. A key component of the system is the utilisation of spatial constraints (derived from the caption) to reduce the number of possible labels that could be associated with face candidates (generated by a face locator). A rule-based system then further reduces this number to arrive at a unique labelling. The rules employ spatial heuristics as well as distinguishing characteristics of faces (e.g., male versus female). The system is noteworthy because it brings a broad range of AI techniques to bear, ranging from natural-language parsing to constraint satisfaction and computer vision.
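
The constraint-based pruning described above can be illustrated with a small sketch. This is not PICTION's implementation: the function name `consistent_labellings`, the face records (an x-coordinate plus an optional gender estimate), the `left_of` constraint format, and the example names are all assumptions made for illustration, and brute-force enumeration stands in for whatever constraint-satisfaction procedure the system actually uses.

```python
from itertools import permutations

def consistent_labellings(names, faces, left_of, gender=None):
    """Enumerate name-to-face assignments that satisfy caption constraints.

    names   : people mentioned in the caption (hypothetical input format)
    faces   : face_id -> {"x": horizontal position, "gender": "m"/"f"/None}
    left_of : (a, b) pairs meaning "a appears to the left of b"
    gender  : optional name -> "m"/"f" inferred from the caption
    """
    gender = gender or {}
    results = []
    # Try every way of assigning distinct face candidates to the names.
    for chosen in permutations(faces, len(names)):
        assign = dict(zip(names, chosen))
        # Spatial constraint: a's face must lie strictly left of b's face.
        if any(faces[assign[a]]["x"] >= faces[assign[b]]["x"]
               for a, b in left_of):
            continue
        # Characteristic constraint: reject only a known, conflicting gender.
        if any(faces[assign[n]].get("gender") not in (None, g)
               for n, g in gender.items()):
            continue
        results.append(assign)
    return results

# Three face candidates from a (hypothetical) face locator.
faces = {
    "f1": {"x": 40, "gender": "m"},
    "f2": {"x": 120, "gender": "f"},
    "f3": {"x": 200, "gender": "m"},
}
names = ["Alice", "Bob"]          # invented caption: "Alice (left) and Bob"
left_of = [("Alice", "Bob")]

spatial_only = consistent_labellings(names, faces, left_of)
with_gender = consistent_labellings(
    names, faces, left_of, gender={"Alice": "f", "Bob": "m"}
)
```

In this toy case the spatial constraint alone leaves three candidate labellings, and adding the gender cue narrows them to one, which mirrors the two-step reduction the abstract describes.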
