Event Alignment for Cross-Media Feature Extraction in the Football Domain

This paper describes an experiment in creating cross-media descriptors from football-related text and video. We combined video analysis results with several textual resources, both semi-structured (tabular match reports) and unstructured (textual minute-by-minute match reports). Our aim was to discover the relations among six video detectors and their behavior during a time window corresponding to an event described in the textual data. The experiment shows how football events extracted from text can be mapped to the corresponding scenes in video, and how this mapping may help in deriving event-specific video detectors.
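The core alignment step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the detector names, the per-second detector output format, and the window size are all assumptions made for the example; the paper's actual six detectors and data formats are not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical detector names for illustration only; the paper's actual
# six video detectors are not named in this sketch.
DETECTORS = ["crowd", "whistle", "close_up", "goal_area", "replay", "motion"]

@dataclass
class TextEvent:
    minute: int  # minute reported in the textual minute-by-minute report
    label: str   # event type extracted from text, e.g. "goal", "foul"

def align_events(events: List[TextEvent],
                 detector_series: Dict[str, List[float]],
                 window: int = 60) -> List[dict]:
    """For each textual event, slice every detector's per-second output
    over a time window centred on the reported minute, yielding the
    cross-media feature vectors the abstract describes."""
    aligned = []
    for ev in events:
        t = ev.minute * 60  # convert report minute to seconds of video time
        features = {}
        for name, series in detector_series.items():
            lo, hi = max(0, t - window), min(len(series), t + window)
            features[name] = series[lo:hi]
        aligned.append({"label": ev.label,
                        "minute": ev.minute,
                        "features": features})
    return aligned
```

A caller would feed in events parsed from the minute-by-minute report and one per-second score series per detector; the returned windows can then be inspected for detector behavior that is characteristic of each event type.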
