Relation between Agreement Measures on Human Labeling and Machine Learning Performance: Results from an Art History Image Indexing Domain

The domain of digital images and texts we focus on parallels the ARTstor Art History Survey Collection (AHSC), a Mellon-funded collection of 4,000 images. The AHSC is based on thirteen standard art history survey texts, so there is a strong correlation between the images and these texts. The AHSC images all have tombstone metadata (e.g., the name of the work, the artist, the date, and the location of the work), but few have subject matter metadata. We are currently using two of the texts from the AHSC concordance, which we scanned and encoded in TEI-Lite (http://www.tei-c.org/Lite/teiu5_split_en.html).