The University of Amsterdam at VideoCLEF 2008

The University of Amsterdam (UAms) team carried out the Vid2RSS classification task, the primary sub-task of the VideoCLEF track at CLEF 2008. The task involves assigning thematic category labels to dual-language (Dutch/English) television episode videos. UAms chose to focus on exploiting archival metadata and speech transcripts generated by both Dutch and English speech recognizers. Exploratory experiments on external data, completed before the task began, motivated the choice of a Support Vector Machine (SVM) with a linear kernel as the classifier. The Least Squares SVM (LS-SVM) toolbox was selected as the SVM toolbox for the experiments. Wikipedia was chosen as the source of training data because it is multilingual and its content has broad thematic coverage. The experiments showed that archival metadata improves classification performance, but that adding speech recognition transcripts in one or both languages does not yield performance gains. Although the overall performance of the classifiers was less than satisfactory, adequate performance was achieved in several classes, suggesting that future work has concrete potential to improve performance, especially if more suitable training data can be obtained.
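
To make the classification setup concrete, the following is a minimal sketch of a linear-kernel LS-SVM for a single thematic label, written in plain Python. It is not the LS-SVM toolbox used in the experiments and not the UAms pipeline. The term-frequency features, the regularization value `gamma`, the label "music", and the toy training texts (standing in for Wikipedia articles) are all assumptions made for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to a sparse term-frequency vector (assumed feature choice)."""
    return Counter(text.lower().split())


def linear_kernel(a, b):
    """Dot product of two sparse term-frequency vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(count * b[term] for term, count in a.items())


def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def train_ls_svm(vectors, labels, gamma=10.0):
    """Solve the LS-SVM classifier system for the bias and the alphas.

    The system is [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    with Omega[i][j] = y_i * y_j * K(x_i, x_j). gamma is a hypothetical value.
    """
    n = len(vectors)
    system = [[0.0] + [float(y) for y in labels]]
    for i in range(n):
        row = [float(labels[i])]
        for j in range(n):
            k = labels[i] * labels[j] * linear_kernel(vectors[i], vectors[j])
            row.append(k + (1.0 / gamma if i == j else 0.0))
        system.append(row)
    solution = solve(system, [0.0] + [1.0] * n)
    return solution[0], solution[1:]


def decision_value(text, vectors, labels, bias, alphas):
    """Positive values suggest the label applies, negative values that it does not."""
    x = bag_of_words(text)
    return bias + sum(
        a * y * linear_kernel(x, v) for a, y, v in zip(alphas, labels, vectors)
    )


# Toy training texts standing in for Wikipedia articles (illustrative only).
training = [
    ("music concert orchestra violin", +1),
    ("music composer symphony orchestra", +1),
    ("football match goal stadium", -1),
    ("football league team match", -1),
]
vectors = [bag_of_words(text) for text, _ in training]
labels = [label for _, label in training]
bias, alphas = train_ls_svm(vectors, labels)

# Stand-ins for episode metadata or transcript text.
music_score = decision_value("the orchestra plays a symphony",
                             vectors, labels, bias, alphas)
sport_score = decision_value("a football match in the stadium",
                             vectors, labels, bias, alphas)
```

Under these assumptions, assigning several thematic labels to one episode would amount to training one such binary classifier per label and keeping every label whose decision value is positive.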