The University of Amsterdam at WePS2

In this paper we describe our participation in the Second Web People Search workshop (WePS2) and detail our approaches. For the clustering task, we focused on replicating the lessons learned at WePS1 on the data set made available as part of WePS2 and on experimenting with a voting-based combination of clustering methods. We found that clustering methods display the same overall behavior on the WePS1 and WePS2 data sets and that a hierarchical clustering approach delivers the best performance, even outperforming voting-based combinations. For attribute extraction, we explored pattern matching with manually and automatically constructed patterns. Manual patterns were constructed using expert knowledge and an analysis of sample data. Automatic pattern construction extracts the textual and syntactic context around training samples and selects the patterns that are expected to perform well based on leave-one-out evaluation. Experimental results show that manually constructed patterns are very effective for obtaining high recall. For automatically extracted patterns, performance varied widely depending on the attribute type. Larger amounts of training data may help improve these approaches in the future.