This paper reports the University of Pittsburgh's participation in GeoCLEF 2008. As first-time participants, we worked only on the monolingual GeoCLEF task and submitted four runs using two different methods. Our GCEC method tests the effectiveness of our online geographic coordinate extraction and clustering algorithm. Our WIKIGEO method examines how useful the geo-coordinate information in Wikipedia is for identifying geo-locations. Our experimental results show the following. (1) Our online geographic coordinate extraction and clustering algorithm is useful for locations that do not have clear corresponding coordinates. (2) Query expansion based on the geo-locations generated by GCEC is effective in improving geographic retrieval. (3) Using Wikipedia, we can find the coordinates of many geo-locations, but its use for query expansion still needs further study. (4) Query expansion based on the title alone obtained better results than expansion based on the combination of the title and narrative parts, even though the narrative is thought to contain more relevant geographic information. This finding also needs further study.