The work presented in this paper concerns Information Retrieval from geographical documents, i.e. documents with a major geographic component. The ultimate aim is to answer a user's informational query with a ranked list of relevant passages from selected documents, while allowing the user to browse the text around them. In this paper we consider the spatial component of texts and queries. The idea is to perform an off-line linguistic analysis of each document in order to extract spatial expressions, i.e. expressions denoting geographical locations. Such expressions are, in general, much more complex than simple place names. We present a linguistic analyser which recognises them, performs a semantic analysis and computes symbolic representations of their "content". These representations, stored in the text by means of XML annotation, serve as passage indexes against which queries are compared. Matching queries with text expressions is a complex process which requires several kinds of numeric and symbolic computation. We describe a prospective outline of this process.