The Archaeotools project: faceted classification and natural language processing in an archaeological context

This paper describes ‘Archaeotools’, a major e-Science project in archaeology. The project uses faceted classification and natural language processing to create an advanced infrastructure for archaeological research. It aims to integrate over 1 × 10⁶ structured database records referring to archaeological sites and monuments in the UK with information extracted from semi-structured grey literature reports and unstructured antiquarian journal accounts, in a single faceted browser interface. The project has illuminated the variable level of vocabulary control and standardization that currently exists within national and local monument inventories. Nonetheless, it has demonstrated that the relatively well-defined ontologies and thesauri that exist in archaeology mean that a high level of success can be achieved using information extraction techniques. This has great potential for unlocking and making accessible the information held in grey literature and antiquarian accounts, and has lessons for allied disciplines.
