Generic XML-based framework for metadata portals

We present a generic and flexible framework for building geoscientific metadata portals that is independent of metadata content standards and harvesting protocols. Metadata can be harvested with commonly used protocols (e.g., the Open Archives Initiative Protocol for Metadata Harvesting, OAI-PMH) and in metadata standards such as DIF or ISO 19115. The new Java-based portal software supports any XML encoding and makes metadata searchable through Apache Lucene. Software administrators are free to define searchable fields, independent of their type, using XPath. In addition, by extending the full-text search engine (FTS) Apache Lucene with a new trie-based algorithm, we have significantly improved queries for numerical and date/time ranges, thus enabling high-performance space/time retrievals in FTS-based geo portals. The harvested metadata are stored in separate indexes, which makes it possible to combine them into different portals. The portal-specific Java API and web service interface are highly flexible: they support custom front-ends, automatic query completion (AJAX), and dynamic visualization with conventional mapping tools. The software is freely available as open source.
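The abstract does not spell out the trie-based range algorithm, so the following is only a minimal sketch of one common way such a scheme can work on an inverted index; it should not be read as the authors' implementation. Each number is indexed as several prefix terms of decreasing precision, and a range query is split into a small set of prefix terms that together cover the range exactly. The class name `TrieRangeSketch`, the 16-bit value width, the 4-bit precision step, and the `shift:prefix` term format are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: prefix ("trie") terms for non-negative values below 2^BITS.
class TrieRangeSketch {
    static final int BITS = 16; // assumed value width
    static final int STEP = 4;  // assumed precision step (bits dropped per level)

    // Terms stored in the index for one value: one prefix per precision level.
    static List<String> indexTerms(long value) {
        List<String> terms = new ArrayList<>();
        for (int shift = 0; shift < BITS; shift += STEP) {
            terms.add(shift + ":" + (value >>> shift));
        }
        return terms;
    }

    // Prefix terms whose union matches exactly the values in [min, max] (inclusive).
    static List<String> rangeTerms(long min, long max) {
        List<String> out = new ArrayList<>();
        if (min > max) {
            return out;
        }
        for (int shift = 0; ; shift += STEP) {
            long diff = 1L << (shift + STEP);
            long mask = ((1L << STEP) - 1L) << shift;
            // Does the lower/upper bound start or end in the middle of a coarser block?
            boolean hasLower = (min & mask) != 0L;
            boolean hasUpper = (max & mask) != mask;
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            if (shift + STEP >= BITS || nextMin > nextMax) {
                // Coarsest usable level: cover what is left at this precision.
                addPrefixes(min, max, shift, out);
                break;
            }
            // Cover the ragged edges at the current precision,
            // then continue with the aligned middle part one level up.
            if (hasLower) addPrefixes(min, min | mask, shift, out);
            if (hasUpper) addPrefixes(max & ~mask, max, shift, out);
            min = nextMin;
            max = nextMax;
        }
        return out;
    }

    private static void addPrefixes(long lo, long hi, int shift, List<String> out) {
        for (long p = lo >>> shift; p <= (hi >>> shift); p++) {
            out.add(shift + ":" + p);
        }
    }

    public static void main(String[] args) {
        System.out.println("index terms for 300: " + indexTerms(300));
        System.out.println("query terms for [3, 300]: " + rangeTerms(3, 300));
    }
}
```

A document matches the range if any of its indexed terms appears in the query's term list. The query therefore touches a few dozen terms at most rather than one term per distinct value in the range, which is the property that makes range search practical in a full-text engine.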
