Accessing existing distributed science archives as RDF models

Due to the ease with which information can be published on the Internet, scientists potentially have access to more data than at any time in history. The Internet makes it viable for research laboratories to publish the results of their experiments for others to use, as text files, XML documents, or a database. There are also numerous archive centres which gather information from various sources and make it available from one place, often as a query endpoint to a relational database due to the size and highly structured nature of the data. In order to access this data, a scientist must be able to: (i) locate those data sources with information relevant to their research; (ii) understand the data model of each relevant data source in order to compose a query to extract the required data.

The Resource Description Framework (RDF) [5] has been developed by the W3C with the aim of sharing and linking data on the Web. RDF is a graph-based data model that makes the semantics of the data explicit, and can be queried using the SPARQL query language [6]. One of the benefits often stated for RDF is the ease with which data can be integrated from distributed RDF sources. To take advantage of this feature of RDF, tools for exposing relational databases as virtual RDF graphs have been developed, including D2R [2] and SquirrelRDF [7]. In this work, we have considered these tools for exposing relational databases as SPARQL endpoints in the context of astronomical data archives.

There are many astronomical data archives available, each of which stores data according to its own relational schema. While work on developing a consensus model for accessing these archives is underway within the International Virtual Observatory Alliance (IVOA)