View definition languages for biomedical RDF ontologies

The biological and biomedical communities have been actively making ontologies, thesauri, and data sets available on the Semantic Web, and researchers and applications would like to leverage these information sets. When accessing them, users want to specify exactly which content they are interested in and how it should be transformed. At the same time, they want to work with the most recent data without having to frequently re-acquire and re-transform it. In this dissertation, we support these researchers by developing solutions, inspired by database views, for reusing these information sets. To ground this work, we define a set of use cases for information set reuse based on concrete views and functionality requests made by researchers: nine use cases over four different biomedical ontologies. We use these use cases to determine the functionality that a general solution for RDF information set reuse must provide.

Our view definition language, vSPARQL, allows applications to specify the exact content they are interested in and how that content should be restructured or modified. Applications then access the relevant content by querying against these view definitions.

The users interested in these information sets are frequently not computer scientists, and they find the syntax of the RDF query language SPARQL, and of declarative view definition languages based on it, prohibitive. To address this difficulty, we have defined an ontology transformation language, IML, consisting of a small number of graph transformations that can be composed in a dataflow style to define a view over RDF-based information sets. The language's operations closely mirror the steps users take when transforming RDF datasets in a visual editor.

Finally, we have developed a query rewriting engine that supports querying over non-materialized IML view definitions.
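The idea of composing a few small graph transformations in a dataflow style can be illustrated with a minimal Python sketch. It does not reproduce IML's actual operations: the operator names `keep_predicates`, `rename_predicate`, and `pipeline` are hypothetical, triples are plain string tuples, and the `ex:` vocabulary is toy data rather than a real biomedical ontology. Because each operator returns a lazy generator, the view is only computed when it is iterated, loosely mirroring a non-materialized view.

```python
from typing import Callable, Iterable, Iterator, Tuple

# A triple is (subject, predicate, object); plain strings stand in for IRIs and literals.
Triple = Tuple[str, str, str]
# An operator consumes a stream of triples and yields a transformed stream.
Op = Callable[[Iterable[Triple]], Iterator[Triple]]


def keep_predicates(*preds: str) -> Op:
    """Hypothetical operator: keep only triples whose predicate is in `preds`."""
    wanted = set(preds)

    def op(triples: Iterable[Triple]) -> Iterator[Triple]:
        return (t for t in triples if t[1] in wanted)

    return op


def rename_predicate(old: str, new: str) -> Op:
    """Hypothetical operator: rewrite predicate `old` to `new`, leaving others unchanged."""

    def op(triples: Iterable[Triple]) -> Iterator[Triple]:
        return ((s, new if p == old else p, o) for s, p, o in triples)

    return op


def pipeline(*ops: Op) -> Op:
    """Compose operators left to right, dataflow style; nothing runs until iterated."""

    def run(triples: Iterable[Triple]) -> Iterator[Triple]:
        stream: Iterable[Triple] = triples
        for op in ops:
            stream = op(stream)
        return iter(stream)

    return run


# Toy base data (illustrative only).
base = [
    ("Heart", "rdfs:subClassOf", "Organ"),
    ("Heart", "ex:part_of", "CardiovascularSystem"),
    ("Heart", "rdfs:label", "heart"),
    ("LeftVentricle", "ex:part_of", "Heart"),
]

# A view that keeps only part-of edges and exposes them under a simpler predicate name.
part_of_view = pipeline(
    keep_predicates("ex:part_of"),
    rename_predicate("ex:part_of", "partOf"),
)

print(sorted(part_of_view(base)))
# [('Heart', 'partOf', 'CardiovascularSystem'), ('LeftVentricle', 'partOf', 'Heart')]
```

Since every operator has the same stream-in, stream-out shape, new operators can be added without changing `pipeline`, which is one way to keep a small operator set composable.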
Through a set of rule-based and statistics-based optimizations in the query rewriting engine, we significantly reduce the evaluation time for a number of queries over our user-requested views.
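Answering a query over a view that is never materialized, combined with one statistics-based optimization, can be sketched in Python under strong simplifying assumptions. The view here is only a predicate renaming (`VIEW_TO_BASE` is hypothetical), queries are basic graph patterns whose variables start with `?`, and join order is chosen by counting predicate occurrences in the base data. This is a common textbook heuristic, not necessarily the rewriting rules or statistics used in the dissertation.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Binding = Dict[str, str]

# Hypothetical view definition: each view predicate renames one base predicate.
VIEW_TO_BASE = {"partOf": "ex:part_of", "label": "rdfs:label"}


def rewrite(query: List[Pattern]) -> List[Pattern]:
    """Unfold the view: express a query over the view in base-data predicates."""
    rewritten = []
    for s, p, o in query:
        if p not in VIEW_TO_BASE:
            raise ValueError(f"predicate {p!r} is not exposed by the view")
        rewritten.append((s, VIEW_TO_BASE[p], o))
    return rewritten


def order_by_selectivity(patterns: List[Pattern], base: List[Triple]) -> List[Pattern]:
    """Statistics-based heuristic: evaluate patterns whose predicate is rarest first."""
    counts = Counter(p for _, p, _ in base)
    return sorted(patterns, key=lambda pat: counts[pat[1]])


def match(pattern: Pattern, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Return `binding` extended so that `pattern` matches `triple`, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must agree with any value it is already bound to.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(patterns: List[Pattern], base: List[Triple]) -> List[Binding]:
    """Nested-loop join of the patterns, in the given order, over the base triples."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in base:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Toy base data (illustrative only).
base = [
    ("LeftVentricle", "ex:part_of", "Heart"),
    ("RightVentricle", "ex:part_of", "Heart"),
    ("Heart", "ex:part_of", "CardiovascularSystem"),
    ("Heart", "rdfs:label", "heart"),
    ("LeftVentricle", "rdfs:label", "left ventricle"),
]

# "Labels of everything that is partOf Heart", written against the view vocabulary.
query = [("?x", "partOf", "Heart"), ("?x", "label", "?name")]
plan = order_by_selectivity(rewrite(query), base)
print(plan)                  # [('?x', 'rdfs:label', '?name'), ('?x', 'ex:part_of', 'Heart')]
print(evaluate(plan, base))  # [{'?x': 'LeftVentricle', '?name': 'left ventricle'}]
```

Counting predicate occurrences is only a rough cardinality estimate; it ignores bound subjects and objects, so a practical optimizer would likely combine finer statistics with rule-based rewrites.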