Galaxy: A Platform for Explorative Analysis of Open Data Sources

A large volume of Open Data is being generated on a continuous basis. Examples of this are the case of social, natural, and information systems such as World Wide Web and social networks. Most entities and objects in the Open Data are interconnected, forming a complex, semi-structured, and information-rich networks. In this sense, Linked Open Data has the potential to be similar to a federated database. Since Linked Open Data is based on W3C standards, it is possible to implement a federation infrastructure, however, the current SPARQL standard makes it challenging to analyze the Open Data in an explorative manner. Consequently, it will be hard to discover the hidden knowledge in the relationships among entities in Open Data sources. In this paper, we present Galaxy, a platform for explorative analysis of Open Data Sources. Galaxy facilitates the analysis of Open Data graphs based on simple abstractions, i.e. folders and paths, which enable an analyst to group related entities in the graph or nd paths among entities. Galaxy uses Hadoop data processing platforms to store and retrieve large numbers of RDF triples and to support cost-eective and Web-scale processing of Semantic Web data through a Folder-Path enabled extension of SPARQL.