YARS2: A Federated Repository for Querying Graph Structured Data from the Web

We present the architecture of an end-to-end semantic search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web. In particular, we study distributed indexing methods for graph-structured data and parallel query evaluation methods on a cluster of computers. We evaluate the system on a dataset with 430 million statements collected from the Web, and provide scale-up experiments on 7 billion synthetically generated statements.
