Web-scale semantic information processing

From the earliest days of semantic web research, the problem of semantically processing information at very large scale has cast a shadow over the field's many successes. Having the Web as a primary use case has been both a blessing and a curse. While targeting the Web has attracted a lot of attention and has significantly improved awareness of the benefits of semantic data models and automatic reasoning, the field still has to prove that semantics will work at web scale. During the first years, semantic web research was dominated by the consideration of rich semantic representations in the tradition of symbolic AI. Significant progress was achieved in integrating knowledge representation and reasoning with mainstream web infrastructure, leading to key standards such as OWL, RDF and SPARQL. However, processing enormous quantities of the corresponding data is still one of the greatest challenges for the Semantic Web. While other communities, e.g. Information Retrieval, have developed successful strategies for coping with the scale of the Web using statistical techniques, semantic web technologies are still struggling to scale up to the Web as such. This is in part due to the need to preserve the data’s structure and the need to perform various forms of reasoning in order to leverage the available information more effectively. To cope with these challenges, it is necessary to look beyond the realms of artificial intelligence and to leverage ideas and techniques from the distributed systems and database communities.
