Transactions on Large-Scale Data- and Knowledge-Centered Systems XIII

The traditional search model of finding links on the Web is unsatisfactory for increasingly complex tasks that seek to leverage the diverse, increasingly structured, and semantically annotated data on the Web. A good example is when users seek collections or packages of resources that meet some constraints, e.g., a collection of learning resources that covers some topics and has a good average rating, or a collection of tourist attractions in a city such that the total cost and total travel time for visiting all attractions meet the given constraints. For such queries, the goal is to return a set of constraint-qualifying collections or packages. However, under the traditional “set of links” query paradigm, such queries can only be satisfied by issuing multiple queries, reviewing answer lists, and manually assembling packages to suit a user’s desired constraints. In this article, we introduce the concept of a Package Query for querying resource combinations on the Semantic Web. In particular, we consider a frequent subclass of such queries, Skyline Package Queries, in which multiple competing criteria are specified in the query so that the Pareto-optimal set, or skyline, of packages is returned. In contrast to a few recent efforts on package queries over single relational models, fine-grained data models such as RDF introduce the challenge of computing the package skyline over multiple joins of ternary relations. We present four evaluation strategies involving different combinations of relational query operators, a new operator for Skyline Package Queries, and different storage models for RDF data. A comparative evaluation of the algorithms over real-world and synthetic-benchmark RDF datasets is
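To make the semantics of a Skyline Package Query concrete, here is a minimal brute-force sketch in Python. It is not one of the four evaluation strategies or the new operator described in the article; it only illustrates the query's meaning under stated assumptions. Assumed for illustration: the hypothetical triples `TRIPLES`, the hypothetical helpers `items_from_triples`, `dominates`, and `skyline_packages`, SUM as the package aggregate, per-attribute upper bounds as the constraints, and "smaller is better" for every criterion. The first step reassembles one record per subject from ternary (subject, predicate, object) triples, which is where the joins over triples mentioned above arise; the second enumerates fixed-size packages, filters them by the constraints, and keeps only the packages that no other package dominates.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical RDF-style triples (subject, predicate, object) for tourist attractions.
TRIPLES: List[Tuple[str, str, int]] = [
    ("a1", "cost", 20), ("a1", "time", 2),
    ("a2", "cost", 10), ("a2", "time", 3),
    ("a3", "cost", 30), ("a3", "time", 1),
    ("a4", "cost", 15), ("a4", "time", 2),
]


def items_from_triples(triples: List[Tuple[str, str, int]],
                       attrs: Sequence[str]) -> Dict[str, Dict[str, int]]:
    """Rebuild one record per subject by joining its triples on the subject."""
    records: Dict[str, Dict[str, int]] = {}
    for s, p, o in triples:
        if p in attrs:
            records.setdefault(s, {})[p] = o
    # Inner-join semantics: keep only subjects that have every requested attribute.
    return {s: r for s, r in records.items() if all(a in r for a in attrs)}


def dominates(u: Tuple[int, ...], v: Tuple[int, ...]) -> bool:
    """u dominates v if it is no worse everywhere and strictly better somewhere (minimization)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline_packages(items: Dict[str, Dict[str, int]], attrs: Sequence[str],
                     k: int, max_total: Dict[str, int]):
    """Enumerate all k-item packages, SUM each attribute, apply the caps, return the skyline."""
    candidates = []
    for combo in combinations(sorted(items), k):
        totals = tuple(sum(items[i][a] for i in combo) for a in attrs)
        # Constraint check: every package total must stay within its assumed cap.
        if all(t <= max_total[a] for t, a in zip(totals, attrs)):
            candidates.append((combo, totals))
    # Pareto filter: drop any candidate dominated by another candidate (O(n^2)).
    return [(c, t) for c, t in candidates
            if not any(dominates(t2, t) for _, t2 in candidates)]


if __name__ == "__main__":
    items = items_from_triples(TRIPLES, ("cost", "time"))
    for pkg, totals in skyline_packages(items, ("cost", "time"), 2,
                                        {"cost": 40, "time": 5}):
        print(pkg, totals)
```

With a cost cap of 40 and a time cap of 5, this prints `('a1', 'a4') (35, 4)` and `('a2', 'a4') (25, 5)`: neither package is cheaper and faster than the other, while every other qualifying pair is dominated. Because the number of k-item packages grows combinatorially with the number of items, exhaustive enumeration like this only serves to illustrate the semantics; it is why dedicated evaluation strategies are needed at scale.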
