Workload Matters: A Robust Approach to Physical RDF Database Design

Recent advances in Information Extraction, Linked Data Management, and the Semantic Web have led to a rapid increase in both the volume and the variety of publicly available graph-structured data. As more businesses start to capitalize on graph-structured data, data management systems are being exposed to workloads that are far more diverse and dynamic than those they were designed to handle. In particular, most systems rely on a workload-oblivious physical layout with a fixed schema and can adapt only when changes to the schema are minor. As a result, they are unable to perform consistently well across different types of workloads. This thesis introduces fundamental techniques for supporting diverse and dynamic workloads in RDF data management systems. Rather than making assumptions about the workload upfront, these techniques allow a system to adjust its physical design as queries are executed. This includes changing, at runtime, the way (i) records are clustered in the storage system, (ii) data are organized and indexed, and (iii) queries are optimized. The thesis then discusses the challenges encountered in implementing these ideas in a proof-of-concept prototype called chameleon-db, and it concludes with a thorough experimental evaluation.
