Query optimization by semantic reasoning

The problem of database query optimization is to select an efficient way to process a query expressed in logical terms from among the alternative ways it can be carried out in the physical database. This thesis presents a new approach to this problem, called semantic query optimization. The goal of semantic query optimization is to produce a semantically equivalent query that is less expensive to process than the original query. Semantic query optimization transforms the original query into a new one by a process of inference. The transformations are limited to those that yield a semantically equivalent query, that is, one guaranteed to produce the same answer as the original query in any permitted state of the database. This guarantee holds because the knowledge used to transform a query is the same knowledge used to ensure the semantic integrity of the data stored in the database. Thus, semantic query optimization brings together the apparently separate research areas of query processing and database integrity.

The thesis also addresses an important issue in current automatic planning research: the production not just of a correct solution but of a "good" one, by means of an efficient problem solver. Semantic query optimization advances the notion of a problem reformulation step for problem-solving programs. In this step, equivalent statements of the original problem are sought, one of which may have a better solution than the original problem. This method avoids explicit and possibly costly analysis of efficiency factors during planning itself. Semantic query optimization can also be viewed as one aspect of intelligent database mediation: it applies knowledge of a problem domain and of the capabilities and limitations of the database to pose the most effective and easily processed queries to solve a user's problem.

The thesis formally defines transformations that preserve semantic equivalence for queries in the relational calculus. In addition, it identifies several classes of cost-reducing query transformations for relational database queries and provides quantitative estimates of the improvements they can produce, based upon widely accepted models of query processing. The thesis also discusses the design and implementation of a system that carries out semantic query optimization for an important class of relational database queries. The system is called QUIST, for QUery Improvement through Semantic Transformation. QUIST has analyzed a range of queries to which different transformations apply. For these queries, it obtains substantial reductions in the cost of processing at a negligible cost for the analysis itself.
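
The core idea of transforming a query with integrity knowledge can be illustrated with a minimal sketch. It is not QUIST and not the formal relational-calculus treatment of the thesis: it assumes a toy query form (a conjunction of simple comparisons over one relation), and the `Constraint`, `Rule`, `transform`, and `answer` names, the ship data, and the rule "every ship longer than 350 is a tanker" are all hypothetical. The sketch adds to a query any constraint implied by an integrity rule and checks that the answer is unchanged in a database state that satisfies the rule.

```python
from dataclasses import dataclass
import operator

# Comparison operators supported by this toy query form (an assumption).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


@dataclass(frozen=True)
class Constraint:
    attr: str
    op: str
    value: object

    def holds(self, row):
        return OPS[self.op](row[self.attr], self.value)


@dataclass(frozen=True)
class Rule:
    """Integrity rule: every stored row satisfying `if_` also satisfies `then`."""
    if_: Constraint
    then: Constraint


def transform(query, rules):
    """Add each constraint implied by a rule whose antecedent is in the query."""
    result = set(query)
    changed = True
    while changed:  # repeat until no rule adds anything new
        changed = False
        for rule in rules:
            if rule.if_ in result and rule.then not in result:
                result.add(rule.then)
                changed = True
    return frozenset(result)


def answer(query, rows):
    """Evaluate a conjunctive query by scanning the rows."""
    return [row for row in rows if all(c.holds(row) for c in query)]


# Hypothetical database state that satisfies the hypothetical rule below.
SHIPS = [
    {"name": "A", "type": "tanker", "length": 400},
    {"name": "B", "type": "tanker", "length": 320},
    {"name": "C", "type": "freighter", "length": 200},
]
RULES = [Rule(Constraint("length", ">", 350), Constraint("type", "=", "tanker"))]

original = frozenset({Constraint("length", ">", 350)})
transformed = transform(original, RULES)

# Both queries return the same rows in any state that obeys the rule.
print(answer(original, SHIPS) == answer(transformed, SHIPS))
```

Whether the added constraint actually lowers processing cost depends on the physical database, for example on whether an index exists on `type`. The sketch does not model cost; it only shows that the transformation preserves the answer.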
