On Line Processing of Compacted Relations

Proceedings of the Eighth International Conference on Very Large Data Bases
Mexico City, September, 1982

Most database machines use some kind of "filter" that performs unary relational operators (selection and projection) on relations [1 to 7]. These filters operate "on the fly", that is, at the speed of the disk, while the relation is being transferred into main memory. Since processing time is proportional to relation size, it is important to represent data in the most compacted way. In this paper we address the problem of satisfying two seemingly contradictory requirements: i) finding an "optimal" compaction scheme, and ii) processing optimally compacted relations on the fly.

INTRODUCTION

Most database machines (DBM) use some kind of filter that performs unary relational operators (selection and projection) on relations (see for example [1 to 7]). These filters operate "on the fly", that is, at the speed of the disk, while the relation is being transferred into main memory. Since processing time is proportional to relation size, it is important to represent data in the most compacted way. Most DBM only process standard, uncompacted data [2,3,4,6,7]. We are currently building a machine [1] that uses such a filter to process compacted relations. In this paper we address the problem of satisfying two seemingly contradictory requirements: i) finding an "optimal" compaction scheme, and ii) processing optimally compacted relations on the fly.

Section 1 addresses the problem of compacting relations. Compaction formats are defined for files representing a given relation, and the notion of a maximally compacted file is then introduced. To obtain an "optimally" compacted file, the method suggested in this section is to choose an adequate set of hierarchical dependencies and to compact the file according to that set.

We then turn to the problem of processing such compacted files (section 2). One reasonable way of filtering compacted files is the Finite State Automaton (FSA) approach. In [1] we concentrated on the problem of realizing such a filtering mechanism and raised the following question: given a selection-projection operation and a file compacted according to some format, can we always find an FSA that performs the operation on the fly on this file? The answer was no, and a restrictive class of compacted files was exhibited on which any selection-projection operation can be performed on the fly. In this paper we give a complete characterization of the operations that can be performed on the fly.

I. COMPACTING RELATIONS

We assume the reader is familiar with relational terminology. A relation R is defined over a set of attributes $U$; with each $A \in U$ is associated a domain $D(A)$, and we denote $D = \bigcup_{A \in U} D(A)$. Relations are represented by sequential files. Attribute values in these files are represented by an attribute tag (which indicates the attribute name) followed by the attribute value and ended by an end tag. A file over $U$ is a string in $D^+$. For instance, if $U = \{Course, Student, Grade\}$, then F = Math Jones A Math Susan B Latin Mike D is a file over $U$.

Definition 1.1. A compaction format over $U$ is defined recursively as follows:
1) $A$ and $A^+$ are compaction formats over $\{A\}$, for each $A \in U$;
2) if $e$ is a compaction format over $X$, so is $(e)^+$;
3) if $e_1$ and $e_2$ are compaction formats over $X_1$ and $X_2$, and $X_1 \cap X_2 = \emptyset$, then $e_1 e_2$ is a compaction format over $X_1 \cup X_2$.

Such a definition in fact yields a special subset of regular expressions. Examples of compaction formats over $ABC$ are $(A(BC)^+)^+$, $(ABC)^+$ and $(AB^+C^+)^+$.

The language $L(e)$ associated with $e$ is defined by:
1) $L(A) = D(A)$ for all $A \in U$;
2) $L(e_1 e_2) = L(e_1) \cdot L(e_2)$;
3) $L(e^+) = (L(e))^+$.

Of course, sentences from these languages are files over $U$. We shall say that a file $F$ satisfies compaction format $e$ if $F \in L(e)$. For instance, Math Jones A Math Susan B Latin Mike D satisfies $(Course\ Student\ Grade)^+$, while Math Jones A Susan B Latin Mike D satisfies $(Course\ (Student\ Grade)^+)^+$.
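The recursive definition of compaction formats, the language $L(e)$ and the satisfaction test can be made concrete with a small membership checker. The Python sketch below is an illustration, not the authors' method. It assumes that a file has already been tokenized into (attribute tag, value) pairs, treats every value as belonging to $D(A)$ for its tag, and generalizes the binary rule 3 to n-ary concatenation. All names (`Attr`, `Plus`, `Concat`, `attributes`, `ends`, `satisfies`) are hypothetical. For each subformat, it computes the set of positions where a match starting at position i can end, following the three clauses that define $L(e)$.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union

# Hypothetical representation: a file is a list of (attribute tag, value)
# tokens; end tags are left implicit and every value is assumed to lie in D(A).
Token = Tuple[str, str]


@dataclass(frozen=True)
class Attr:
    """Rule 1: a single attribute A (A+ is written Plus(Attr(A)))."""
    name: str


@dataclass(frozen=True)
class Plus:
    """Rule 2: (e)+."""
    body: "Format"


@dataclass(frozen=True)
class Concat:
    """Rule 3, generalized to n parts: e1 e2 ... over pairwise disjoint sets."""
    parts: Tuple["Format", ...]


Format = Union[Attr, Plus, Concat]


def attributes(e: Format) -> frozenset:
    """Return the attribute set X that e is over; reject malformed formats."""
    if isinstance(e, Attr):
        return frozenset([e.name])
    if isinstance(e, Plus):
        return attributes(e.body)
    if not e.parts:
        raise ValueError("empty concatenation")
    seen: frozenset = frozenset()
    for part in e.parts:
        xs = attributes(part)
        if seen & xs:
            raise ValueError(f"overlapping attributes: {sorted(seen & xs)}")
        seen = seen | xs
    return seen


def ends(e: Format, f: List[Token], i: int) -> Set[int]:
    """Return every j such that the token slice f[i:j] belongs to L(e)."""
    if isinstance(e, Attr):
        # L(A) = D(A): exactly one token carrying tag A
        return {i + 1} if i < len(f) and f[i][0] == e.name else set()
    if isinstance(e, Concat):
        # L(e1 e2) = L(e1) . L(e2): thread end positions through each part
        pos = {i}
        for part in e.parts:
            pos = {j for k in pos for j in ends(part, f, k)}
        return pos
    # L(e+) = L(e)+: repeat e until no new end position appears
    result: Set[int] = set()
    frontier = ends(e.body, f, i)
    while frontier:
        result |= frontier
        frontier = {j for k in frontier for j in ends(e.body, f, k)} - result
    return result


def satisfies(f: List[Token], e: Format) -> bool:
    """F satisfies e iff F is in L(e), i.e. a match of e spans the whole file."""
    attributes(e)  # raises ValueError if e is not a compaction format
    return len(f) in ends(e, f, 0)


# The two example files from the text, with their attribute tags made explicit.
F1 = [("Course", "Math"), ("Student", "Jones"), ("Grade", "A"),
      ("Course", "Math"), ("Student", "Susan"), ("Grade", "B"),
      ("Course", "Latin"), ("Student", "Mike"), ("Grade", "D")]
F2 = [("Course", "Math"), ("Student", "Jones"), ("Grade", "A"),
      ("Student", "Susan"), ("Grade", "B"),
      ("Course", "Latin"), ("Student", "Mike"), ("Grade", "D")]

flat = Plus(Concat((Attr("Course"), Attr("Student"), Attr("Grade"))))
nested = Plus(Concat((Attr("Course"),
                      Plus(Concat((Attr("Student"), Attr("Grade")))))))

print(satisfies(F1, flat), satisfies(F2, flat), satisfies(F2, nested))
# True False True
```

The first file also satisfies the nested format, because each course in it is followed by exactly one (student, grade) pair. Since the checker explores every possible split point, it is meant only to illustrate the definitions. It is not a model of the on-the-fly FSA filters discussed in section 2.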
These examples should give an intuitive feeling for the meaning of compaction formats: the last file consists of courses, each followed by a sequence of (student, grade) pairs. We shall find it practical to associate with a compaction format its "syntax tree". For instance, the syntax tree of $(AB^+C^+)^+$ is