Similarity-based queries

We develop a domain-independent framework for defining queries in terms of similarity of objects. Our framework has three components: a pattern language, a transformation rule language, and a query language. The pattern language specifies classes of objects, the transformation rule language defines similarity by specifying the similarity-preserving transformations, and the whole package is wrapped in a general query language. The framework can be “tuned” to the needs of a specific application domain, such as time sequences, molecules, text strings or images, by the choice of these languages. We demonstrate the framework by presenting a specific instance for a specific domain: the domain of sequences. We start with sequences over a finite alphabet, and then consider sequences over infinite ordered domains. The basic pattern language we use is regular expressions, and the query language is calculus-based. We show that even when the chosen pattern and query languages are not too powerful, the resulting approximation framework is very strong. We study the properties of the framework, and in particular present expressive power and complexity results.
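
To make the three components concrete, the following is a minimal illustrative sketch for strings over a finite alphabet. It is not the rule language or the calculus-based query language of the framework itself. It assumes, purely for illustration, that the transformation rules are unit-cost single-symbol insertions, deletions and substitutions, that patterns are Python regular expressions, and that a query asks for the objects that can be transformed into a member of the pattern within a given cost bound. The names `ALPHABET`, `neighbors`, `similar_to_pattern` and `query` are hypothetical.

```python
import re

# Assumed finite alphabet for the illustration.
ALPHABET = "ab"


def neighbors(s):
    """Strings reachable from s by one assumed rule application:
    delete, substitute, or insert a single symbol."""
    out = set()
    for i in range(len(s)):
        out.add(s[:i] + s[i + 1:])              # delete symbol i
        for c in ALPHABET:
            out.add(s[:i] + c + s[i + 1:])      # substitute symbol i
    for i in range(len(s) + 1):
        for c in ALPHABET:
            out.add(s[:i] + c + s[i:])          # insert before position i
    out.discard(s)
    return out


def similar_to_pattern(s, pattern, max_cost):
    """True if s can be transformed into a string fully matching
    `pattern` using at most `max_cost` rule applications.
    Brute-force breadth-first search; only suitable for tiny inputs."""
    regex = re.compile(pattern)
    seen = {s}
    frontier = [s]
    for cost in range(max_cost + 1):
        if any(regex.fullmatch(t) for t in frontier):
            return True
        if cost == max_cost:
            break
        next_frontier = []
        for t in frontier:
            for u in neighbors(t):
                if u not in seen:
                    seen.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return False


def query(database, pattern, max_cost):
    """Select the objects that are similar to the pattern class."""
    return [s for s in database if similar_to_pattern(s, pattern, max_cost)]


database = ["abab", "abba", "bbbb", "aab"]
print(query(database, "(ab)*", 1))
print(query(database, "(ab)*", 2))
```

With a cost bound of 1 only the strings that already match, or are one edit away from matching, are selected; raising the bound widens the similarity class. The exhaustive search is exponential in the bound and is meant only to show how the pattern, the rules and the query fit together.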