Feature extraction and indexing techniques for pictorial database retrieval

Efficient retrieval of curvilinear objects is one of the key problems in content-based pictorial database retrieval. We present a general methodology based on separating the task into several hierarchical levels. Fast candidate screening procedures, applied at the higher levels to more abstract representations, serve as filters that reduce the search space for the more expensive matching operations at the lower levels. The top level of candidate screening is an indexing procedure that operates on index vectors whose components are the counts of high-level features of each type. The multi-level screening process proceeds along hierarchical representations of curvilinear objects based on high-level shape features. A set of methods for curve description, curvilinear feature recovery and high-level shape feature extraction has been developed to support the bottom-up construction of such representations. The high-level features to be extracted and used in the hierarchical representations depend on the particular application; general guidelines for feature selection, derived from information theory, are provided. Implementations of the methodology in two distinct applications, road network matching and cursive handwriting recognition, showed encouraging results. In particular, classification using index vectors of high-level feature counts proves to be a very promising technique for efficient and effective candidate screening.
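As an illustration of the top-level indexing step, the following minimal Python sketch builds an index vector of feature counts for each stored curve and keeps only those whose vector lies close to the query's. It is not the implementation described in this work: the feature type names, the L1 distance, the tolerance parameter and the toy database are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical feature types; the real set depends on the application.
FEATURE_TYPES = ("loop", "cusp", "junction", "inflection")


def index_vector(features):
    """Count how many features of each type a curve contains."""
    counts = Counter(features)
    return tuple(counts[t] for t in FEATURE_TYPES)


def screen(query_features, database, tolerance=1):
    """Return names of stored curves whose index vector is within
    `tolerance` (L1 distance, an assumed choice) of the query's vector."""
    q = index_vector(query_features)
    survivors = []
    for name, feats in database.items():
        v = index_vector(feats)
        distance = sum(abs(a - b) for a, b in zip(q, v))
        if distance <= tolerance:
            survivors.append(name)
    return survivors


# Toy database: each curve is represented by its list of extracted features.
database = {
    "curve_a": ["loop", "cusp", "cusp", "junction"],
    "curve_b": ["loop", "loop", "inflection"],
    "curve_c": ["loop", "cusp", "junction", "junction"],
}
query = ["cusp", "loop", "cusp", "junction"]

# Only the surviving candidates would be passed on to the
# more expensive lower-level matching stages.
candidates = screen(query, database, tolerance=2)
print(candidates)
```

In this sketch the index vector ignores the order and position of features, which is what makes the comparison cheap; discriminating between the surviving candidates is left to the lower levels of the hierarchy.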