Abstraction of High Level Concepts from Numerical Values in Databases

A conceptual clustering method is proposed for discovering high level concepts of numerical attribute values from databases. The method considers both the frequency and the value distributions of the data and is therefore able to discover relevant concepts from numerical attributes. The discovered knowledge can be used for representing data semantically and for providing approximate answers when exact ones are not available. Our knowledge discovery approach is to partition the data set of one or more attributes into clusters that minimize the relaxation error. An algorithm is developed that finds the best binary partition in O(n) time and generates a concept hierarchy in O(n^2) time, where n is the number of distinct values of the attribute. The effectiveness of our clustering method is demonstrated by applying it to a large transportation database for approximate query answering.
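
The partitioning idea can be illustrated with a minimal sketch for a single attribute. The abstract does not define the relaxation error, so the code assumes one plausible reading: the expected absolute difference between two values drawn from a cluster according to their frequencies. The function names (distinct_with_counts, relaxation_error, best_binary_cut, build_hierarchy) and the max_depth parameter are hypothetical, and the cut search is a brute-force scan over all cut points, not the O(n) algorithm referred to above.

```python
from collections import Counter


def distinct_with_counts(data):
    """Return the sorted distinct values and their frequencies."""
    freq = Counter(data)
    values = sorted(freq)
    return values, [freq[v] for v in values]


def relaxation_error(values, counts):
    """ASSUMED definition: expected |x_i - x_j| for two values drawn
    independently from the cluster in proportion to their frequencies."""
    total = sum(counts)
    err = 0.0
    for xi, ci in zip(values, counts):
        for xj, cj in zip(values, counts):
            err += (ci / total) * (cj / total) * abs(xi - xj)
    return err


def best_binary_cut(values, counts):
    """Brute-force scan over every cut between adjacent distinct values.
    Returns (k, err): the left cluster is values[:k], and err is the
    frequency-weighted relaxation error of the two sub-clusters."""
    total = sum(counts)
    best_k, best_err = None, float("inf")
    for k in range(1, len(values)):
        left_w = sum(counts[:k]) / total
        err = (left_w * relaxation_error(values[:k], counts[:k])
               + (1 - left_w) * relaxation_error(values[k:], counts[k:]))
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err


def build_hierarchy(values, counts, max_depth):
    """Recursively split each cluster at its best binary cut.
    max_depth is a hypothetical stopping rule used only for illustration."""
    node = {"range": (values[0], values[-1]), "children": []}
    if max_depth > 0 and len(values) > 1:
        k, _ = best_binary_cut(values, counts)
        node["children"] = [
            build_hierarchy(values[:k], counts[:k], max_depth - 1),
            build_hierarchy(values[k:], counts[k:], max_depth - 1),
        ]
    return node
```

For example, for the values 1, 2, 3, 10, 11, 12 with equal frequencies, best_binary_cut places the cut between 3 and 10, and build_hierarchy with max_depth=1 yields a root covering the range (1, 12) with two children covering (1, 3) and (10, 12). Because both frequencies and value distances enter the error, a heavily repeated value pulls the cut differently than a purely value-based split would.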