Discovering Regularities in Databases Using Canonical Decomposition of Binary Relations

Regularities in databases are directly useful for knowledge discovery and data summarization. As a mathematical background, relational algebra helped for discovering the main data structures and existing dependencies between the different attributes in a relational database. Functional, difunctional and other kinds of dependencies in a relational database describe invariant regular structures that have been used intensively for database decomposition, and for minimizing redundancy. In this paper, we explain why “concepts” or “maximal rectangles” should be considered as the atomic regular structure for decomposing a binary relation which can be useful for different applications. More specifically, we have noticed experimentally, that “optimal concepts” contain pertinent information about data that we have exploited positively for machine learning, dynamic and incremental database organization, text summarization, data reduction, and even for modeling human thinking. Operators on concepts need to be developed because of their general usefulness in data and information engineering. In this paper, we propose to work on a canonical decomposition of binary relations based on two operators f and g, to model some important open problems, as for example on how to put in equation the best optimal conceptual coverage of a binary relation. We first develop an algorithm to find a conceptual coverage of a binary relation. We then exploit Riguet’s difunctional relation to put in equation all isolated pairs in a binary relation. Using iteratively these isolated pairs, we find several varieties of efficient solutions for the canonical decomposition problem.