A set-based approach to deal with hierarchical structures

Hierarchical structures are pervasive in computer science because they are a fundamental means of modeling many aspects of reality and of representing and managing a large body of data and digital resources. One of the most important hierarchical structures is the tree, which has been widely studied, analyzed and adopted across many contexts and scientific fields. Our work gives particular attention to the role and impact of the tree in computer science and investigates its applications starting from a pivotal question: "Is the tree always the most advantageous choice for modeling, representing and managing hierarchies?" Our aim is to analyze the nature and use of hierarchical structures and to determine the most suitable way of employing them in different contexts of interest.

We concentrate mainly on the scientific field of Digital Libraries. Digital Libraries are complex systems that manage digital resources from our cultural heritage, belonging to cultural organizations such as libraries, archives and museums, and that provide advanced services over these resources. Within this field we focus on a key use case: the modeling, representation, management and exchange of archival resources in a distributed environment. We take into account the hierarchical internal structure of archives and review the solutions proposed in the literature for modeling, representing, managing and sharing archival resources. Archives are usually modeled as trees; moreover, the de facto standard for encoding digital cultural resources, which are described and represented by means of metadata, is the eXtensible Markup Language (XML), which supports a tree representation. A recurring problem with this approach is that the model used to represent the hierarchy is bound to the specific technology chosen to instantiate it, e.g. XML. In the archival context the tree is commonly instantiated as a single XML file that mixes the elements describing the hierarchical structure with the content elements, without a clear distinction between the two. As a consequence, it is not straightforward to access and exchange a specific subset of the data without navigating the whole hierarchy or losing meaningful hierarchical relationships.

To address these problems we propose the NEsted SeT for Object hieRarchies (NESTOR) Framework, which is composed of two main components: the NESTOR Model and the NESTOR Prototype. The NESTOR Model is the core of the framework because it defines the set data models on which every other component relies: the "Nested Set Model (NS-M)" and the "Inverse Nested Set Model (INS-M)". We formally define these two models, showing how hierarchies can be modeled and represented through collections of nested sets, and we show that they add some features with respect to the tree while maintaining its full expressive power. We formally prove several properties of these models and establish their correspondences with the tree. Furthermore, we define four distance measures for the NS-M and the INS-M and prove that each of them is a metric, so that the models equipped with these measures form metric spaces.

The NESTOR Model is first presented from a formal point of view and then placed in a practical application context defined by the NESTOR Prototype. To describe the prototype we rely on the archive use case and propose an application for modeling, representing, managing and sharing archival resources. We compare the expressive power of an archive modeled as a tree with that of the same archive modeled with the set data models, and we analyze the advantages and disadvantages of our approach when data management and exchange in distributed environments must be addressed.
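To make the two set data models more concrete, the following Python sketch encodes a small, hypothetical archival tree (a fonds with two series and three files; all node names are invented for illustration) in two assumed ways. The NS-style encoding maps each node to the set of nodes in its subtree, so a child's set is contained in its parent's and sibling sets are disjoint; the INS-style encoding maps each node to the set of nodes on its path from the root, so a child's set contains its parent's. These constructions are simplifying assumptions and not necessarily the exact formal definitions of the NS-M and INS-M. The distance used here, the cardinality of the symmetric difference, is a standard set metric chosen for illustration and is not claimed to be one of the four distance measures defined for the NESTOR Model.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical toy archive: each node maps to its parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "fonds": None,
    "series1": "fonds",
    "series2": "fonds",
    "file1": "series1",
    "file2": "series1",
    "file3": "series2",
}


def children_of(parent: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the parent map into a node -> list-of-children map."""
    kids: Dict[str, List[str]] = {node: [] for node in parent}
    for node, p in parent.items():
        if p is not None:
            kids[p].append(node)
    return kids


def nested_sets(parent: Dict[str, Optional[str]]) -> Dict[str, FrozenSet[str]]:
    """NS-style encoding (assumed): node -> set of nodes in its subtree.

    A child's set is a proper subset of its parent's; sibling sets are disjoint.
    """
    kids = children_of(parent)
    sets: Dict[str, FrozenSet[str]] = {}

    def visit(node: str) -> FrozenSet[str]:
        # Union of the node itself with the subtree sets of all its children.
        s = frozenset({node}).union(*(visit(c) for c in kids[node]))
        sets[node] = s
        return s

    for node, p in parent.items():
        if p is None:
            visit(node)
    return sets


def inverse_nested_sets(parent: Dict[str, Optional[str]]) -> Dict[str, FrozenSet[str]]:
    """INS-style encoding (assumed): node -> set of nodes on its path from the root.

    A child's set is a proper superset of its parent's; siblings intersect
    exactly in their parent's set.
    """
    sets: Dict[str, FrozenSet[str]] = {}
    for node in parent:
        path = set()
        cur: Optional[str] = node
        while cur is not None:  # walk up to the root, collecting ancestors
            path.add(cur)
            cur = parent[cur]
        sets[node] = frozenset(path)
    return sets


def is_ancestor_ns(ns: Dict[str, FrozenSet[str]], a: str, b: str) -> bool:
    """True if a is b or an ancestor of b, tested by set inclusion only."""
    return ns[b] <= ns[a]


def sym_diff_distance(x: FrozenSet[str], y: FrozenSet[str]) -> int:
    """|x symmetric-difference y|: a standard metric on finite sets (illustrative)."""
    return len(x ^ y)


if __name__ == "__main__":
    ns = nested_sets(PARENT)
    ins = inverse_nested_sets(PARENT)
    print(sorted(ns["series1"]))                          # ['file1', 'file2', 'series1']
    print(is_ancestor_ns(ns, "fonds", "file3"))           # True
    print(sym_diff_distance(ins["file1"], ins["file3"]))  # 4
```

With the INS-style encoding, the symmetric-difference distance between two nodes equals the number of edges on the tree path between them, because the two root paths share exactly their common ancestors. Ancestor tests reduce to subset tests (`ns[b] <= ns[a]`, or equivalently `ins[a] <= ins[b]`), which is what allows a subtree to be handled as a self-contained set rather than by walking explicit parent/child links.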
We provide a concrete implementation of the described models in SIAR (Sistema Informativo Archivistico Regionale), the information system that we designed and developed to manage the archival resources of the Veneto Region of Italy. Finally, we show how the NESTOR Framework can be used in conjunction with well-established and widely used Digital Library technologies.