Design of Statistical Information Media: Time Performance and Storage Constraints

A statistical database can be seen as a set of tables and a set of derivation functions; each function maps a set of tables into a new one. To optimize the time performance of the system, it would be convenient to store the derived tables in secondary memory. However, a trade-off between storage resources and time performance arises: when the storage space is constrained, it is necessary to choose which derived tables are stored and which are computed on-line. In this paper we investigate this trade-off problem. We formulate it as an integer linear program, both for monadic and for polyadic derivation functions. In the first case we obtain a Simple Plant Location problem with a linear Knapsack constraint; in the second case the resulting program is equivalent to a Simple Plant Location problem with a submodular Knapsack constraint. Moreover, we show that the problem is NP-complete and propose an efficient heuristic approach to solve it.
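
The storage/time trade-off in the monadic case can be illustrated with a minimal sketch on a toy instance. It is not the integer linear program or the heuristic of the paper: it simply enumerates every set of stored tables that fits the storage budget and keeps the one with the lowest total on-line derivation time. The function name `best_storage_plan`, the cost matrix, the table sizes, and the budgets are all hypothetical; an infinite cost is assumed to mean that a table cannot be derived from the given stored table.

```python
from itertools import combinations

INF = float("inf")


def best_storage_plan(sizes, cost, budget):
    """Brute-force choice of stored tables (illustrative only, exponential time).

    sizes[j]   -- storage space needed by table j (hypothetical units)
    cost[i][j] -- time to obtain table i from stored table j (INF if impossible)
    budget     -- available storage space

    Returns (total_time, stored_tables), or (INF, None) if no plan is feasible.
    """
    n = len(sizes)
    best_time, best_plan = INF, None
    for r in range(1, n + 1):
        for stored in combinations(range(n), r):
            # Linear knapsack constraint: the stored tables must fit the budget.
            if sum(sizes[j] for j in stored) > budget:
                continue
            # Each table is served by its cheapest stored source,
            # as in a simple plant location assignment.
            total = sum(min(cost[i][j] for j in stored) for i in range(n))
            if total < best_time:
                best_time, best_plan = total, stored
    return best_time, best_plan


# Toy instance (assumed data): table 0 is the base table,
# table 1 is derivable from 0, table 2 is derivable from 0 or 1.
sizes = [10, 4, 2]
cost = [
    [0, INF, INF],
    [5, 0, INF],
    [8, 3, 0],
]

plan_large = best_storage_plan(sizes, cost, budget=14)
plan_small = best_storage_plan(sizes, cost, budget=12)
```

On this toy instance a budget of 14 allows storing tables 0 and 1, while a budget of 12 forces a different choice (tables 0 and 2) with a higher total derivation time. Exhaustive enumeration is only practical for a handful of tables, which is consistent with the abstract's motivation for a heuristic.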