Mathematical programming within the context of a generalized data base management system

Abstract. Aspects of mathematical programming are examined within the context of a generalized data base management and query system. This system is general in the sense of its ability to support applications other than mathematical programming and its independence from the actual types of data values available. It is shown how data for mathematical programming may be organized into a network data structure which may be interrogated via non-procedural, English-like queries. Three methods are presented for interfacing math programming algorithms with this data base. Enhanced data manipulation facilities, particular to matrices and systems of equations, are also introduced. Finally, a method is shown whereby programs may be integrated into a data structure, enhancing a user's ability to build alternative models for data analysis.

INTRODUCTION

Data base management is a relatively new field which is currently the object of intense investigation. It involves the organization of data into some structure and the fitting of data with models and models with data in order to provide needed analyses. This presentation explores ways in which tools in the field of data management can offer assistance in the solution of mathematical programming problems. Designers of specialized math programming systems will observe a correspondence between some of the data base notions presented here and the ideas used in various math programming-related data facilities.

The view adopted here pictures mathematical programming as a problem of data management, where the data relates to constraints and objectives. The models include linear, integer and non-linear application routines. As such we outline a fundamentally new perspective for viewing mathematical programming problems. It must be emphasized that we are not here concerned with mathematical programming algorithms per se, but with an effective tool for implementation and utilization of such algorithms, regardless of their special methods.
Moreover, we suggest that development and implementation of math programming algorithms may very well profit from future advancements in the data base management field. This would permit development and implementation efforts to be concentrated on theoretical aspects and numerical techniques, removing the often onerous task of data management (e.g., overlaying, manipulation of specialized storage structures, etc.).

[Footnotes: Manuscript received August 1977. Research supported in part by the National Science Foundation, Division of Computing Research, Grant Number MCS 76-24675. Authors: R. Bonczek, C. Holsapple and A. Whinston; the affiliations given are Assistant Professor of Management and Computer Sciences, Visiting Assistant Professor of Management, and Professor of Management and Computer Sciences, all at Purdue University. Source: R.A.I.R.O. Recherche opérationnelle/Operations Research, vol. 12, no. 2, May 1978.]

Within the context of data base management we introduce three basic methods for effecting the interface between data and mathematical programming routines. The first method consists of extracting appropriate data values from a data base and building them into a file that can be input to the desired application routine. In the second method, application programs are devised such that they utilize commands which enable direct access to the data base. The third method incorporates programs into the data base itself such that they may be executed by submission of non-procedural, English-like queries. The specifics of these three methods are outlined within the framework of GPLAN (Generalized Planning System), which is under continuing development at Purdue University.
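The first of these methods (extract values, then hand a file to the application routine) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the list `COEFFICIENTS` stands in for values already retrieved from a data base, the comma-separated layout is not GPLAN's file format, and `extract_to_file` and `read_dense_matrix` are hypothetical names.

```python
import csv
import io

# Hypothetical stand-in for values retrieved from a data base:
# one (constraint id, variable id, coefficient value) triple per nonzero entry.
COEFFICIENTS = [
    ("C1", "X1", 2.0),
    ("C1", "X2", 1.0),
    ("C2", "X1", 1.0),
    ("C2", "X2", 3.0),
]


def extract_to_file(coefficients, out):
    """Data base side: write the retrieved triples to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["constraint", "variable", "value"])
    for constraint_id, variable_id, value in coefficients:
        writer.writerow([constraint_id, variable_id, value])


def read_dense_matrix(source):
    """Application side: rebuild a dense constraint matrix from the extract."""
    rows = list(csv.DictReader(source))
    constraints = sorted({r["constraint"] for r in rows})
    variables = sorted({r["variable"] for r in rows})
    matrix = [[0.0] * len(variables) for _ in constraints]
    for r in rows:
        i = constraints.index(r["constraint"])
        j = variables.index(r["variable"])
        matrix[i][j] = float(r["value"])
    return constraints, variables, matrix


# An in-memory buffer plays the role of the intermediate file.
buffer = io.StringIO()
extract_to_file(COEFFICIENTS, buffer)
buffer.seek(0)
constraints, variables, matrix = read_dense_matrix(buffer)
```

In this arrangement the application routine never touches the data base; it sees only the intermediate file, which is what distinguishes the first method from the second and third.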
The outstanding features of this data base management system may be summarized as follows: utilization of a network data base, selective retrieval of any configuration of data from a given network structure, and user interface with a data base and application routines via a non-procedural, English-like query language.

As indicated in the ensuing discussion, GPLAN's extensive data management capabilities also provide a convenient tool for the evaluation of parametric changes and various modifications in problem formulation. Moreover, it enables storage, retrieval and manipulation of not only objective and constraint coefficients, but also descriptive information about each coefficient, such as its source and currency. During the formulation of large-scale problems such information is vital for resolving conflicting constraints and rectifying the variety of errors and inconsistencies which almost inevitably occur. The GPLAN system provides a single, general mechanism for handling specially structured matrices. Finally, this system allows the data base to be used by other applications (e.g., simulations, statistical packages, etc.).

A cursory overview of the GPLAN method of data management is the necessary precursor of a detailed examination of its applicability to problems of mathematical programming.

THE GENERALIZED PLANNING SYSTEM

GPLAN [1, 2] has two primary constituents: a data management system (GPLAN/DMS) and a query system (GPLAN/QS). The former enables a user to access a network data base with a procedural programming language. The latter allows retrieval of data and the execution of large application routines as a result of posing non-procedural, English-like queries; this feature permits data base utilization by non-programmers.
Within the scope of this presentation, a data base is considered to be defined by two attributes: a schema and a collection of data values which are logically organized in conformity with that schema. A schema is the specification of a logical structure; it is a blueprint of data base contents. Notice that we do not consider physical storage structures here, since all user requests of the data base are made in terms of its logical organization.

The fundamental building blocks of a schema are data item types; for example, VARIABLE-ID, VARIABLE-DESCRIPTION, CONSTRAINT-ID, CONSTRAINT-DESCRIPTION, COEFFICIENT-VALUE and COEFFICIENT-SOURCE refer to types of data that we may desire to include in a data base. Each of these data item types represents many occurrences of data values of that type within the data base; the data item type VARIABLE-ID may have "X1" through "X100" as data value occurrences. The schema also specifies the nature of the relationships that each data item type has with other data item types.

There are two varieties of relationships among data item types: aggregation and association. Data item types may be aggregated into what are termed record types; for instance, VARIABLE-ID and VARIABLE-DESCRIPTION may be aggregated to form the record type VARIABLE. This is illustrated in figure 1 a, where the record type is indicated by the rectangle labeled VARIABLE. A sample record occurrence of VARIABLE is "X1" and "AMOUNT OF RESOURCE 1 TO BE USED".

Alternatively, record types (and therefore data item types) may be associated with each other by means of a set relation, as outlined in the CODASYL Data Base Task Group (DBTG) Report of 1971 [3]. The DBTG "set" concept should not be confused with the mathematical notion of a "set", for they are not related. A set is defined in terms of an owner record type and a member record type such that there is a one-to-many relationship between owner and member occurrences.
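As a rough illustration of these schema notions, the following Python sketch models record types as aggregations of data item types and a DBTG-style set as a one-to-many link from owner to member occurrences. It is an in-memory toy under stated assumptions, not GPLAN's implementation: the classes `RecordType` and `SetType`, the `connect` method and the sample coefficient values are hypothetical, while the COEFFICIENT record type and the set name HAS anticipate the example discussed in the text that follows.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RecordType:
    """A named aggregation of data item types (hypothetical helper class)."""
    name: str
    item_types: tuple

    def occurrence(self, *values):
        """Build one record occurrence: a mapping from item type to value."""
        if len(values) != len(self.item_types):
            raise ValueError(f"{self.name} expects {len(self.item_types)} values")
        return dict(zip(self.item_types, values))


@dataclass
class SetType:
    """A DBTG-style set: each owner may have many members, each member one owner."""
    name: str
    owner: RecordType
    member: RecordType
    # owner key -> list of member occurrences (illustrative storage only)
    members_of: dict = field(default_factory=dict)

    def connect(self, owner_key, member_occurrence):
        """Attach a member occurrence to an owner, rejecting a second owner."""
        for members in self.members_of.values():
            if any(m is member_occurrence for m in members):
                raise ValueError("member occurrence already has an owner in this set")
        self.members_of.setdefault(owner_key, []).append(member_occurrence)


VARIABLE = RecordType("VARIABLE", ("VARIABLE-ID", "VARIABLE-DESCRIPTION"))
COEFFICIENT = RecordType("COEFFICIENT", ("COEFFICIENT-VALUE", "COEFFICIENT-SOURCE"))
HAS = SetType("HAS", owner=VARIABLE, member=COEFFICIENT)

x1 = VARIABLE.occurrence("X1", "AMOUNT OF RESOURCE 1 TO BE USED")
# Coefficient values and sources below are made up for the example.
c1 = COEFFICIENT.occurrence(2.0, "hypothetical source A")
c2 = COEFFICIENT.occurrence(1.0, "hypothetical source B")
HAS.connect(x1["VARIABLE-ID"], c1)
HAS.connect(x1["VARIABLE-ID"], c2)
```

Attempting `HAS.connect("X2", c1)` after the calls above raises `ValueError`, which mirrors the rule that a member occurrence belongs to at most one owner occurrence within a given set.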
Under this one-to-many relationship, there may be many occurrences of the member record type associated with each occurrence of the owner record type; for a particular set, however, a given member occurrence may be associated with no more than one occurrence of the owner record type. Consider the record types VARIABLE and COEFFICIENT, the latter being an aggregation of such data item types as COEFFICIENT-VALUE and COEFFICIENT-SOURCE. If we define the set HAS with VARIABLE as its owner record type and COEFFICIENT as its member record type, then we have indicated that there may be many coefficients associated with each variable; but a given coefficient cannot be associated with more than one variable. Pictorially, a set is indicated by an arrow that points from the owner record type to the member record type (see fig. 1 b). Not only does a set furnish information about the relation among occurrences of owner and member record types, but it also permits the member