Grid cell size in relation to errors in maps and inventories produced by computerized map processing

Studies are reported which improve the understanding of the process of converting map data from a graphical representation to a computer compatible format. A uniformly shaped and spaced network of cells, a grid, may be used to determine the spatial characteristics of the map. Investigations were made into (1) a technique for characterizing the spatial nature of a map, (2) the effect of cell size and grid position on computer processing to produce inventory tables and new maps, and (3) the potential for modeling spatial cellularization. The frequency distribution of distances between boundary lines enclosing homogeneous map units was employed to characterize spatial characteristics of a map. The accuracies of maps and inventory tables produced by computer processing of a single map with different cell sizes and grid positions were determined. Grid position significantly affects accuracy when one isolated homogeneous map unit is processed; it is not significant in processing maps containing many such units. The importance decreases as the randomness of shape and size of the mapping units increases. As cell size was allowed to increase, the accuracies of maps and inventories produced by computer processing decreased. Likewise, sample statistics (mean, mode, variance) of the interboundary distance distributions at each cell size were found to decrease systematically with the increases in cell size. A mathematical modeling process was formulated to allow (1) estimation of the interboundary distance distribution of a map before cellularization and (2) prediction of mapping and inventory accuracies which might be achieved with different cell sizes. Two models were derived on the assumption that the quantities involved were one dimensional and were tested in comparison to experimentally observed accuracies.
Although both models overestimated the errors at any particular cell size, the predictions were not erratic, and the behavior of the models encourages further research into their refinement.

PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING, Vol. 48, No. 8, August 1982, pp. 1289-1298. 0099-1112/82/4808-1289$02.25/0 © 1982 American Society of Photogrammetry

[...] integrated with other sources of information.¹ Computer processing is indispensable in storing, manipulating, retrieving, and displaying large quantities of diverse data. [...] products must be of known and reasonable accuracy to be acceptable to any decision process.

Many articles in the field of remote sensing discuss accuracy of products resulting from processing to (1) identify mapping units by interpretation or machine-assisted classification or (2) remove geometric distortions caused by the sensor. These articles are concerned with the correctness with which map units are identified and with the map geometry itself. This paper is concerned with computer processing of map data, assuming the map is error free. The studies reported are concerned with understanding the effect of dividing maps into a grid of cells to allow computer processing.

In order for an information system to be termed geographic and have the capability to generate maps, the data base must be designed to include spatial location information. Location identifiers can be included in the data base by one of four techniques: external index, coordinate reference, arbitrary grid, and explicit boundary. The latter two techniques maintain map boundary information in a form suitable for mapping. They are used in the two most common forms of geographic information systems, known as grid (cell) or line (polygon) systems, respectively.
A systematic comparison of the operating costs of cell and polygon systems and of product accuracies was made by Smith.⁵ He found that conversion of map data and typical analyses were eight to ten times more expensive with the polygon system, although that system exhibits higher spatial accuracy. The faster, simpler cell systems are generally less expensive. A common criticism of cellular systems is that the gridding of the map for computer compatibility forces the selected grid cell size to be the lower limit on the spatial resolution. This makes cell size selection extremely important in creating a data base. Guidelines in the literature include utilizing the resolution of the source data,⁷ selecting the smallest cell affordable in the operation of the system, adjusting the size and shape of the cell to match the capabilities of the output device (e.g., rectangular cells to offset line printer aspect ratios), and selecting a cell size small enough that the smallest mapping unit will be greater than 50 percent of any cell. A better understanding of the effect of cellularization on a map would be useful in (a) selecting the cell size for map conversion to computer format, (b) assigning cell size when converting from a polygon format to the cellular format, and (c) deciding on cell size changes during the course of map processing. Cell size has a significant effect on map accuracy. Nichols¹⁰ reported a brief study of several cell sizes and soil map complexities and concluded that cellularization was too inaccurate. Hord¹¹ proposed a statistical model for evaluation of map accuracy, and Van Genderen¹² extended the application of the model to the problem of guarding against overconfidence. Note that the Hord-Van Genderen analyses require that the product map be in hand, and Tomlinson¹³ reports that data preparation costs for a cellular system run four to five times the analysis costs.
Hence, procedures for iterative digitization-evaluation-digitization are not appropriate to the problem. Switzer⁹ developed a map accuracy evaluation technique as a Boolean overlay of an "estimated" map and the "true" map with a two-level resultant map of matching and non-matching categories. Mathematical arguments and approximations allowed him to estimate the map accuracy from the "estimated" map alone. In the course of his analysis, he also justified square cells. His procedure, however, requires that the computer data be created before accuracy can be evaluated. The Hord, Van Genderen, and Switzer procedures are useful in evaluation of product accuracies only in retrospect. The problem of cell size selection requires predictive capability. An estimation of product accuracy before data entry begins is necessary. The performance of a geographic information system may be measured by the accuracy of the products it produces, i.e., maps and tables (assuming that the map data in the data base are error-free). An experiment was conducted to seek a relationship between input map data characteristics and output mapping and tabulation accuracies with various cell sizes (for a detailed discussion consult Wehde¹⁴). It should be emphasized that it is the cellularization of maps that is being studied, not a particular cellular information system. The information system employed for the study is described only to document the procedures by which the evaluation of cellularization took place.
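The two product accuracies discussed above, mapping and tabulation, can be illustrated with a minimal sketch. It assumes that maps are stored as lists of rows of category codes, that mapping accuracy is the fraction of matching cells in a two-level overlay of the "estimated" and "true" maps, and that inventory error is the per-category difference in cell counts. The function names and the tiny 4 by 4 example are hypothetical; the sketch shows the direct comparison only, not Switzer's estimate from the "estimated" map alone, and it is not presented as the procedure used in the study.

```python
from collections import Counter

def expand(grid, factor):
    """Repeat each coarse cell factor x factor times to return to base resolution."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def map_accuracy(true_map, estimated_map):
    """Fraction of cells whose categories match (assumed accuracy measure)."""
    total = 0
    matches = 0
    for true_row, est_row in zip(true_map, estimated_map):
        for t, e in zip(true_row, est_row):
            total += 1
            matches += (t == e)
    return matches / total

def inventory(grid):
    """Tabulate the number of cells in each category."""
    return Counter(value for row in grid for value in row)

def inventory_error(true_map, estimated_map):
    """Per-category cell count difference, estimated minus true."""
    t, e = inventory(true_map), inventory(estimated_map)
    return {k: e.get(k, 0) - t.get(k, 0) for k in sorted(set(t) | set(e))}

# Hypothetical "true" map and a coarse map with cells twice as large on a side.
true_map = [
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [3, 3, 4, 4],
    [3, 1, 4, 2],
]
coarse_map = [
    [1, 2],
    [3, 4],
]
estimated_map = expand(coarse_map, 2)
accuracy = map_accuracy(true_map, estimated_map)
errors = inventory_error(true_map, estimated_map)
```

In this example a map can lose spatial accuracy (three of sixteen cells disagree) while some inventory totals remain exact, which is why the two accuracies are evaluated separately.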
The Area Resource Analysis System, AREAS, is an information system developed at the Remote Sensing Institute, South Dakota State University.¹⁵ AREAS provides the capability to change resolution (cell size), overlay maps, interpret maps, tabulate data sets, plot or record results, and analyze data characteristics.¹⁶ A portion of a detailed soil survey map representing a two mile square area was selected as representative of a map with moderate polygon density yet diverse shapes and sizes of map units. To create a data base which could be employed as the "accurate" or "true" map standard, a very small cell size was selected. The cell size was also constrained to be a small subdivision of approximately 1, 4, and 16 ha (2.5, 10, and 40 acre) cells such that the study of increasing cell sizes would include these historically common sizes. A 0.007 ha (0.017 acre) cell met these requirements and resulted in a map data base of 384 cells per row in 384 rows. The original map and the base data set are shown in Figure 1. Eleven additional data sets were created by adjusting the cell size. A grouping or aggregation of cells by integral multiples was employed, that is, pairs of cells in pairs of rows combined, groups of three cells in three rows combined, etc. In each succeeding case fewer cells of larger individual area represented the contents of the original map. Only those integral factors which evenly divide the 384 cell by 384 row map were utilized. This eliminated the situations of partial cells being created at the ends of rows or in the last row of the new map data set. The integral factors employed to group cells into new map data sets were 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, and 64. In the remaining figures and text these factors are termed "resolution numbers" or "resolutions" to maintain a context of spatial extent of the cell on the Earth's surface.
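The aggregation scheme described above can be sketched as follows. The grid size (384), the base cell area (0.007 ha), and the upper factor (64) come from the text. The rule for assigning a category to each aggregated cell is an assumption: the sketch takes the most frequent category in each block, with ties resolved by first occurrence in row-major order, since the rule applied by AREAS is not stated here. All function names are hypothetical.

```python
from collections import Counter

BASE_SIZE = 384        # cells per row and number of rows in the base data set
BASE_CELL_HA = 0.007   # area of one base cell in hectares

def resolution_numbers(size, largest):
    """Integral factors from 2 to largest that evenly divide the grid,
    so that no partial cells arise at row ends or in the last row."""
    return [f for f in range(2, largest + 1) if size % f == 0]

def cell_area_ha(factor, base_cell_ha=BASE_CELL_HA):
    """Area of one aggregated cell, made of factor x factor base cells."""
    return round(base_cell_ha * factor * factor, 3)

def aggregate(grid, factor):
    """Combine factor x factor blocks of cells into single larger cells.

    ASSUMPTION: each new cell receives the most frequent category in its
    block; ties go to the category met first in row-major order.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("factor must evenly divide the grid")
    out = []
    for r in range(0, n_rows, factor):
        new_row = []
        for c in range(0, n_cols, factor):
            block = [grid[rr][cc]
                     for rr in range(r, r + factor)
                     for cc in range(c, c + factor)]
            new_row.append(Counter(block).most_common(1)[0][0])
        out.append(new_row)
    return out

factors = resolution_numbers(BASE_SIZE, 64)
areas = {f: cell_area_ha(f) for f in factors}

# Hypothetical 4 x 4 map aggregated with a factor of 2.
small_map = [
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [3, 3, 4, 4],
    [3, 1, 4, 2],
]
coarse = aggregate(small_map, 2)
```

Because cell area grows with the square of the factor, a resolution number of 12 yields a cell of about 1 ha and a resolution number of 48 a cell of about 16 ha, two of the historically common sizes mentioned above.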
The resolution numbers cited correspond to 0.028, 0.063, 0.112, 0.252, 0.448, 1.008, 1.792, 4.032, 7.168, 16.128, and 28.672 ha (0.069, 0.156, 0.278, 0.625, 1.111, 2.500, 4.444, 10.000, 17.778, 40.000, and 71.111 acres). The original, reference map data set and the eleven new map data sets created by the cell aggregation technique are mapped in Figure 2 by a film recording process. The twelve data sets in the data base represent the same map cellularized at twelve different cell sizes. The AREAS information system was utilized to evaluate the accuracy of maps and inventories produced from each of the twelve data sets by the process shown in flow chart form in Figure 3. The process is shown for one resolution number and was repeated a total of eleven times. With the exception of the COMPARE step, all ovals in the flow chart signify an AREAS processing function, i.e., TABULATE, COMPOSITE, AGGREGATE, and INTER-