Mining census data for spatial effects on mortality

This paper describes a system for spatial data mining and illustrates its features with an application to spatial census data. Using census data for data mining poses specific challenges. Because of data privacy regulations, census data are generally available for analysis only in aggregated form: primary data (the responses of individual persons) are aggregated into many cross tabulations for small geographical units. The target objects of secondary analysis are therefore small areas (enumeration districts or wards), and any cell or marginal of a cross tabulation can be used as a variable on these target objects. The target objects can also be linked with other spatial objects (e.g. rivers, roads, railway lines) for spatial analyses. We discuss the special problems that arise in this type of aggregate data mining, including spatial analyses. We then show an application of SubgroupMiner, an advanced subgroup mining system that supports multirelational hypotheses, efficient database integration, discovery of causal subgroup structures, and visualization-based interaction options. The application explores whether transportation lines (e.g. roads, railway lines) increase mortality among persons who live near them, possibly because of a higher occurrence of some disease.
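To make the setting concrete, the following minimal sketch shows how a single subgroup hypothesis might be scored on aggregated area data. It is an illustration under stated assumptions, not SubgroupMiner's implementation: the `Area` schema, the precomputed `near_transport_line` flag (standing in for a spatial join between areas and transport lines), and the figures are all hypothetical, and the binomial-test style z-value is used only because it is a common quality function in subgroup discovery.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Area:
    """One aggregated target object, e.g. an enumeration district (hypothetical schema)."""
    name: str
    population: int
    deaths: int
    near_transport_line: bool  # assumed to be precomputed by a spatial join


def rate(areas):
    """Pooled mortality rate (deaths per person) over a collection of areas."""
    population = sum(a.population for a in areas)
    deaths = sum(a.deaths for a in areas)
    return deaths / population if population else 0.0


def subgroup_quality(areas, predicate):
    """Score a subgroup with a binomial-test style z-value.

    Compares the pooled rate inside the subgroup with the overall rate,
    weighting by subgroup population rather than by the number of areas.
    """
    subgroup = [a for a in areas if predicate(a)]
    n = sum(a.population for a in subgroup)
    p0 = rate(areas)
    if n == 0 or p0 in (0.0, 1.0):
        return 0.0
    return (rate(subgroup) - p0) * sqrt(n) / sqrt(p0 * (1 - p0))


# Made-up illustrative figures, not taken from any census.
areas = [
    Area("A", 1000, 15, True),
    Area("B", 2000, 30, True),
    Area("C", 3000, 15, False),
    Area("D", 4000, 40, False),
]
score = subgroup_quality(areas, lambda a: a.near_transport_line)
```

A positive score indicates that the areas near a transport line have a higher pooled mortality rate than the study region as a whole. Because the units are areas rather than persons, such a score describes an association at the area level and does not by itself establish an effect on individuals.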