A Data Mining System for Estimating a Large-Size Matrix for Environmental Accounting

Abstract: This paper presents a data mining system capable of automatically estimating and updating a large-size matrix for environmental accounting. Environmental accounting addresses how to correctly measure the greenhouse gas emissions of an organization. Among the various environmental accounting methods, the Economic Input-Output Life Cycle Assessment (EIO-LCA) method uses information about industry transactions (purchases of materials by one industry from other industries) together with information about the direct environmental emissions of industries to estimate the total emissions throughout the whole supply chain. The core engine of EIO-LCA is the input-output model, which takes the form of a matrix. The system aims to estimate this large-size input-output model and consists of a series of components for data retrieval, data integration, data mining, and model presentation. It interprets and follows users' XML-based scripts, retrieves data from various sources, and integrates them for the subsequent data mining component. The data mining component is based on a unique mining algorithm that constructs the matrix from historical data and local data simultaneously. The algorithm runs on a parallel computer, which enables the system to estimate a matrix of size up to 3700-by-3700. The results show acceptable accuracy when a subset of the estimated multipliers is compared with the multipliers calculated from a matrix constructed from surveys. The accuracy of the estimation directly impacts the quality of environmental accounting.
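To make the role of the input-output matrix concrete, the following minimal sketch illustrates the standard EIO-LCA calculation, in which total output x satisfies x = Ax + y and total emissions are the sum of r_i x_i. It is not the estimation algorithm of this paper. The two-sector matrix `A`, the emission intensities `r`, the final demand `y`, and the function names are all hypothetical values chosen for illustration only.

```python
def total_output(A, y, tol=1e-12, max_iter=10_000):
    """Solve x = y + A x by fixed-point iteration.

    A[i][j] is the input from sector i required per unit of output of
    sector j (hypothetical coefficients). The iteration sums the series
    y + A y + A^2 y + ..., which converges when the spectral radius of A
    is below 1.
    """
    n = len(y)
    x = list(y)
    for _ in range(max_iter):
        x_new = [y[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("iteration did not converge")


def total_emissions(A, r, y):
    """Supply-chain emissions: direct intensities r applied to total output."""
    x = total_output(A, y)
    return sum(ri * xi for ri, xi in zip(r, x))


# Hypothetical two-sector economy (illustrative numbers, not from the paper).
A = [[0.1, 0.2],
     [0.3, 0.1]]
r = [0.5, 2.0]      # assumed emissions per unit of output for each sector
y = [100.0, 0.0]    # final demand: 100 units from sector 0 only

x = total_output(A, y)
emissions = total_emissions(A, r, y)
direct_only = r[0] * y[0]

print([round(v, 6) for v in x])   # total output per sector
print(round(emissions, 6))        # emissions over the whole supply chain
print(direct_only)                # emissions of the purchased sector alone
```

In this toy case the supply-chain total (about 140) is well above the direct figure (50), which is why errors in the estimated matrix carry over into the emission multipliers. At the 3700-by-3700 scale mentioned above, a linear solver from a numerical library would normally replace the plain-Python iteration.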