Automatic cataloguing and characterization of Earth science data using SE-trees

In the future, NASA's Earth Observing System (EOS) platforms will produce enormous amounts of remote sensing image data that will be stored in the EOS Data Information System. For the past several years, the Intelligent Data Management group at Goddard's Information Science and Technology Office has been researching techniques for automatically cataloguing and characterizing image data (ADCC) from EOS into a distributed database. At the core of the approach, scientists will be able to retrieve data based upon the contents of the imagery. The ability to automatically classify imagery is therefore key to the success of content-based search. We report results from experiments applying a novel machine learning framework, based on Set-Enumeration (SE) trees, to the ADCC domain. We experiment with two images: one taken from the Blackhills region in South Dakota, and the other from the Washington DC area. Following a classical machine learning experimentation approach, an image's pixels are randomly partitioned into training (i.e., including ground truth or survey data) and testing sets. The prediction model is built using the pixels in the training set, and its performance is estimated using the testing set. With the first (Blackhills) image, we performed various experiments, achieving an accuracy of 83.2 percent, compared to 72.7 percent using a Back Propagation Neural Network (BPNN) and 65.3 percent using a Gaussian Maximum Likelihood Classifier (GMLC). With the Washington DC image, however, we achieved only 71.4 percent, compared with 67.7 percent reported for the BPNN model and 62.3 percent for the GMLC.
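The evaluation protocol described above (random partition of labelled pixels, model built on the training set, accuracy estimated on the testing set) can be illustrated with a minimal sketch. The SE-tree learner itself is not specified in this abstract, so a simple nearest-centroid classifier is used here purely as a stand-in. All function names, the 50/50 split, the class names, and the synthetic three-band pixel data are assumptions made for illustration and are not taken from the experiments.

```python
import random

def split_pixels(pixels, labels, train_fraction=0.5, seed=0):
    """Randomly partition labelled pixels into training and testing sets."""
    indices = list(range(len(pixels)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train = [(pixels[i], labels[i]) for i in indices[:cut]]
    test = [(pixels[i], labels[i]) for i in indices[cut:]]
    return train, test

def fit_centroids(train):
    """Stand-in model (NOT an SE-tree): mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def accuracy(centroids, test):
    """Fraction of testing pixels whose predicted class matches ground truth."""
    correct = sum(predict(centroids, f) == y for f, y in test)
    return correct / len(test)

# Synthetic, hypothetical data: 3-band pixels for two well-separated classes.
rng = random.Random(42)
pixels, labels = [], []
for _ in range(100):
    pixels.append([rng.gauss(30, 5) for _ in range(3)])
    labels.append("water")
    pixels.append([rng.gauss(200, 5) for _ in range(3)])
    labels.append("forest")

train_set, test_set = split_pixels(pixels, labels, train_fraction=0.5, seed=1)
model = fit_centroids(train_set)
acc = accuracy(model, test_set)
print(f"estimated accuracy on held-out pixels: {acc:.3f}")
```

Because the testing pixels are never seen while the model is built, the resulting figure is an estimate of performance on unseen pixels of the same image, which is the sense in which the accuracy percentages above should be read.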