Fast Attribute-based Unsupervised and Supervised Table Clustering using P-Trees

Since the advent of digital image technology and remote sensing imagery (RSI), massive amounts of image data have been collected worldwide. For example, since 1972, NASA and the U.S. Geological Survey, through the Landsat Data Continuity Mission, have been capturing images of Earth at resolutions as fine as 15 meters. Because image clustering is time-consuming, much of this data is archived before it is ever analyzed. In this paper, we propose a novel and extremely fast algorithm for images called FAUST P (Fast Attribute-based Unsupervised and Supervised Table Clustering). Our algorithm is based on Predicate-Trees (P-Trees), which are compressed, lossless, data-mining-ready data structures. Without compromising much on accuracy, our algorithm is fast enough to be used effectively in high-speed image data analysis.