Clustering based on eigenspace transformation – CBEST for efficient classification

Abstract: Large remote sensing datasets, whether they cover large areas or have high spatial resolution, are often burdensome to mine for scientific studies. Here, we present an approach that conducts clustering after gray-level vector reduction, which considerably improves the speed of clustering. The approach applies an eigenspace transformation to the dataset, then compresses the data in the eigenspace and stores them in coded matrices and vectors. The clustering process takes advantage of the reduced size of the compressed data and thus reduces computational complexity. We name this approach Clustering Based on Eigen-space Transformation (CBEST). In our experiment with a subscene of Landsat Thematic Mapper (TM) imagery, CBEST improved speed considerably over conventional K-means, and the gain grew as the volume of data to be clustered increased. We assessed information loss and several other factors. In addition, we evaluated the effectiveness of CBEST in mapping land cover/use with the same image, which was acquired over Guangzhou City, South China, and with an AVIRIS hyperspectral image over Tippecanoe County, Indiana. Using reference data, we assessed the accuracies of both CBEST and conventional K-means and found that, in practice, CBEST was not negatively affected by information loss during compression. We also discuss potential applications of the fast clustering algorithm in dealing with large datasets in remote sensing studies.
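
The pipeline outlined in the abstract (eigenspace transformation, compression, clustering on the compressed data) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the coded matrices and vectors are built, so the sketch assumes that compression means quantizing the leading principal-component scores and keeping each distinct quantized vector once, together with its pixel count. The function names and the parameters `n_components` and `step` are hypothetical.

```python
import numpy as np

def eigenspace_transform(pixels, n_components):
    """Project pixels (n_pixels x n_bands) onto the leading eigenvectors
    of the band covariance matrix (a principal component transform)."""
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return centered @ eigvecs[:, order]

def compress(scores, step):
    """Assumed compression: quantize scores to a grid of width `step` and
    keep each distinct vector once, with its pixel count."""
    codes = np.round(scores / step).astype(np.int64)
    unique_codes, inverse, counts = np.unique(
        codes, axis=0, return_inverse=True, return_counts=True)
    # reshape(-1) keeps the inverse index 1-D across NumPy versions
    return unique_codes * step, inverse.reshape(-1), counts

def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Lloyd-style K-means in which each point counts `weights` times.
    Requires k <= number of distinct points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(points[mask], axis=0, weights=weights[mask])
    # Final assignment against the converged centers
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers

def cbest_sketch(pixels, k, n_components=2, step=1.0):
    """Cluster the distinct compressed vectors, then map labels back to pixels."""
    scores = eigenspace_transform(pixels, n_components)
    points, inverse, counts = compress(scores, step)
    unique_labels, _ = weighted_kmeans(points, counts.astype(float), k)
    return unique_labels[inverse], len(points)
```

Because distances are computed only for the distinct compressed vectors, the cost of each iteration scales with their number rather than with the number of pixels, which is the kind of saving the abstract describes. A coarser `step` gives fewer distinct vectors at the price of more information loss.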
