A Semi-Supervised Algorithm for Auto-Annotation and Unknown Structures Discovery in Satellite Image Databases

The increasing number and resolution of Earth observation (EO) imaging sensors have significantly increased both the volume of acquired image data and the information content of the images. There is consequently a strong need for highly efficient search tools for EO image databases and for methods that automatically identify and recognize structures within EO images. Content-Based Image Retrieval (CBIR) and automatic image annotation systems have been designed to tackle the problem of image retrieval in large image databases. Both kinds of system share a common goal: learning the mapping function between low-level visual features and high-level image semantics. A setting that has hardly been explored in annotation systems, yet is the rule rather than the exception, is the one in which the training database used to learn the mapping function is not exhaustive with respect to the semantic classes present in the images. That is, there exist unknown image classes for which the training database contains no examples. In this paper, we propose a semi-supervised method for auto-annotating satellite image databases and discovering unknown semantic image classes in these databases. The idea is to incorporate into the learning process the unannotated data, which by definition contain the unknown image classes. These classes are treated as latent structures in the data that emerge when we train a hierarchical latent variable model on both the labeled and the unlabeled data. We also show that, in our case, the use of unlabeled data leads to more reliable estimates of the model parameters. We present experimental results on a synthetic dataset, on which we compare our algorithm with a semi-supervised Support Vector Machine (S3VM). We also demonstrate the effectiveness of our procedure for discovering unknown image classes on a database of SPOT5 satellite images. The results obtained on this database are positive, since the new structures detected correspond to semantic classes that are not represented in the training database.
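The core idea of the abstract, letting unlabeled data reveal a class that has no training examples, can be illustrated with a deliberately simplified sketch. The code below is not the hierarchical latent variable model of the paper: it swaps in a flat one-dimensional Gaussian mixture fitted by EM, with one extra component that receives no labeled data. The function names, the toy data, the initialization rule, and the variance floor are all assumptions made for illustration.

```python
import math


def log_gauss(x, mu, var):
    """Log-density of a 1-D Gaussian with mean mu and variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def fit_semi_supervised_gmm(labeled, unlabeled, n_known, n_iter=50, var_floor=1e-3):
    """EM for a mixture with n_known labeled classes plus one unlabeled-only component.

    Assumption: a flat 1-D mixture stands in for the paper's hierarchical model.
    """
    k_total = n_known + 1
    # Known components start at the mean of their labeled samples.
    means = []
    for k in range(n_known):
        xs = [x for x, y in labeled if y == k]
        means.append(sum(xs) / len(xs))
    # The extra component starts at the unlabeled point farthest from all known means.
    means.append(max(unlabeled, key=lambda x: min(abs(x - m) for m in means)))
    variances = [1.0] * k_total
    weights = [1.0 / k_total] * k_total
    n_total = len(labeled) + len(unlabeled)

    for _ in range(n_iter):
        # E-step: soft assignments for unlabeled points only (log-sum-exp for stability).
        resp = []
        for x in unlabeled:
            logs = [math.log(weights[k]) + log_gauss(x, means[k], variances[k])
                    for k in range(k_total)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            z = sum(probs)
            resp.append([p / z for p in probs])
        # M-step: labeled points count fully for their class, unlabeled points softly.
        for k in range(k_total):
            pts = [(x, 1.0) for x, y in labeled if y == k]
            pts += [(x, r[k]) for x, r in zip(unlabeled, resp)]
            n_k = sum(w for _, w in pts) + 1e-12  # guard against an empty component
            means[k] = sum(w * x for x, w in pts) / n_k
            variances[k] = sum(w * (x - means[k]) ** 2 for x, w in pts) / n_k + var_floor
            weights[k] = n_k / n_total

    # Hard labels taken from the responsibilities of the final E-step.
    preds = [max(range(k_total), key=lambda k: r[k]) for r in resp]
    return means, variances, weights, preds


# Toy data (hypothetical): two annotated classes near 0 and 5, an unseen class near 10.
labeled = [(-0.5, 0), (0.0, 0), (0.5, 0), (4.5, 1), (5.0, 1), (5.5, 1)]
unlabeled = [-0.3, 0.2, 0.4, 4.7, 5.2, 5.4, 9.5, 9.8, 10.0, 10.3, 10.6]

means, variances, weights, preds = fit_semi_supervised_gmm(labeled, unlabeled, n_known=2)
print("component means:", [round(m, 2) for m in means])
print("unlabeled assignments:", preds)
```

On this toy data, the unlabeled points near 0 and 5 join the two annotated components, while the points near 10 are claimed by the extra component (index 2), which plays the role of a discovered unknown class. The unlabeled points also contribute to the mean and variance estimates of the known components, which loosely mirrors the abstract's remark that unlabeled data can make parameter estimates more reliable.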
