Using Shannon's entropy to sample heterogeneous and high‐dimensional atmospheric datasets

Constructing diverse and synthetic datasets of atmospheric situations, for use as first guesses or training bases for remote-sensing algorithms, remains a challenge. Numerical constraints limit such datasets to a small number of representative situations, which should nonetheless preserve as much as possible of the diversity observed in nature. This study presents an innovative sampling method that extracts a new, smaller dataset from a large database of atmospheric situations. A major difficulty in such sampling is the heterogeneity of the input variables: temperatures and specific humidities, for instance, have different units and ranges, and values at levels ranging from the lower troposphere to the upper stratosphere can hardly be compared. We show that sampling on a single variable type is not optimal, because erroneous features can appear in the variables that were not used for the sampling. Shannon's entropy makes it possible to develop a sampling technique that can handle very heterogeneous variables. A dataset of 10 000 situations is built from EUMETSAT satellite atmospheric retrievals; it includes temperature and water-vapour profiles, four integrated ozone layers and surface temperature. The sampling increases the entropy of the original dataset from 22 to 28 (about 20% increase).
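To make the idea concrete, the following is a minimal illustrative sketch, not the method of the paper. It assumes that each variable is discretized into equal-width bins over its own range (so that units and scales play no role), that the dataset entropy is measured as the sum of per-variable Shannon entropies, and that a subsample is drawn by cycling over occupied joint bins. All function names, the bin count and the toy "temperature" and "humidity" variables are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def bin_index(value, lo, hi, n_bins):
    """Map a value to one of n_bins equal-width bins over [lo, hi]."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def discretize(samples, n_bins=8):
    """Bin each variable on its own range, so units and scales do not matter."""
    n_vars = len(samples[0])
    lows = [min(s[j] for s in samples) for j in range(n_vars)]
    highs = [max(s[j] for s in samples) for j in range(n_vars)]
    return [
        tuple(bin_index(s[j], lows[j], highs[j], n_bins) for j in range(n_vars))
        for s in samples
    ]


def total_entropy(cells):
    """Sum of per-variable (marginal) entropies: an assumed, simplified measure."""
    n_vars = len(cells[0])
    return sum(shannon_entropy([c[j] for c in cells]) for j in range(n_vars))


def balanced_subsample(cells, size, rng):
    """Pick sample indices by cycling over occupied joint bins (flattens the histogram)."""
    by_cell = {}
    for idx, cell in enumerate(cells):
        by_cell.setdefault(cell, []).append(idx)
    for members in by_cell.values():
        rng.shuffle(members)
    chosen = []
    while len(chosen) < size:
        progressed = False
        for members in by_cell.values():
            if members and len(chosen) < size:
                chosen.append(members.pop())
                progressed = True
        if not progressed:  # every bin is exhausted
            break
    return chosen


# Toy data: a temperature-like variable (K) and a humidity-like variable (g/kg).
rng = random.Random(0)
data = [(rng.gauss(250.0, 15.0), rng.expovariate(1.0 / 3.0)) for _ in range(5000)]

cells = discretize(data)
chosen = balanced_subsample(cells, 500, rng)

h_full = total_entropy(cells)
h_sub = total_entropy([cells[i] for i in chosen])
print(f"entropy of full dataset: {h_full:.2f} bits")
print(f"entropy of subsample:    {h_sub:.2f} bits")
```

Because the bins are defined per variable and entropy depends only on bin occupancy, the two toy variables contribute on an equal footing despite their different units. Under these assumptions, the balanced subsample over-represents rare situations relative to the original database, which is what raises its entropy.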
