The median of a set of histogram data

In Symbolic Data Analysis, a histogram variable describes each object by a histogram of values rather than by a single value. In the literature, the definitions of the average and the standard deviation have been extended to histogram variables. In this paper, we propose a definition and an algorithm for extracting the main order statistics, the median and the quartiles, of a histogram variable observed on a set of units. In particular, we propose to define the median histogram according to a criterion that minimizes the sum of the l1 Wasserstein distances, a particular probabilistic metric, between distributions. We show that solving the problem requires searching for a level-wise order defined on the quantile functions (the inverses of the cumulative distribution functions) of the corresponding histogram data. Evidence from an application to real data shows that the proposed order statistics for a histogram variable have properties similar to those of the classic order statistics for a single-valued variable. Finally, we propose two skewness indices for a histogram variable, based on the comparison between the average and the median quantile functions.
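
As an illustration of the criterion, the following minimal Python sketch computes a level-wise median of quantile functions and checks that it does not lose to the level-wise average under the summed l1 Wasserstein distance. It is not the algorithm of the paper. It relies on two assumptions that the abstract does not state: values are uniformly spread inside each bin, and the l1 Wasserstein distance between two distributions on the real line is approximated by the mean absolute difference of their quantile functions on a regular grid of levels. The function names, the grid size `m`, and the three example histograms are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from statistics import mean, median


def quantile(edges, weights, t):
    """Quantile of a histogram at level t, assuming a uniform spread in each bin."""
    cum = list(accumulate(weights))            # cumulative weight at each right edge
    target = t * cum[-1]                       # works for unnormalized weights too
    i = min(bisect_left(cum, target), len(weights) - 1)  # bin that contains level t
    if weights[i] == 0:                        # degenerate empty bin: return its left edge
        return edges[i]
    left_cum = cum[i - 1] if i > 0 else 0.0
    frac = (target - left_cum) / weights[i]    # relative position inside the bin
    return edges[i] + frac * (edges[i + 1] - edges[i])


def quantile_curves(histograms, m=100):
    """Evaluate every quantile function on the m midpoint levels (k + 0.5) / m."""
    levels = [(k + 0.5) / m for k in range(m)]
    curves = [[quantile(e, w, t) for t in levels] for e, w in histograms]
    return levels, curves


def l1_wasserstein(q1, q2):
    """Grid approximation of the integral of |Q1(t) - Q2(t)| over t in [0, 1]."""
    return sum(abs(a - b) for a, b in zip(q1, q2)) / len(q1)


def total_cost(candidate, curves):
    """Sum of approximate l1 Wasserstein distances from a candidate to all units."""
    return sum(l1_wasserstein(candidate, c) for c in curves)


# Hypothetical data: three units, each given as (bin edges, bin weights).
histograms = [
    ([0.0, 1.0, 2.0], [0.5, 0.5]),
    ([1.0, 2.0, 4.0], [0.2, 0.8]),
    ([0.0, 3.0, 6.0], [0.7, 0.3]),
]

levels, curves = quantile_curves(histograms)
med = [median(col) for col in zip(*curves)]    # level-wise median quantile function
avg = [mean(col) for col in zip(*curves)]      # level-wise average quantile function

print("cost of median :", total_cost(med, curves))
print("cost of average:", total_cost(avg, curves))
```

Because each quantile function is nondecreasing in the level, the level-wise median is nondecreasing as well, so it is itself a valid quantile function under these assumptions. The gap between `avg` and `med` is the kind of quantity on which a skewness index could be built, although the specific indices proposed in the paper are not reproduced here.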