Editorial for issue 2/2018

The present issue 2 of volume 12 (2018) of the journal Advances in Data Analysis and Classification (ADAC) contains 10 articles dealing with Gaussian and non-Gaussian mixture modeling, the selection of ‘reasonable’ and non-degenerate cluster solutions, the clustering of imbalanced and high-dimensional data, clusterwise multiblock analysis, linear discrimination for supervised versus partially supervised data, distance metric learning, the clustering of shapes by currents, a semiparametric Bayesian model for hospital evaluation, and the estimation of a precision matrix.
One of the most difficult and intriguing problems in clustering is the selection of a ‘final’ partition of objects from a set of ‘plausible’ partitions obtained from the data by one particular clustering algorithm or even by several. A typical case arises when the optimization of a performance criterion (e.g., the classification or mixture likelihood in model-based clustering) yields a large number of different locally optimal partitions, from which the most reasonable ones must be singled out. In their article “Probabilistic clustering via Pareto solutions and significance tests”, the first paper of this issue, María Teresa Gallegos and Gunter Ritter propose a two-stage approach to this problem for the multidimensional heteroscedastic normal clustering model. In the first stage, a large number of local solutions (partitions) are computed. For each of them a suitable estimate of scale differences (the HDBT ratio) is determined, and all of them are displayed in a scale versus fit balance (SFB) plot, in which scale is plotted against model fit (i.e., minus log-likelihood). Only ‘reasonable’ partitions, i.e., those corresponding to Pareto points in this plot, are considered further; this avoids clusterings with unbalanced scales across classes.
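The first-stage screening can be illustrated with a minimal sketch in Python with NumPy. It assumes that the HDBT ratio of a partition is the largest c such that every difference Σ_l − cΣ_j of cluster covariance matrices is positive semidefinite (equivalently, the smallest eigenvalue of Σ_j^{-1}Σ_l over all ordered pairs of distinct clusters), that larger ratios mean better-balanced scales and smaller minus log-likelihoods mean better fit, and that a partition is ‘reasonable’ when no other local solution is at least as good on both axes and strictly better on one. The function names `hdbt_ratio` and `pareto_indices` and all numbers in the demo are hypothetical and not taken from the paper.

```python
import numpy as np


def hdbt_ratio(covs):
    """Largest c with cov_l - c * cov_j positive semidefinite for all cluster pairs.

    Assumed definition: the minimum over ordered pairs j != l of the smallest
    eigenvalue of cov_j^{-1} cov_l (a generalized eigenvalue problem).
    """
    if len(covs) < 2:
        raise ValueError("need at least two clusters")
    ratio = np.inf
    for j, cov_j in enumerate(covs):
        # Whiten with the Cholesky factor L of cov_j: L^{-1} cov_l L^{-T} is
        # symmetric and has the same eigenvalues as cov_j^{-1} cov_l.
        chol = np.linalg.cholesky(cov_j)
        for l, cov_l in enumerate(covs):
            if j == l:
                continue
            left = np.linalg.solve(chol, cov_l)          # L^{-1} cov_l
            whitened = np.linalg.solve(chol, left.T)     # L^{-1} cov_l L^{-T}
            ratio = min(ratio, np.linalg.eigvalsh(whitened).min())
    return float(ratio)


def pareto_indices(points):
    """Indices of non-dominated (hdbt_ratio, neg_log_lik) pairs.

    A point is dominated if another point has a ratio at least as large and a
    minus log-likelihood at least as small, with at least one strict inequality.
    """
    keep = []
    for i, (r_i, f_i) in enumerate(points):
        dominated = any(
            r_j >= r_i and f_j <= f_i and (r_j > r_i or f_j < f_i)
            for j, (r_j, f_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Toy local solutions: (cluster covariances, minus log-likelihood); made-up values.
    solutions = [
        ([np.eye(2), 2.0 * np.eye(2)], 120.0),       # HDBT ratio 0.5
        ([np.eye(2), np.diag([4.0, 1.0])], 110.0),   # HDBT ratio 0.25
        ([np.eye(2), np.diag([5.0, 1.0])], 115.0),   # HDBT ratio 0.2
        ([np.eye(2), np.diag([1.0, 0.6])], 130.0),   # HDBT ratio 0.6
    ]
    points = [(hdbt_ratio(covs), nll) for covs, nll in solutions]
    print(pareto_indices(points))  # → [0, 1, 3]
```

In this toy run the third solution is discarded because the second one has both a better-balanced scale and a better fit; the remaining three form the Pareto front that would be passed on to the next stage. The quadratic pairwise scan is chosen for clarity; with many local solutions, sorting by one coordinate gives the front in O(n log n).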
In the second stage, and as a final choice, only those partitions are retained that are ‘well separated by location’, i.e., those for which the classical tests of Wilks, Hotelling and Behrens–Fisher for equality of locations yield ‘small’ p values and thus reject equal locations. The paper's innovation lies not only in the combination of both stages but also in a new iterative, parameter-free cutting-plane algorithm for the multivariate Behrens–Fisher problem. In a similar vein, the second article deals with the likelihood approach to estimating the parameters of mixtures of elliptical distributions. To avoid degenerate estimates or spurious parameter constellations (which would ultimately lead to unbalanced cluster configurations), various approaches in the literature impose constraints involving the eigenvalues of the corresponding covariance