Hierarchical Mixtures of GLMs for Combining Multiple Ground Truths

In real-world machine learning problems, the gold standard for a learning task is often not accurately reflected by any single data set. For example, when modeling the landing-page quality associated with a search result, labels from human evaluators are often biased towards “brand-name” sites, whereas labels derived from conversions can confound search abandonment with successful conversion. In this paper we propose a class of models for characterizing and isolating the relative bias of a prediction problem across multiple data sets. These models can be used either as tools for data analysis, with the goal of calculating the divergence from the hypothetical gold standard, or as smoothing procedures aimed at capturing as much shared structure between the domains as possible.