Detecting Demographic Bias in Automatically Generated Personas

We investigate the existence of demographic bias in automatically generated personas by producing personas from YouTube Analytics data. Despite the intended objectivity of the methodology, we find elements of bias in the data-driven personas. The bias is highest when doing an exact match comparison, and the bias decreases when comparing at age or gender level. The bias also decreases when increasing the number of generated personas. For example, the smaller number of personas resulted in underrepresentation of female personas. This suggests that a higher number of personas gives a more balanced representation of the user population and a smaller number increases biases. Researchers and practitioners developing data-driven personas should consider the possibility of algorithmic bias, even unintentional, in their personas by comparing the personas against the underlying raw data.