Demographic Breakdown of Twitter Users: An analysis based on names

We propose an approach for estimating age solely from people’s first names, extending an existing method proposed by Chang et al. for ethnicity estimation. We demonstrate that the proposed method predicts both the age of an individual and the age breakdown of an entire population better than the natural alternatives. We then apply both the age and the ethnicity methods to US Twitter users and perform what is, to the best of our knowledge, the largest demographic analysis of the platform. First, we closely replicate the findings about Twitter demographics in the most recent Pew Research report, suggesting that names might be a useful indicator, especially for aggregate analysis. Second, we demonstrate that our approach can overcome a methodological limitation of the Pew Research study by estimating the breakdown for all age groups, including the under-18 age group. Third, we discover that US Twitter users have always been diverse, though some demographic groups are over-represented and some are under-represented relative to general internet users. We also find strong evidence that different demographic groups, in terms of both age and ethnicity, have different usage patterns on the platform with respect to their following relationships, topical conversations, and the time of day at which they use the platform.