Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations