Towards Scalable Data-Driven Authorship Attribution