Benchmarking Compositionality with Formal Languages