Structure and design of a multimodal dataset for automatic regex synthesis methods in Roman Urdu