A Pipeline for Generating, Annotating and Employing Synthetic Data for Real World Question Answering