On the need for synthetic data and robust data simulators in the 2020s

As observational datasets grow larger and more complex, so too do the questions being asked of them. Data simulations, i.e., synthetic data with properties (pixelization, noise, PSF, artifacts, etc.) akin to real data, are therefore increasingly required for several purposes, including: (1) testing complicated measurement methods, (2) comparing models and astrophysical simulations to observations in a manner that requires as few assumptions about the data as possible, (3) predicting observational results from models and astrophysical simulations for, e.g., proposal planning, and (4) mitigating risk for future observatories and missions by developing and testing analysis pipelines before real data arrive. We advocate for the routine use of synthetic data to plan for and interpret real observations. This will require funding for (1) facilities to provide robust data simulators for their instruments, telescopes, and surveys, and (2) making synthetic data publicly available in archives (much like real data) so as to lower the barrier to entry for all.
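
To make the notion of a "data simulation" concrete, the toy sketch below (in Python, using numpy and scipy; all parameter values and source positions are illustrative, not tied to any real instrument or to any specific simulator advocated here) generates a synthetic image with the kinds of properties listed above: a pixelized scene of point sources, blurring by a Gaussian PSF, a sky background, and Poisson plus read noise.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Toy image simulator: place point sources on a pixel grid, blur with a
# Gaussian PSF, then add sky background, shot noise, and read noise.
# All numbers below are made up for illustration only.
rng = np.random.default_rng(42)

ny, nx = 256, 256          # image size in pixels
psf_fwhm_pix = 3.0         # PSF FWHM in pixels
sky_level = 100.0          # mean sky counts per pixel (ADU)
gain = 2.0                 # electrons per ADU
read_noise = 5.0           # read noise, electrons RMS

# "True" scene: a handful of point sources with random positions and fluxes
scene = np.zeros((ny, nx))
n_src = 20
xs = rng.integers(0, nx, n_src)
ys = rng.integers(0, ny, n_src)
fluxes = rng.uniform(500.0, 5000.0, n_src)
np.add.at(scene, (ys, xs), fluxes)

# Convolve with the PSF (Gaussian approximation; FWHM = 2.355 * sigma)
blurred = gaussian_filter(scene, sigma=psf_fwhm_pix / 2.355)

# Add sky, apply Poisson (shot) noise in electrons, then Gaussian read noise
expected_e = (blurred + sky_level) * gain
noisy_e = rng.poisson(expected_e) + rng.normal(0.0, read_noise, (ny, nx))
image = noisy_e / gain     # back to ADU, as a detector would report

print(image.shape, image.mean(), image.std())
```

A production-grade simulator would layer on instrument-specific effects (detector artifacts, flat-field errors, cosmic rays, astrometric distortion, survey cadence, and so on), which is precisely why facility-supported, well-maintained simulators and archived synthetic data products are needed rather than ad hoc scripts like this one.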