Have we really been analyzing terminating simulations incorrectly all these years?

We all know how to estimate a confidence interval for the mean from a random sample: the interval is centered on the sample mean, with a half-width proportional to the standard error. We also know that terminating simulations generate independent observations, one per replication. What simulators appear to have overlooked is that independence alone is insufficient to guarantee a valid random sample; the observations must also be identically distributed. This is a good assumption if the outcome of each replication is a single observation, but it is demonstrably incorrect if the outcome is an aggregate value, such as a within-replication average, and the replications have differing numbers of observations. In this paper we explore the implications of this oversight for the case where within-replication observations are independent. We derive analytic results showing that although the impact on interval estimates can sometimes be negligible, there are also circumstances where the variance of our estimates is substantially increased. We finish with a simple example that demonstrates the potential impact for practitioners.
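To illustrate why replication averages need not be identically distributed, the following minimal sketch (not taken from the paper's own example) compares the variance of replication means when each replication contains 5 observations versus 50. The normal observation model, the parameter values `mu` and `sigma`, and the function names are all assumptions made for illustration only.

```python
import random
import statistics


def replication_mean(n_obs, rng, mu=10.0, sigma=2.0):
    """Average of n_obs independent N(mu, sigma^2) observations (assumed model)."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(n_obs))


def variance_of_replication_means(n_obs, n_reps=5000, seed=1):
    """Empirical variance of the replication mean across n_reps replications."""
    rng = random.Random(seed)
    means = [replication_mean(n_obs, rng) for _ in range(n_reps)]
    return statistics.variance(means)


# Under the assumed model the variance of a replication mean is sigma^2 / n_obs,
# so short and long replications produce averages with different distributions.
var_short = variance_of_replication_means(5)   # theory: 4 / 5  = 0.8
var_long = variance_of_replication_means(50)   # theory: 4 / 50 = 0.08
print(f"variance with 5 observations:  {var_short:.3f}")
print(f"variance with 50 observations: {var_long:.3f}")
```

Both sets of replication means share the same expected value, yet their variances differ by roughly a factor of ten, so a sample that mixes replications of different lengths is independent but not identically distributed.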