When do contrastive learning signals help spatio-temporal graph forecasting?