Do dialogue representations align with perception? An empirical study