Investigating the impact of preprocessing on document embedding: an empirical comparison