Vision Language Navigation with Multi-granularity Observation and Auxiliary Reasoning Tasks