ESSM: an extractive summarization model with enhanced spatial-temporal information and span mask encoding