Traffic flow forecasting is a typical spatial-temporal forecasting problem. It is challenging because traffic data exhibit complex patterns and traffic graphs are difficult to process. Most existing traffic flow prediction methods cannot model dynamic spatial-temporal features and therefore do not yield satisfactory predictions. In this paper, we propose the Attention Based Multi-Unit Spatial-Temporal Network (AMU-STN) to solve the traffic flow forecasting problem. The model consists mainly of three units: a spatial-temporal attention unit that captures the dynamic spatial-temporal correlations, a spatial-temporal feature extraction unit that extracts long-range features, and a prediction unit that predicts the future status. To handle long-term prediction, we stack multiple layers of attention units and feature extraction units, and we use residual connections to avoid the vanishing gradients caused by deepening the network. Experimental results on two real-world datasets from the Caltrans Performance Measurement System (PeMS) show that our model outperforms state-of-the-art baselines. We also analyze the weights of the attention matrix; the results demonstrate the effectiveness of the attention mechanism and reflect the interpretability of our model.