Typical urban and rural temperature records are essential for estimating and comparing urban heat island effects across regions, and the key issue is how to identify typical urban and rural stations. This study analyzed the similarity of air temperature sequences using the dynamic time warping (DTW) algorithm to improve the selection of typical stations. We examined the similarity of the temperature sequences of 20 stations in Beijing and validated the results with remote sensing. The results indicated that the DTW algorithm could identify differences among temperature sequences and clearly divide them into groups according to their probability distribution information. Analysis of station pairs with high similarity provided an appropriate classification of typical urban stations (FT, SY, HD, TZ, CY, CP, MTG, BJ, SJS, DX, FS) and typical rural stations (ZT, SDZ, XYL) in Beijing. We also found that some traditional rural stations cannot represent temperature variation over rural surfaces because their surrounding environments have been highly modified by urbanization in recent decades, and using them may underestimate the urban climate effect by 1.24℃. The DTW algorithm is simple to apply to temperature sequences and has good potential for improving urban heat island estimation at regional or global scales by selecting more appropriate temperature records.
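
For readers unfamiliar with DTW, the following is a minimal sketch of the classic dynamic-programming DTW distance between two temperature sequences. It is not the study's implementation: the abstract does not state the local cost, any warping-window constraint, or any normalization, so the sketch assumes an absolute-difference local cost, no window, and no normalization. The function name `dtw_distance` and the toy sequences are hypothetical and are not station data.

```python
from math import inf
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance with |x - y| as local cost (an assumption)."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # D[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    return D[n][m]


# Toy sequences (illustrative only): the second repeats one value, so the
# warping path can absorb the repetition at zero cost.
seq_x = [1.0, 2.0, 3.0]
seq_y = [1.0, 2.0, 2.0, 3.0]
print(dtw_distance(seq_x, seq_y))  # 0.0
```

Under these assumptions, a smaller distance indicates more similar sequences, so computing the distance for every station pair would yield a matrix from which highly similar stations could be grouped.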