Migratable urban street scene sensing method based on vision language pre-trained model