Building extraction with vision transformer