Rethinking Visual Prompt Learning as Masked Visual Token Modeling