Learning by Watching via Keypoint Extraction and Imitation Learning