Joint Learning of Reward Machines and Policies in Environments with Partially Known Semantics