Towards Understanding Situated Text: Concept Labeling and Weak Supervision

Much of the focus of the natural language processing community lies in solving syntactic or semantic tasks with the aid of sophisticated machine learning algorithms and the encoding of linguistic prior knowledge. One of the most important features of natural languages, however, is that their real-world use (as a tool for humans) is to communicate something about our physical reality, or about metaphysical considerations of that reality. This is strong prior knowledge that most current systems simply ignore. For example, current parsing systems make no allowance for disambiguating a sentence using knowledge of the physical world. If one happened to know that Bill owned a telescope while John did not, this should affect parsing decisions for the sentence “John saw Bill in the park with his telescope.” Similarly, one can better disambiguate the word “bank” in “John went to the bank” if one happens to know whether John is out for a walk in the countryside or in the city. In summary, many human disambiguation decisions are in fact based on whether the current sentence agrees well with one’s current world model.
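To make the telescope example concrete, the following is a minimal, hypothetical sketch (not part of the paper) of how agreement with a world model could inform disambiguation: each candidate reading of the ambiguous sentence is scored against a set of known facts, and the reading most consistent with the world model is preferred. The fact set, the `consistency_score` function, and the representation of readings are all illustrative assumptions.

```python
# Hypothetical illustration: prefer the parse of an ambiguous sentence
# that best agrees with a tiny "world model" of known facts.

# Assumed world model: a set of (subject, relation, object) facts.
# The fact that Bill owns a telescope mirrors the example in the text.
world_model = {("Bill", "owns", "telescope")}

# Candidate readings of "John saw Bill in the park with his telescope":
# each reading commits to who "his" refers to, i.e. who has the telescope.
candidate_readings = [
    {"instrument_holder": "John"},  # John used his telescope to see Bill
    {"instrument_holder": "Bill"},  # Bill, who was seen, had the telescope
]

def consistency_score(reading, facts):
    """Score a reading by its agreement with the world model:
    +1 if the presumed telescope holder is known to own one, else 0."""
    holder = reading["instrument_holder"]
    return 1 if (holder, "owns", "telescope") in facts else 0

# Prefer the reading that best agrees with what we know about the world.
best = max(candidate_readings, key=lambda r: consistency_score(r, world_model))
print("Preferred reading: the telescope belongs to", best["instrument_holder"])
# -> Preferred reading: the telescope belongs to Bill
```

Under these assumptions, the reading in which Bill holds the telescope wins because it is the only one supported by a known fact; this is the sense in which disambiguation can be cast as checking a sentence against one’s world model.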