ABSTRACT We consider the problem of learning a type of lexical semantic knowledge that can be expressed as a binary relation between words, such as the subcategorization of verbs (a verb-noun relation) and the compound noun phrase relation (a noun-noun relation). Specifically, we view this problem as an on-line learning problem in the sense of Littlestone's learning model [Lit88], in which the learner's goal is to minimize the total number of prediction mistakes. In the computational learning theory literature, Goldman, Rivest and Schapire [GRS93] and subsequently Goldman and Warmuth [GW93] have considered the on-line learning problem for binary relations R: X × Y → {0,1} in which one of the domain sets, X, can be partitioned into a relatively small number of types, namely clusters consisting of behaviorally indistinguishable members of X. In this paper, we extend this model by supposing that both of the sets X and Y can be partitioned into a small number of types, and propose a number of prediction algorithms that are two-dimensional extensions of the weighted majority type algorithm that Goldman and Warmuth proposed for the original model. We apply these algorithms to the learning problem for the 'compound noun phrase' relation, in which a noun is related to another just in case they can form a noun phrase together. Our experimental results show that all of our algorithms outperform Goldman and Warmuth's algorithm. We also analyze the performance of one of our algorithms theoretically, in the form of an upper bound on the worst-case number of prediction mistakes it makes.
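To make the on-line setting concrete, the following is a minimal Python sketch of a one-dimensional, weighted-majority-style predictor for a binary relation R: X × Y → {0,1}. It is an illustration only, not the algorithm of [GW93] or any of the two-dimensional extensions proposed here: the class name `RowVotePredictor`, the parameter `beta`, the tie-breaking rule (predict 0), and the choice to demote a pair of rows whenever they are seen to disagree are all assumptions made for the example.

```python
from collections import defaultdict


class RowVotePredictor:
    """Illustrative on-line predictor for a binary relation R: X x Y -> {0, 1}.

    Each pair of rows (members of X) carries a weight that shrinks by a
    factor beta whenever the two rows are observed to disagree on a column.
    Rows of the same type never disagree, so their mutual weight stays at 1.
    """

    def __init__(self, beta=0.5):
        self.beta = beta                        # hypothetical demotion factor, 0 <= beta < 1
        self.known = {}                         # (x, y) -> revealed label
        self.rows = set()                       # rows seen so far
        self.weights = defaultdict(lambda: 1.0) # unordered row pair -> weight
        self.mistakes = 0

    @staticmethod
    def _pair(a, b):
        # Weights are symmetric, so store them under an order-independent key.
        return tuple(sorted((a, b)))

    def predict(self, x, y):
        """Weighted vote of the other rows whose entry in column y is known."""
        vote = {0: 0.0, 1: 0.0}
        for other in self.rows:
            if other == x or (other, y) not in self.known:
                continue
            vote[self.known[(other, y)]] += self.weights[self._pair(x, other)]
        return 1 if vote[1] > vote[0] else 0    # ties default to 0 (assumption)

    def update(self, x, y, label):
        """Predict entry (x, y), then receive its true label and adjust weights."""
        prediction = self.predict(x, y)
        if prediction != label:
            self.mistakes += 1
        for other in self.rows:
            if other == x:
                continue
            other_label = self.known.get((other, y))
            if other_label is not None and other_label != label:
                self.weights[self._pair(x, other)] *= self.beta
        self.known[(x, y)] = label
        self.rows.add(x)
        return prediction


# Usage: rows "a" and "b" share a type; row "c" belongs to a different type.
learner = RowVotePredictor(beta=0.5)
for x, y, label in [("a", "y1", 1), ("c", "y1", 0),
                    ("a", "y2", 1), ("c", "y2", 0),
                    ("b", "y1", 1)]:
    learner.update(x, y, label)
next_prediction = learner.predict("b", "y2")
```

In this toy run, the single revealed entry for row "b" halves its weight with row "c", so the vote for the unseen entry ("b", "y2") follows row "a". A two-dimensional variant would, under the model described above, also exploit similarity among the columns in Y.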
[1] Manfred K. Warmuth, et al. Learning binary relations using weighted majority voting. Machine Learning, 2004.
[2] Kenneth Ward Church, et al. Word Association Norms, Mutual Information, and Lexicography. ACL, 1989.
[3] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm. 28th Annual Symposium on Foundations of Computer Science (SFCS 1987), 1987.
[4] Manfred K. Warmuth, et al. The weighted majority algorithm. 30th Annual Symposium on Foundations of Computer Science, 1989.
[5] Naftali Tishby, et al. Distributional Clustering of English Words. ACL, 1993.
[6] Philip Resnik, et al. Semantic Classes and Syntactic Ambiguity. HLT, 1993.
[7] Naoki Abe, et al. On-line learning of binary and n-ary relations over multi-dimensional clusters. COLT '95, 1995.
[8] Ronald L. Rivest, et al. Learning Binary Relations and Total Orders. COLT 1989, 1989.