Learning Dependencies between Case Frame Slots

A theoretically sound method for learning dependencies between case frame slots is proposed. The problem is viewed as that of estimating a probability distribution over the case slots, represented by a dependency graph (a dependency forest). Experimental results indicate that the proposed method can bring about a small improvement in disambiguation, but that the results are largely consistent with the assumption, often made in practice, that case slots are mutually independent, at least at the data sizes currently available.
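To make the idea of a forest-structured distribution over case slots concrete, the following is a minimal sketch, not the paper's own procedure. It assumes a Chow-Liu style greedy construction in which an edge between two slots is kept only if their empirical mutual information exceeds an MDL-style penalty; the abstract does not confirm that this is the exact criterion used. The function names, the penalty form, and the toy encoding of slots as small integer values are all illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in bits) between columns i and j."""
    n = len(data)
    ci = Counter(row[i] for row in data)
    cj = Counter(row[j] for row in data)
    cij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in cij.items():
        # (c/n) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (ci[a] * cj[b]))
    return mi


def learn_dependency_forest(data, num_slots):
    """Greedy maximum-weight spanning forest over slots.

    Assumption: the edge weight is mutual information minus an MDL-style
    penalty of (k_i - 1)(k_j - 1) * log2(N) / (2N). Pairs with
    non-positive weight get no edge, so the result may be a forest
    (possibly with no edges at all) rather than a single tree.
    """
    n = len(data)
    candidates = []
    for i, j in combinations(range(num_slots), 2):
        ki = len({row[i] for row in data})  # observed values of slot i
        kj = len({row[j] for row in data})  # observed values of slot j
        penalty = (ki - 1) * (kj - 1) * math.log2(n) / (2 * n)
        gain = mutual_information(data, i, j) - penalty
        if gain > 0:
            candidates.append((gain, i, j))
    candidates.sort(reverse=True)

    # Union-find to reject edges that would close a cycle (Kruskal's method).
    parent = list(range(num_slots))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for _gain, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

As a usage example under the same assumptions, each row can record for one verb occurrence whether each slot is filled (1) or empty (0). With rows in which slots 0 and 1 always co-occur and slot 2 varies independently, `learn_dependency_forest(rows, 3)` keeps only the edge between slots 0 and 1. When no pair clears the penalty, the learned forest has no edges, which corresponds to the mutual-independence assumption mentioned in the abstract.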
