Knowledge Acquisition of Predicate Argument Structures from Technical Texts Using Machine Learning: The System ASIUM

In this paper, we describe the Machine Learning system ASIUM, which learns subcategorization frames of verbs and ontologies from the syntactic parsing of technical texts in natural language. The selectional restrictions in the subcategorization frames are filled by the ontology's concepts. Many important applications require such knowledge; the most direct are semantic control of texts and syntactic parsing disambiguation. This knowledge acquisition task cannot be performed fully automatically. Instead, we propose a cooperative ML method which provides the user with a global view of the acquisition task and with acquisition tools such as automatic concept splitting, example generation, and an ontology view with attachments to the verbs. Validation steps using these features are intertwined with learning steps, so that the user validates the concepts as they are learned. Experiments performed on two different corpora (cooking domain and patents) give very promising results.
