Induction of Classifications from Linguistic Data

We present a flexible approach for extracting hierarchical classifications from linguistic data. To this end, the framework of observational logic is introduced, which extends the logic that underlies standard Formal Concept Analysis by allowing disjunctive rules and exclusions. We give a rigorous mathematical characterization of how the chosen rule type affects the structure of the induced hierarchy. The framework is applied to the induction of hierarchical classifications from linguistic databases. The pros and cons of several types of hierarchies are discussed in detail with respect to criteria such as compactness of representation, suitability for inference tasks, and intelligibility for the human user.

1 THE LOGIC OF LINGUISTIC CLASSIFICATION

A simple method for classifying (linguistic) data is provided by taxonomic trees, which are ubiquitous in linguistic textbooks. For example, nominal words are traditionally subdivided into pronouns, nouns, adjectives, etc.; pronouns are further subdivided into interrogative pronouns, personal pronouns, etc. From a logical point of view, each concept of a taxonomic tree implies its superordinate concept; e.g., pronoun implies nominal word. Furthermore, any two subconcepts of the same concept are incompatible, e.g., noun and adjective. In addition, classification by taxonomic trees is often assumed to be exhaustive in the sense that every concept implies the disjunction of its immediate subconcepts.

Systemic networks, which have their roots in systemic grammar (e.g. [10]), provide a more sophisticated formalism for presenting linguistic classification. Figure 1 shows a small fragment of such a network. The classifiers aligned to the right of a bar constitute a
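The three conditions on taxonomic trees stated above (each concept implies its superordinate concept, subconcepts of the same concept exclude each other, and every concept implies the disjunction of its immediate subconcepts) can be illustrated with a minimal Python sketch. The tree below contains only the classes named in the text, so the exhaustiveness check is relative to this toy tree; the function names and the representation of an item as a set of labels are illustrative assumptions, not part of the framework presented here.

```python
# Toy taxonomy (assumption): child -> parent, restricted to the classes
# named in the text. The real subdivision has further members ("etc.").
PARENT = {
    "pronoun": "nominal",
    "noun": "nominal",
    "adjective": "nominal",
    "interrogative_pronoun": "pronoun",
    "personal_pronoun": "pronoun",
}


def ancestors(concept):
    """Return the superordinate concepts of `concept`, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def implies(a, b):
    """Implication: a implies b if b is a itself or lies above a in the tree."""
    return a == b or b in ancestors(a)


def incompatible(a, b):
    """Exclusion: in this single-rooted tree, two concepts are incompatible
    unless one implies the other (siblings exclude each other, and the
    exclusion is inherited by their subconcepts)."""
    return not (implies(a, b) or implies(b, a))


def children(concept):
    """Immediate subconcepts of `concept`, in sorted order."""
    return sorted(c for c, p in PARENT.items() if p == concept)


def violations(labels):
    """Check one item's set of concept labels against the three rule types."""
    found = []
    # Implication: every label requires all of its superordinate concepts.
    for a in labels:
        for b in ancestors(a):
            if b not in labels:
                found.append(("implication", a, b))
    # Exclusion: no two incompatible labels may occur together.
    for a in labels:
        for b in labels:
            if a < b and incompatible(a, b):
                found.append(("exclusion", a, b))
    # Exhaustiveness: a subdivided concept requires one of its subconcepts.
    for a in labels:
        subs = children(a)
        if subs and not any(s in labels for s in subs):
            found.append(("exhaustiveness", a, tuple(subs)))
    return found
```

Under these assumptions, an item labelled with nominal, pronoun and personal_pronoun satisfies all three rule types, whereas an item labelled with both noun and adjective violates an exclusion.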

[1] Anja Großkopf et al. Formal concept analysis of verb paradigms in linguistics, 1996.

[2] Rainer Osswald et al. A logic of classification with applications to linguistic theory, 2002.

[3] Gisela Harras et al. Begriffliche Erkundung semantischer Strukturen von Sprechaktverben, 2000.

[4] Alex Lascarides et al. Default Representation in Constraint-based Frameworks, 1999, Comput. Linguistics.

[5] Gerd Stumme et al. Distributive Concept Exploration - A Knowledge Acquisition Tool in Formal Concept Analysis, 1998, KI.

[6] L. Beran et al. [Formal concept analysis], 1996, Casopis lekaru ceskych.

[7] Bob Carpenter et al. Inclusion, Disjointness and Choice: The Logic of Linguistic Classification, 1991, ACL.

[8] Rainer Osswald. Classifying Classification, 2001, Electron. Notes Theor. Comput. Sci.

[9] Terry Winograd et al. Language as a cognitive process 1: Syntax, 1982.

[10] Uta Priss et al. The formalization of WordNet by methods of relational concept analysis, 1996.

[11] Ivan A. Sag et al. Information-based syntax and semantics, 1987.

[12] Steven Vickers. Topology via constructive logic, 1999.

[13] Gerald Gazdar et al. DATR: A Language for Lexical Knowledge Representation, 1996, CL.

[14] Bernhard Ganter et al. Formal Concept Analysis: Mathematical Foundations, 1998.

[15] Ivan A. Sag et al. Syntactic Theory: A Formal Introduction, 1999, Computational Linguistics.

[16] Jon Barwise et al. Information Flow: The Logic of Distributed Systems, 1997.

[17] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[18] Bernhard Ganter et al. Attribute Exploration with Background Knowledge, 1999, Theor. Comput. Sci.

[19] Michael Halliday et al. An Introduction to Functional Grammar, 1985.