Projectivity in Totally Ordered Rooted Trees: An Alternative Definition of Projectivity and Optimal Algorithms for Detecting Non-Projective Edges and Projectivizing Totally Ordered Rooted Trees

This paper discusses the notion of projectivity and algorithms for projectivizing and detecting non-projective edges in totally ordered rooted trees (such trees are used in dependency syntax analysis of natural language, where they are called dependency trees). In the first part, we review the notion of projectivity, then present a new definition motivated by the algorithmic inquiry and show its equivalence with the classical definitions. We define the canonical projectivization of a totally ordered rooted tree (preserving the tree structure and the relative ordering for all inner nodes and their immediate dependents) and show its uniqueness; we also give a generalization of this result. We then discuss some properties of non-projective edges relevant to the algorithms presented in the following section. In the second part, we present a data representation of totally ordered rooted trees and algorithms for this data representation. The first algorithm computes the projectivization of the input tree; the second algorithm detects non-projective edges of certain types in the input tree (we also give a hint on finding all non-projective edges using its output). Both algorithms can be used for checking projectivity. We prove that the algorithms are optimal: both have time complexity O(n). Furthermore, they can be straightforwardly combined into a single algorithm, preserving the time complexity.

1 Projectivity in Totally Ordered Rooted Trees

This section discusses the condition of projectivity in totally ordered rooted trees. First we give a definition of a totally ordered rooted tree and introduce some notation. We then present the classical definition of projectivity and introduce a new one, showing that the two are equivalent. Next we define the notion of projectivization and show its uniqueness. Finally we divide non-projective edges into three classes and discuss their relationships.
1.1 Totally Ordered Rooted Trees

We give a definition of totally ordered rooted trees, briefly review (without proofs) some basic properties of rooted trees, and introduce the notation used in this paper.

1.1.1 Definition A totally ordered rooted tree is a quadruple (V,E,r,≤), where (V,E,r) is a rooted tree (V being the finite set of vertices (or nodes), E the set of edges (unordered pairs of nodes), and r ∈ V the root) and ≤ is a linear ordering on V. (A totally ordered rooted tree is often called a dependency tree.)

In a rooted tree (V,E,r), there is a unique path from the root r to every node a, say x_0 = r, x_1, ..., x_n = a, n ≥ 0, where {x_i, x_{i+1}} ∈ E for 0 ≤ i < n. Therefore every node a has a uniquely defined level equal to the length of the path connecting it with the root, i.e. n, which we will denote lev(a). For every node a ≠ r, we will call b = x_{n−1} the parent of a (with notation a → b; we will also say that a is a child of b or that a depends on b). A node with no children is called a leaf; a node which is not a leaf is an internal node. Obviously, in a rooted tree there is a one-to-one correspondence between the edges and the nodes different from the root (edges correspond uniquely to their “lower” nodes). Nodes with the same parent are called siblings. The height of a rooted tree is the maximal level occurring in it.

When talking about the tree structure, we will use “vertical-axis” terms such as “above”, “below”, “upper”, “lower” etc., with the root being the highest node and the other nodes ordered downwards by increasing level. (Rooted trees are usually drawn “upside-down”, with the root at the top and the other nodes placed downwards according to their level, nodes at the same level being drawn on the same horizontal line.)

The reflexive transitive closure of the relation → will be denoted →*; for a →* c we will say that c is an ancestor of a, or a is a descendant of c, or that a is subordinated to c.
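The notions of parent, level, height and subordination introduced above can be made concrete with a small sketch. It assumes a hypothetical encoding that is not taken from this paper: nodes are the integers 0..n-1, numbered by their position in the linear ordering, and a list `parent` stores the parent of each node (`None` for the root). All function names are illustrative only.

```python
from typing import List, Optional, Set

# Hypothetical encoding (an assumption of this sketch, not the paper's data
# representation): node i is the i-th node in the linear ordering, and
# parent[i] is its parent, or None if i is the root.
Parent = List[Optional[int]]


def level(parent: Parent, a: int) -> int:
    """lev(a): the length of the unique path from the root to a."""
    n = 0
    while parent[a] is not None:
        a = parent[a]
        n += 1
    return n


def subordinated(parent: Parent, a: int, c: int) -> bool:
    """True iff a is subordinated to c (reflexive: every node is
    subordinated to itself)."""
    node: Optional[int] = a
    while node is not None:
        if node == c:
            return True
        node = parent[node]
    return False


def subtree_nodes(parent: Parent, a: int) -> Set[int]:
    """Node set of the subtree rooted in a (all nodes subordinated to a)."""
    return {x for x in range(len(parent)) if subordinated(parent, x, a)}


def height(parent: Parent) -> int:
    """The maximal level occurring in the tree."""
    return max(level(parent, a) for a in range(len(parent)))


# Example tree: node 1 is the root, nodes 0 and 2 depend on 1, node 3 on 2.
tree: Parent = [1, None, 1, 2]
print(level(tree, 3), height(tree), sorted(subtree_nodes(tree, 2)))
```

Walking up parent links makes each query linear in the height of the tree; this keeps the sketch short and is not meant to reflect the data representation used by the algorithms later in the paper.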
(Note that the relation of dependency → is irreflexive, whereas the relation of subordination →* is defined as reflexive.)

For every node a of a rooted tree T = (V,E,r) we call the tree T_a = (V_a, E_a, a), where V_a = {x ∈ V | x →* a} and E_a = {{x,y} ∈ E | x,y ∈ V_a}, the subtree of T rooted in node a.

When talking about the linear ordering ≤ on nodes of a totally ordered rooted tree (V,E,r,≤), we will use the usual notation a ≥ b meaning b ≤ a, and a < b meaning a ≤ b and a ≠ b (and similarly for >); we will also be using “horizontal-axis” terms such as “left”, “right”, “in between” etc. with the obvious meaning (we will say that a is to the left of b when a < b, etc.).

When drawing totally ordered rooted trees, we adopt the following conventions: nodes are drawn top-down according to their level, with nodes on the same level on the same horizontal line and the root at the top; nodes are drawn from left to right according to the linear ordering on nodes. Edges are drawn as solid lines.

For an edge a → b of a totally ordered rooted tree T = (V,E,r,≤), we call the interval in the linear ordering delimited by the nodes a and b the span of the edge a → b.

Please note that the notion of a totally ordered rooted tree (cf. Definition 1.1.1) differs from the notion of an ordered rooted tree, where for every internal node only a linear ordering of its children is given (i.e. the ordering is not total; it is specified only for sibling nodes). Here we are concerned with rooted trees with a total linear ordering on their nodes.

For the sake of brevity of the definitions in the following section we introduce two predicates:

• A ternary predicate representing the “strictly in between” relation: Inb(x,u,v) =df (u < x & x < v) ∨ (v < x & x < u). (Obviously, Inb(x,u,v) should be read as “x lies (strictly) between u and v”.)

• A ternary predicate representing the “being siblings” relation: Sibl(u,v,b) =df (u → b & v → b & u ≠ v).
(Sibl(u,v,b) should be read as “u and v are different children of their common parent b”.)

We will be taking advantage of the fact that both predicates are symmetric in two of their arguments (Inb in its second and third arguments, Sibl in its first and second arguments).

1.2 Condition of Projectivity for Totally Ordered Rooted Trees

We begin by giving a definition of projectivity using three conditions proved to be equivalent by Marcus (1965) (we adopt his names for them), and then present a new condition and prove that it is equivalent to one of the classical ones.

1.2.1 Definition (Marcus (1965)) A totally ordered rooted tree T = (V,E,r,≤) is projective if the following equivalent conditions hold (→ denotes dependency and →* subordination):

(H-H) (∀a,b,x ∈ V) ( a → b & Inb(x,a,b) ⇒ x →* b ),

(L-I) (∀a,b,x ∈ V) ( a →* b & Inb(x,a,b) ⇒ x →* b ),

(F) (∀u,v,b,x ∈ V) ( u →* b & v →* b & Inb(x,u,v) ⇒ x →* b ).

A totally ordered rooted tree not satisfying the conditions is called non-projective. (See Figures 1 and 2 for examples of projective and non-projective totally ordered rooted trees, respectively.)

Figure 1: A sample projective tree

Figure 2: A sample non-projective tree

We will not repeat here the proof of the equivalence of the three conditions in Definition 1.2.1; it is quite straightforward and relies on the simple fact that for every two nodes in the relation of subordination there exists a unique finite path between them formed by edges of the rooted tree.

All three conditions in Definition 1.2.1 have the following in common: in a configuration where two (or three) nodes have some structural relationship (i.e. a relationship via the tree structure) and there is a node x between them in the linear ordering, they require that the node x be in an analogous structural relationship.

Condition (F) is perhaps the most transparent as regards the structure of the whole tree. It says that every subtree of a projective tree must be contiguous in the linear ordering.
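To illustrate how the classical conditions can be read operationally, the following sketch checks condition (H-H) edge by edge and condition (F) as contiguity of every subtree. It assumes a hypothetical encoding that is not taken from this paper: nodes are the integers 0..n-1, numbered by their position in the linear ordering, and `parent[i]` is the parent of node i (`None` for the root). These are naive quadratic-or-worse checks written for clarity; they are not the linear-time algorithms announced in the abstract.

```python
from typing import List, Optional

# Hypothetical encoding (an assumption of this sketch): node i is the i-th
# node in the linear ordering; parent[i] is its parent, None for the root.
Parent = List[Optional[int]]


def subordinated(parent: Parent, x: int, b: int) -> bool:
    """True iff x is subordinated to b (reflexive)."""
    node: Optional[int] = x
    while node is not None:
        if node == b:
            return True
        node = parent[node]
    return False


def is_projective_hh(parent: Parent) -> bool:
    """Condition (H-H): every node strictly between the endpoints of an
    edge a -> b must be subordinated to b."""
    for a, b in enumerate(parent):
        if b is None:  # the root has no incoming edge
            continue
        lo, hi = min(a, b), max(a, b)
        for x in range(lo + 1, hi):
            if not subordinated(parent, x, b):
                return False
    return True


def is_projective_f(parent: Parent) -> bool:
    """Condition (F): the nodes of every subtree form a contiguous
    interval of the linear ordering."""
    n = len(parent)
    for b in range(n):
        nodes = [x for x in range(n) if subordinated(parent, x, b)]
        # Distinct integers are contiguous iff their range equals their count.
        if max(nodes) - min(nodes) + 1 != len(nodes):
            return False
    return True


projective: Parent = [1, None, 1, 2]      # root 1; 0 and 2 under 1; 3 under 2
non_projective: Parent = [2, None, 1, 1]  # edge 0 -> 2 spans the root 1
print(is_projective_hh(projective), is_projective_f(projective))
print(is_projective_hh(non_projective), is_projective_f(non_projective))
```

In the second tree the root lies between node 0 and its parent 2 without being subordinated to 2, so both checks reject it, in line with the equivalence of (H-H) and (F).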
A simple reformulation of the condition (F) gives the following condition of projectivity, which makes this point even clearer:¹

(F') (∀u,v,b ∈ V) ( u →* b & v →* b ⇒ ¬(∃x ∈ V)(Inb(x,u,v) & x ↛* b) ),

where x ↛* b means that x is not subordinated to b.

The condition (F') leads naturally to the notion of a gap in the coverage of a subtree (a gap is the set of “extra-subtree” nodes in the span of the subtree, i.e. nodes outside the subtree that lie between some two nodes of the subtree in the linear ordering). Such a notion of a gap was used by Holan et al. (1998), who introduce measures of non-projectivity and present a class of dependency-based formal grammars allowing for a varying degree of word-order freedom; Holan et al. (2000) present linguistic considerations concerning Czech and English. In our study, however, we will be concerned with a different notion of a gap.

¹ The equivalence of conditions (F) and (F') is straightforward; see the following first-order-logic reasoning:
(∀u,v,b,x ∈ V)(u →* b & v →* b & Inb(x,u,v) ⇒ x →* b)
⇔ (∀u,v,b,x ∈ V)(u →* b & v →* b ⇒ (Inb(x,u,v) ⇒ x →* b))
⇔ (∀u,v,b,x ∈ V)(u →* b & v →* b ⇒ (¬Inb(x,u,v) ∨ x →* b))
⇔ (∀u,v,b,x ∈ V)(u →* b & v →* b ⇒ ¬(Inb(x,u,v) & x ↛* b))
⇔ (∀u,v,b ∈ V)(u →* b & v →* b ⇒ (∀x ∈ V)(¬(Inb(x,u,v) & x ↛* b)))
⇔ (∀u,v,b ∈ V)(u →* b & v →* b ⇒ ¬(∃x ∈ V)(Inb(x,u,v) & x ↛* b)).

In a non-projective totally ordered rooted tree, there exists at least one edge a → b and a node x not satisfying the condition (H-H). We will call such an edge a non-projective edge of the totally ordered rooted tree. The set

X_{a→b} = {x ∈ V | Inb(x,a,b) & x ↛* b}

of all nodes causing the non-projectivity of the edge a → b (i.e. the nodes lying strictly between a and b that are not subordinated to b) will be called the gap of the edge a → b.

Let us now present in the form of a theorem another condition which is equivalent to the conditions in Definition 1.2.1.

1.2.2 Theorem A totally ordered rooted tree T = (V,E,r,≤) is projective if and only if the following condition holds:

(*) (∀a_1,a_2,b,u_1,u_2 ∈ V) ( [ a_1 → b & u_1 →* a_1 & ( [a_2 = b & u_2 = b] ∨ [Sibl(a_1,a_2,b) & u_2 →* a_2] ) ] ⇒ [ a_1 < a_2 ⇔ u_1 < u_2 ] )
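The gap of an edge and condition (*) can both be spelled out as brute-force checks, which may help in reading the theorem. The sketch below assumes a hypothetical encoding that is not taken from this paper: nodes are the integers 0..n-1, numbered by their position in the linear ordering, and `parent[i]` is the parent of node i (`None` for the root). The function names are illustrative, and the loops simply enumerate the quantified variables; no claim of efficiency is intended.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical encoding (an assumption of this sketch): node i is the i-th
# node in the linear ordering; parent[i] is its parent, None for the root.
Parent = List[Optional[int]]


def subordinated(parent: Parent, x: int, b: int) -> bool:
    """True iff x is subordinated to b (reflexive)."""
    node: Optional[int] = x
    while node is not None:
        if node == b:
            return True
        node = parent[node]
    return False


def gap(parent: Parent, a: int) -> Set[int]:
    """Gap of the edge from a to its parent b: the nodes strictly between
    a and b that are not subordinated to b (empty for the root)."""
    b = parent[a]
    if b is None:
        return set()
    lo, hi = min(a, b), max(a, b)
    return {x for x in range(lo + 1, hi) if not subordinated(parent, x, b)}


def non_projective_edges(parent: Parent) -> List[Tuple[int, int]]:
    """All edges (a, b) with a non-empty gap."""
    return [(a, b) for a, b in enumerate(parent)
            if b is not None and gap(parent, a)]


def satisfies_star(parent: Parent) -> bool:
    """Brute-force check of condition (*)."""
    n = len(parent)
    for a1, b in enumerate(parent):
        if b is None:
            continue
        for u1 in (x for x in range(n) if subordinated(parent, x, a1)):
            # Case a2 = b, u2 = b: u1 lies on the same side of b as a1.
            if (a1 < b) != (u1 < b):
                return False
            # Case Sibl(a1, a2, b): descendants keep the order of siblings.
            for a2 in range(n):
                if a2 == a1 or parent[a2] != b:
                    continue
                for u2 in range(n):
                    if (subordinated(parent, u2, a2)
                            and (a1 < a2) != (u1 < u2)):
                        return False
    return True


projective: Parent = [1, None, 1, 2]
non_projective: Parent = [2, None, 1, 1]
print(non_projective_edges(projective), satisfies_star(projective))
print(non_projective_edges(non_projective), satisfies_star(non_projective))
```

On the second tree the only non-projective edge is the one from node 0 to node 2, whose gap consists of the root; condition (*) fails there because node 0 is subordinated to node 2 but lies on the opposite side of the root.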