A Maximum Entropy and Rules Approach to Identifying Tibetan Sentence Boundaries

Sentence boundary identification is a fundamental task in Tibetan information processing. This paper proposes an approach to identifying Tibetan sentence boundaries that combines a maximum entropy model with rules. First, a detector based on a Tibetan boundary vocabulary identifies the ambiguous sentence boundaries. Second, a detector based on a maximum entropy model identifies the ambiguous sentence boundaries that the first detector cannot identify. Because the training corpus is sparse and of low quality, the maximum entropy model identifies some sentence boundaries incorrectly; the approach applies Tibetan sentence boundary rules to further reduce the number of these errors. Experiments show that the approach performs well, achieving an F1-measure of 97.78%.
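
The three stages described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that candidate boundaries are marked by the Tibetan shad (U+0F0D) and that the text is already tokenized. The word lists, feature names, and weights are hypothetical placeholders (Wylie-style strings chosen only for illustration), and a binary logistic function with hand-set weights stands in for a trained maximum entropy model.

```python
import math

SHAD = "\u0f0d"  # Tibetan mark shad, treated here as the candidate boundary token

# Hypothetical placeholder lists, NOT the paper's boundary vocabulary or rules.
FINAL_WORDS = {"red", "yin"}   # words assumed to close a sentence
NON_FINAL_WORDS = {"dang"}     # words assumed never to close a sentence

# Made-up feature weights standing in for a trained maximum entropy model.
WEIGHTS = {
    "bias": -0.5,
    "prev_len>=3": 0.8,
    "next_is_end": 2.0,
    "prev=nas": -1.5,
}


def vocabulary_detector(prev_word):
    """Stage 1: return True when the vocabulary decides, else None."""
    if prev_word in FINAL_WORDS:
        return True
    return None  # left for the maximum entropy stage


def extract_features(tokens, i):
    """Build simple (hypothetical) context features for the candidate at i."""
    prev_word = tokens[i - 1] if i > 0 else ""
    features = [f"prev={prev_word}"]
    if len(prev_word) >= 3:
        features.append("prev_len>=3")
    if i + 1 == len(tokens):
        features.append("next_is_end")
    return features


def maxent_probability(features):
    """Stage 2: two-class maximum entropy (logistic) boundary probability."""
    score = WEIGHTS["bias"] + sum(WEIGHTS.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))


def rule_filter(prev_word):
    """Stage 3: veto a model-predicted boundary that violates a rule."""
    return prev_word not in NON_FINAL_WORDS


def find_boundaries(tokens, threshold=0.5):
    """Return the indices of candidate marks accepted as sentence boundaries."""
    boundaries = []
    for i, token in enumerate(tokens):
        if token != SHAD:
            continue
        prev_word = tokens[i - 1] if i > 0 else ""
        decision = vocabulary_detector(prev_word)
        if decision is None:
            probability = maxent_probability(extract_features(tokens, i))
            decision = probability >= threshold and rule_filter(prev_word)
        if decision:
            boundaries.append(i)
    return boundaries


tokens = ["a", "red", SHAD, "b", "dang", SHAD, "c", "nas", SHAD, "d", "byas", SHAD]
print(find_boundaries(tokens))  # [2, 11]
```

In this toy input, the vocabulary accepts the first mark, the model would accept the second but the rule vetoes it, the model rejects the third, and the model accepts the last. Applying the rules only to the model's output mirrors the order described above, where the rules correct errors that stem from limited training data.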