Selecting the UD v2 Morphological Features for Indonesian Dependency Treebank

The objectives of our work are to propose the relevant Universal Dependencies (UD) morphological features for an Indonesian dependency treebank and to apply the proposed features to an existing treebank. We propose the use of 14 UD v2 features and the corresponding 27 feature-value tags. To evaluate the quality of the resulting treebank, we built models for lemmatization, POS tagging, morphological feature analysis, and dependency parsing using UDPipe, a trainable pipeline for tokenization, tagging, lemmatization, and dependency parsing of CoNLL-U files. For the lemmatization, POS tagging, and morphological feature analysis tasks, the resulting models achieve F1-scores above 93%, which indicates that the annotations in the LEMMA, UPOS, and FEATS columns of the treebank are already consistent. However, the accuracy of the resulting Indonesian dependency parser is only 82.59% UAS and 79.83% LAS. The experiments also show that morphological feature information has little or no impact on the quality of the lemmatization, POS tagging, and dependency parsing models for Indonesian.
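
For readers unfamiliar with the parsing metrics reported above, the following minimal Python sketch illustrates how UAS (share of words with the correct head) and LAS (share of words with both the correct head and the correct dependency label) can be computed from CoNLL-U data, and how the FEATS column decomposes into feature-value pairs. It is not the evaluation code used in this work: the function names `read_tokens` and `attachment_scores` are hypothetical, the three-word sentence and its feature values are a toy illustration rather than treebank data, and the sketch assumes that the gold and predicted files share identical tokenization.

```python
def read_tokens(conllu_text):
    """Return (head, deprel, feats) for each syntactic word in CoNLL-U text."""
    tokens = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        # FEATS (column 6) is "_" or a "|"-separated list of Feature=Value pairs.
        if cols[5] == "_":
            feats = {}
        else:
            feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
        tokens.append((cols[6], cols[7], feats))  # HEAD, DEPREL, FEATS
    return tokens


def attachment_scores(gold_text, pred_text):
    """Compute (UAS, LAS), assuming gold and prediction have the same tokens."""
    gold = read_tokens(gold_text)
    pred = read_tokens(pred_text)
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and predicted data must align token by token")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_labeled = sum(g[0] == p[0] and g[1] == p[1] for g, p in zip(gold, pred))
    return correct_heads / len(gold), correct_labeled / len(gold)


def row(*cols):
    """Join the ten CoNLL-U columns of one word with tabs."""
    return "\t".join(cols)


# Toy example (illustrative annotation only): "Saya membaca buku".
GOLD = "\n".join([
    row("1", "Saya", "saya", "PRON", "_", "Number=Sing|Person=1|PronType=Prs", "2", "nsubj", "_", "_"),
    row("2", "membaca", "baca", "VERB", "_", "Voice=Act", "0", "root", "_", "_"),
    row("3", "buku", "buku", "NOUN", "_", "Number=Sing", "2", "obj", "_", "_"),
])
# Hypothetical parser output: the head of word 3 is right, its label is wrong.
PRED = GOLD.replace("\tobj\t", "\tnmod\t")

uas, las = attachment_scores(GOLD, PRED)
print(f"UAS = {uas:.2%}, LAS = {las:.2%}")
```

In this toy case every head is correct but one label is not, so UAS is 3/3 and LAS is 2/3. When the predicted tokenization differs from the gold one, the words must first be aligned, which this sketch does not attempt.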