Development of Pashto Treebank

This paper describes the development of a Pashto Treebank encoded in Extensible Markup Language (XML). A chart parser has been developed that uses the chart parsing algorithm [1] to build parse trees for Pashto sentences. The parser's output is the parsed text, which can be obtained in any of three forms: a reduced graph, a parse tree, or XML code. As input, the parser requires a context-free grammar (CFG) of the Pashto language and tagged input text. The system was tested on real-world text taken from Pashto novels and web sites and tagged manually. Of the eighty-seven (87) sentences parsed, fifty-four (54) were correctly parsed with a single parse tree, while the remaining thirty-three (33) received multiple parse trees; the resulting accuracy is 62.06%.
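
The pipeline summarized above (CFG plus tagged text in, parse trees and XML out) can be illustrated with a minimal sketch. It is not the parser developed in this work: it uses a simple bottom-up, CYK-style chart restricted to binary rules, which may differ from the algorithm of [1]. The grammar `GRAMMAR`, the tag names, the placeholder tokens `w1`, `w2`, ..., the function names `chart_parse` and `to_xml`, and the XML element layout are all assumptions made for illustration; they are not the Pashto CFG or the treebank schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical toy CFG over POS tags (binary rules only).
# It is NOT the Pashto grammar used in the paper.
GRAMMAR = [
    ("S", ("NP", "V")),
    ("NP", ("ADJ", "N")),
    ("NP", ("N", "N")),
    ("NP", ("NP", "N")),
    ("NP", ("N", "NP")),
]


def chart_parse(tagged, grammar=GRAMMAR, start="S"):
    """Return all parse trees for a list of (word, tag) pairs."""
    n = len(tagged)
    # chart[(i, j)][symbol] holds every tree for tokens i..j-1 rooted in symbol.
    chart = defaultdict(lambda: defaultdict(list))
    for i, (word, tag) in enumerate(tagged):
        chart[(i, i + 1)][tag].append((tag, word))
    # Fill wider spans from narrower ones, trying every split point k.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for lhs, (left, right) in grammar:
                    for lt in chart[(i, k)].get(left, []):
                        for rt in chart[(k, j)].get(right, []):
                            chart[(i, j)][lhs].append((lhs, lt, rt))
    return chart[(0, n)].get(start, [])


def to_xml(tree):
    """Convert a tree (label, word) or (label, left, right) to an XML element."""
    label, *children = tree
    node = ET.Element(label)
    if len(children) == 1 and isinstance(children[0], str):
        node.text = children[0]  # leaf: POS tag wrapping the token
    else:
        for child in children:
            node.append(to_xml(child))
    return node


if __name__ == "__main__":
    # Unambiguous input: exactly one tree.
    single = chart_parse([("w1", "ADJ"), ("w2", "N"), ("w3", "V")])
    print(len(single), ET.tostring(to_xml(single[0]), encoding="unicode"))
    # Ambiguous input: the noun sequence can bracket two ways.
    multiple = chart_parse([("w1", "N"), ("w2", "N"), ("w3", "N"), ("w4", "V")])
    print(len(multiple))
```

In this sketch, the number of trees returned for a sentence separates the two outcomes distinguished in the evaluation: a sentence that yields exactly one tree, and a sentence that yields several because the grammar is ambiguous for its tag sequence.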