An Approach to Discourse Parsing using Sangati and Rhetorical Structure Theory

Sanskrit literature contains many concepts that can be applied to modern linguistic applications. One such concept is sangati. Sangati expresses the continuity and proper positioning of a piece of text, which is similar to the modern Rhetorical Structure Theory (RST). We propose two discourse parsers, namely a sangati based discourse parser and an RST-Sangati based discourse parser. The proposed discourse parsers are extensions of the existing RST based discourse parser. We have used a Naive Bayes probabilistic classifier for discourse relation and sangati labelling. We have tested our discourse parsers on 500 Tamil tourism domain specific documents and 21 English documents from the RST Discourse Treebank (RST-DT). We have compared the performance of the two proposed discourse parsers and observed that the discourse parser performs better when RST and sangati are used in combination. We have also compared our parsers with two existing discourse parsers, and the proposed parsers show better performance.