Shallow Parsing of Transcribed Speech of Estonian and Disfluency Detection

This paper introduces our strategy for adapting a rule-based parser of written language to transcribed speech, with special attention to disfluencies (repairs, repetitions and false starts). A Constraint Grammar-based parser was used for shallow syntactic analysis of spoken Estonian. Modifying the grammar and applying additional methods improved recall from 97.5% to 97.6% and precision from 91.6% to 91.8%. The paper also gives a detailed analysis of the types of errors the parser made when analyzing the corpus of disfluencies.