Sentence Boundary Detection for Hindi–English Social Media Text

In this paper, we present an approach to automatic sentence boundary detection for Hindi–English code-mixed social media text. We develop a corpus of Hindi–English code-mixed posts collected from Facebook and conduct an in-depth study of the limitations of existing rule-based sentence boundary detection systems on code-mixed social media text. Our proposed approach is also rule-based; tested on this corpus, it outperforms the existing approaches.