Sentence Boundary Detection for Hindi–English Social Media Text
In this paper, we present an approach to automatic sentence boundary detection for Hindi–English code-mixed social media text. We develop a corpus of Hindi–English code-mixed posts collected from Facebook and conduct an in-depth study of the limitations of existing rule-based sentence boundary detection systems on code-mixed social media text. Our proposed approach is also rule-based; tested on the corpus we developed, it outperforms the existing approaches.