Topic Segmentation for Textual Document Written in Arabic Language

Abstract

Topic segmentation is important for many natural language processing applications, such as information retrieval and text summarization. In our work, we are interested in the topic segmentation of textual documents. We present a survey of related work, particularly C99 and TextTiling. Then, we propose adaptations of these topic segmenters for textual documents written in Arabic, named ArabC99 and ArabTextTiling. For the experiments, we construct an Arabic corpus based on newspapers from different Arab countries. Finally, we evaluate the performance of these new segmenters by comparing them with each other and with related work, using the WindowDiff and F-measure metrics.
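The two evaluation metrics named above can be made concrete with a minimal Python sketch. This is not the authors' evaluation code. It assumes each segmentation is a list of 0/1 flags, one per text unit (for example, per sentence), where 1 marks a topic boundary after that unit. The helper names `boundaries`, `window_diff` and `boundary_f_measure` are hypothetical. WindowDiff follows its usual definition: slide a window of k units over the text and count the windows in which the reference and the hypothesis contain a different number of boundaries. The default k (half the mean reference segment length) is a common convention. The F-measure here counts only exact boundary matches. The paper may use a different window size or a tolerance.

```python
def boundaries(masses):
    """Hypothetical helper: turn segment sizes such as [3, 2] into 0/1 flags,
    where flags[i] == 1 means a boundary falls after unit i."""
    flags = []
    for m in masses:
        flags.extend([0] * (m - 1) + [1])
    flags[-1] = 0  # no boundary after the last unit of the text
    return flags


def window_diff(ref, hyp, k=None):
    """WindowDiff: fraction of length-k windows whose boundary counts differ
    between reference and hypothesis (0.0 = perfect, higher = worse)."""
    n = len(ref)
    assert len(hyp) == n, "segmentations must cover the same units"
    if k is None:
        # Assumed convention: half the mean reference segment length.
        num_segments = sum(ref) + 1
        k = max(1, round(n / num_segments / 2))
    assert 0 < k < n, "window must be shorter than the text"
    errors = 0
    for i in range(n - k):
        # Boundaries lying between unit i and unit i + k.
        if sum(ref[i:i + k]) != sum(hyp[i:i + k]):
            errors += 1
    return errors / (n - k)


def boundary_f_measure(ref, hyp):
    """F-measure over boundary positions, counting exact matches only."""
    ref_b = {i for i, f in enumerate(ref) if f}
    hyp_b = {i for i, f in enumerate(hyp) if f}
    tp = len(ref_b & hyp_b)
    if tp == 0:
        return 0.0
    precision = tp / len(hyp_b)
    recall = tp / len(ref_b)
    return 2 * precision * recall / (precision + recall)


ref = boundaries([3, 3, 4])  # reference: 10 sentences in 3 topics
hyp = boundaries([4, 2, 4])  # first boundary placed one sentence late
print(window_diff(ref, hyp))         # 0.25
print(boundary_f_measure(ref, hyp))  # 0.5
```

In this example the hypothesis misplaces one of the two boundaries by a single sentence. WindowDiff (k = 2) penalizes only the two windows affected by the shift and gives 0.25. The exact-match F-measure treats the shifted boundary as both a miss and a false alarm and gives 0.5. This difference is one reason WindowDiff is often preferred for near-miss errors.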