A Rule-Based Annotation System to Extract Tajweed Rules from Quran

Quran recitation relies on identifying and applying different Tajweed rules [قواعد التجويد] such as Muddud [مدود] and Tanween [تنوين] in the Quran text. This research aims to provide a tool that automatically finds and annotates the letters that embody Tajweed rules in Quran text. The field remains an open research area because of the lack of open source NLP tools that support the Arabic language. Applying Natural Language Processing (NLP) techniques to Quran text to extract Tajweed letters is an important Information Extraction (IE) step, and this research explores the application of IE techniques to Quran text. Rule-based IE techniques are well known for achieving highly accurate results. The research applies NLP techniques to Quranic text using GATE, a flexible open source NLP environment, which is used to build an application that processes an un-annotated Quranic text corpus. The developed application is evaluated with the well-known IE evaluation metrics, precision and recall, by comparing the system's automatically annotated text with a gold standard (i.e., Quran text). The system proved effective, achieving 100% precision and recall for the implemented Tajweed rules.
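As an illustration of the rule-based annotation and the precision/recall evaluation described above, the following is a minimal sketch in Python. It is not the GATE application built in this research: the function names `annotate_tanween` and `precision_recall`, the span format, and the sample word are assumptions made for illustration. The only rule shown is a simple pattern for the three Unicode Tanween diacritics (U+064B, U+064C, U+064D), and annotations are compared with the gold standard by exact span match.

```python
import re

# Unicode Tanween diacritics: fathatan, dammatan, kasratan.
TANWEEN = re.compile("[\u064B\u064C\u064D]")


def annotate_tanween(text):
    """Return a set of (start, end, label) spans, one per Tanween mark.

    Hypothetical helper: a single hand-written rule, standing in for
    the richer rule set a full system would need.
    """
    return {(m.start(), m.end(), "Tanween") for m in TANWEEN.finditer(text)}


def precision_recall(predicted, gold):
    """Compare predicted spans with gold spans using exact matching."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Sample word (assumed for illustration): kaf, kasra, ta, fatha,
# alef, ba, dammatan. The Tanween mark is the last code point.
sample = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
gold = {(6, 7, "Tanween")}  # hand-made gold annotation for the sample

predicted = annotate_tanween(sample)
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact span matching is the strictest way to score annotations; a lenient variant that accepts overlapping spans would report higher figures for the same output.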