Reverse-Transliteration of Hebrew Script for Entity Disambiguation

JudaicaLink is a novel domain-specific knowledge base for Jewish culture, history, and studies. It is built by extracting structured, multilingual knowledge from different sources and is mainly used for contextualization and entity linking. One of the main challenges in aggregating Jewish digital resources is the use of the Hebrew script. German central cataloging systems record materials by converting the original script of a publication into Latin script, a process known as Romanization. As a result, many of our datasets, especially those from library catalogs, contain Hebrew authors' names and titles only in Latin script, without the original Hebrew script. These entities therefore cannot be identified in, or linked to, the corresponding Hebrew resources. To overcome this problem, we designed a reverse-transliteration model that reconstructs the Hebrew script from the Romanization and consequently makes the entities more accessible.