A PDF Text Extractor Based on PDF-Renderer

Abstract: In this paper we propose a new solution for PDF (Portable Document Format) text extraction. First, we compare several PDF text extraction tools, beginning with a brief presentation of available tools that have been used in research works. Second, we analyze the performance of ICEpdf and PDFBox, two open-source Java tools. Our experimental results show that neither tool strictly subsumes the other, and that both have clear problems with fonts and overlapping text. To overcome these issues, we propose a new text extraction engine based on the Java PDF-Renderer, which shows good rendering compared to the previous ones. Our results can help researchers who need such a tool to understand the characteristics of each one and to choose a suitable tool for their work.