Security and privacy issues in the Portable Document Format

The Portable Document Format (PDF) was developed by Adobe in the early 1990s and is today the de facto standard for electronic document exchange. It allows published materials to be reproduced reliably on any platform, and it is used by many governmental and educational institutions as well as by companies and individuals. PDF documents are also credited with being more secure than other document formats, such as the Microsoft Compound Document File Format or the Rich Text Format. This paper investigates the Portable Document Format and shows that it is not immune to some of the privacy-related issues that affect other popular document formats. From a PDF document it is possible to retrieve any text or object that was previously deleted or modified, to extract information about its users, and to perform actions that may be used to violate user privacy. These issues have several practical applications. One of particular relevance to the scientific community is the ability to circumvent the blind review process of a paper by revealing information about the anonymous referee (e.g., the referee's IP address).
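
The abstract does not spell out how deleted content, user details or privacy-relevant actions are reached. The following is a minimal sketch of one way to inspect all three using only Python's standard library. It assumes uncompressed objects and relies on the PDF incremental-update mechanism, in which an incremental save appends new objects, a cross-reference section, a trailer and a fresh %%EOF marker while leaving the earlier bytes intact. The helper names split_revisions, info_entries and action_names are hypothetical illustrations, not the paper's tooling. Scanning raw bytes with regular expressions is a heuristic: it misses compressed object streams, hex-encoded names and nested parentheses, and a %%EOF sequence inside binary stream data would produce a spurious cut.

```python
import re
from collections import Counter

# Each incremental save ends with %%EOF; also consume the end-of-line after it.
EOF_MARKER = re.compile(rb"%%EOF[ \t]*(?:\r\n|\r|\n)?")

# Uncompressed document-information entries written as literal strings,
# e.g. /Author (Jane Doe). Backslash escapes are skipped over (DOTALL lets
# "\<newline>" continuations match); nested unescaped parentheses are not handled.
INFO_ENTRY = re.compile(
    rb"/(Author|Creator|Producer|Title|CreationDate|ModDate)\s*\(((?:\\.|[^\\)])*)\)",
    re.DOTALL,
)

# Entry and action-type names that can make a viewer run script or contact
# another host. A match requires that no further regular name character
# follows, so /JS does not match /JSON. Hex-escaped names (/J#53) are missed.
ACTION_NAME = re.compile(
    rb"/(OpenAction|AA|JavaScript|JS|URI|SubmitForm|Launch|GoToR)(?![^\s()<>\[\]{}/%])"
)


def split_revisions(data: bytes) -> list[bytes]:
    """Cut the file just after each %%EOF marker; the last item is the current version."""
    return [data[: m.end()] for m in EOF_MARKER.finditer(data)]


def info_entries(revision: bytes) -> dict[str, str]:
    """Collect literal-string metadata; later definitions overwrite earlier ones,
    mirroring how an incremental update supersedes an object."""
    return {
        m.group(1).decode("ascii"): m.group(2).decode("latin-1")
        for m in INFO_ENTRY.finditer(revision)
    }


def action_names(data: bytes) -> Counter:
    """Count occurrences of script- or network-related names anywhere in the file."""
    return Counter(m.group(1).decode("ascii") for m in ACTION_NAME.finditer(data))


if __name__ == "__main__":
    # Toy bytes mimicking an original save plus one incremental update.
    # Not a renderable PDF: the cross-reference tables are omitted.
    original = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Author (Jane Reviewer) /Producer (ExampleWriter) >>\nendobj\n"
        b"trailer\n<< /Info 1 0 R >>\nstartxref\n0\n%%EOF\n"
    )
    update = (
        b"1 0 obj\n<< /Author (Anonymous) >>\nendobj\n"
        b"2 0 obj\n<< /S /URI /URI (http://example.org/ping) >>\nendobj\n"
        b"trailer\n<< /Info 1 0 R /Prev 0 >>\nstartxref\n0\n%%EOF\n"
    )
    data = original + update
    for number, revision in enumerate(split_revisions(data), start=1):
        print(number, info_entries(revision))
    print(action_names(data))
```

On this toy input the first revision still reports the author as Jane Reviewer even though the current version says Anonymous. This is the kind of leftover data the abstract describes. The URI action is counted twice because the name serves as both the action type and the key holding the address. A practical tool would parse the cross-reference tables and decompress streams rather than scan bytes.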
