PDF Metadata Strip

PDF files nearly always contain hidden information. This information is called "metadata" because it is typically _about_ the document in which it is embedded and not properly part of the document itself. The metadata is put there by the program that created the PDF.

The exact metadata depends on the creating program. It typically includes the name of the creator, the time of creation, and the date of the last modification. File paths, the username of the creator, and the type of operating system used may also be included. If the operating system is left out, it can often be inferred from the name of the creating program, which is nearly always recorded. These programs rarely miss the opportunity to advertise and promote themselves.
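To get a feel for what a given file carries, the rough sketch below scans raw PDF bytes for the standard document information keys (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate). It is not part of pdf-metadata-strip. The function name find_info_entries is made up for this illustration, and the approach is deliberately crude: it only sees simple literal strings in uncompressed, unencrypted files, so an empty result does not prove a file is clean.

```python
import re

# Standard keys of the PDF document information dictionary.
INFO_KEYS = (
    b"Title", b"Author", b"Subject", b"Keywords",
    b"Creator", b"Producer", b"CreationDate", b"ModDate",
)

# Matches "/Key (value)" where value is a literal string containing
# no nested parentheses and no backslash escapes.
PATTERN = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(([^()\\]*)\)"
)


def find_info_entries(data):
    """Return {key: raw value bytes} for Info-style entries found in data.

    Hypothetical helper for illustration only; the first occurrence
    of each key wins.
    """
    found = {}
    for key, value in PATTERN.findall(data):
        found.setdefault(key.decode("ascii"), value)
    return found
```

Reading a file with open(path, "rb").read() and passing the bytes to find_info_entries gives a quick, incomplete view of what the creating program left behind.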

The metadata has little legitimate use and may be useful to attackers and other pests. The program "pdf-metadata-strip" removes the most common form of metadata from PDF files. The resulting file should then contain only what is directly visible to the reader and creator.

pdf-metadata-strip relies on the pyPdf parsing library written by Mathieu Fenniak. (See http://pybrary.net/pyPdf/.) This rather nice library does all the work. pdf-metadata-strip is just a thin wrapper that reads in the PDF pages and then writes them out to a new file, leaving the metadata behind.
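The read-then-rewrite approach can be sketched as follows. This is not the actual pdf-metadata-strip source, only a minimal illustration of the same idea. It assumes the pyPdf 1.x class and method names (PdfFileReader, PdfFileWriter, getNumPages, getPage, addPage, write); later forks of the library renamed these. The function names copy_pages and strip_metadata are made up for this example.

```python
def copy_pages(reader, writer):
    """Copy every page from reader to writer and return the page count.

    Only the page objects are passed along; the reader's document
    information dictionary is never touched.
    """
    count = reader.getNumPages()
    for index in range(count):
        writer.addPage(reader.getPage(index))
    return count


def strip_metadata(in_path, out_path):
    """Write a copy of in_path to out_path containing only its pages."""
    # Imported here so copy_pages can be used without pyPdf installed.
    # Assumes the pyPdf 1.x API.
    from pyPdf import PdfFileReader, PdfFileWriter

    # The source file stays open until the output has been written,
    # because page contents are read from it on demand.
    with open(in_path, "rb") as src:
        reader = PdfFileReader(src)
        writer = PdfFileWriter()
        copy_pages(reader, writer)
        with open(out_path, "wb") as dst:
            writer.write(dst)
```

A call such as strip_metadata("in.pdf", "out.pdf"), with placeholder file names, would produce the stripped copy. Writing to a separate output file rather than overwriting the input keeps the original available for comparison.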

The current release is pdf-metadata-strip-0.1.tar.gz.
