How to Strip Metadata From a PDF for Privacy

April 24, 2026·6 min read

Every PDF you create silently records information about you, your software, and your workflow. Most of it is harmless. Some of it is embarrassing. A bit of it is genuinely sensitive. Before sending a PDF to anyone outside your immediate circle, take 30 seconds to clean its metadata. The risk is low but the cost of cleanup is even lower.

What metadata PDFs actually contain

A PDF's metadata can include:

  • Author — usually the username on the machine that created the file.
  • Title — sometimes useful, sometimes embarrassing (a draft filename like "interview-rejects-2026.pdf").
  • Creator and Producer — the software used to create and convert the file.
  • Creation date and modification date — sometimes revealing about the timeline.
  • Keywords — author-supplied tags.
  • Subject — author-supplied summary.
  • Custom XMP metadata — anything the source application chose to include.
  • Tracked changes carried over from the source document, if they were not cleared before export.
  • Form field history in some PDFs.

That's just the main metadata. PDFs can also contain:

  • Embedded images with their own metadata: a phone photo embedded in a PDF can still carry GPS coordinates.
  • Document-level JavaScript that may reveal logic or comments.
  • Comments and review markup from the editing process.
  • Hidden layers with content that's invisible by default but extractable.
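
To make the list above concrete, here is a minimal sketch of what the standard fields look like inside the file. It assumes an unencrypted PDF whose Info dictionary is stored uncompressed with plain literal strings; many modern PDFs keep these objects in compressed streams, where a raw byte scan like this sees nothing. The helper names find_info_fields and has_xmp_packet are made up for this illustration.

```python
import re

# Standard Info dictionary keys defined by the PDF format.
INFO_KEYS = ("Title", "Author", "Subject", "Keywords",
             "Creator", "Producer", "CreationDate", "ModDate")


def find_info_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Return Info-style fields stored as plain literal strings, e.g. /Author (alice)."""
    found = {}
    for key in INFO_KEYS:
        # Matches "/Key (value)"; hex strings and nested parentheses are ignored.
        match = re.search(rb"/" + key.encode() + rb"\s*\(([^()]*)\)", pdf_bytes)
        if match:
            found[key] = match.group(1).decode("latin-1")
    return found


def has_xmp_packet(pdf_bytes: bytes) -> bool:
    """XMP packets begin with an '<?xpacket begin=' processing instruction."""
    return b"<?xpacket begin=" in pdf_bytes
```

Treat an empty result as "nothing found by this crude scan", not as proof that the file is clean.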

Why it matters

People have leaked, in public PDFs:

  • The internal codename of a project.
  • The original author of a document later attributed to someone else.
  • GPS locations of where photos in the document were taken.
  • Comments meant only for the review team.
  • Supposedly redacted text that was only covered over, leaving the original words extractable.
  • The full username and machine name of the person who created the file.

For most documents this is fine. For confidential, regulated, or politically sensitive material, it can be a serious leak.

Step 1: See what's there

Before stripping, look:

  • Acrobat Reader / Pro: File → Properties shows the standard fields. The "Custom" tab shows extras.
  • macOS Preview: Tools → Show Inspector shows metadata.
  • Browser PDF viewers: most have a document properties panel that shows the standard fields.
  • Command line: pdfinfo file.pdf for the basics; exiftool file.pdf for everything including XMP.

exiftool is especially useful — it dumps every metadata field, including ones that GUIs hide.
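
If you check files often, the command-line route is easy to script. The sketch below assumes poppler's pdfinfo is installed and on your PATH, and that its output is the usual "Key: value" lines; parse_pdfinfo and pdfinfo_fields are illustrative names, not part of any tool.

```python
import subprocess


def parse_pdfinfo(output: str) -> dict[str, str]:
    """Turn pdfinfo's 'Key:   value' lines into a dictionary."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so dates like 12:00:00 stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def pdfinfo_fields(path: str) -> dict[str, str]:
    """Run pdfinfo on a file and return its fields (assumes pdfinfo is on PATH)."""
    result = subprocess.run(["pdfinfo", path], capture_output=True,
                            text=True, check=True)
    return parse_pdfinfo(result.stdout)


# Usage:
# for key, value in pdfinfo_fields("file.pdf").items():
#     print(f"{key}: {value}")
```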

Step 2: Strip the obvious metadata

Three common methods:

  • Browser tool: Docento.app lets you clear all metadata with a click and re-save, in the browser without uploads.
  • Acrobat Pro: Tools → Redact → Sanitise Document removes hidden information including metadata.
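  • Command line: exiftool -all= file.pdf removes the metadata from view, but ExifTool writes PDF changes as an incremental update, so the old values stay recoverable until the file is rewritten; follow it with qpdf --linearize file.pdf clean.pdf. In qpdf releases that support the option, qpdf --remove-metadata file.pdf clean.pdf removes the XMP block.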

Test the result with pdfinfo or exiftool afterwards — some tools strip standard metadata but leave XMP.
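
One caveat on the command-line route: ExifTool's documentation notes that its PDF edits are reversible, because the old metadata stays in the file until something rewrites it, and it suggests qpdf --linearize for that. A minimal sketch of chaining the two, assuming both tools are installed; strip_commands and strip_metadata are hypothetical helper names, and the original file is left untouched.

```python
import os
import subprocess
import tempfile


def strip_commands(src: str, tmp: str, dst: str) -> list[list[str]]:
    """Build the two commands: hide the metadata, then rewrite the file."""
    return [
        # Write a metadata-free copy to tmp instead of editing src in place.
        ["exiftool", "-all=", "-o", tmp, src],
        # Rewrite the file so the old, orphaned metadata objects are dropped.
        ["qpdf", "--linearize", tmp, dst],
    ]


def strip_metadata(src: str, dst: str) -> None:
    """Run both steps, using a throwaway directory for the intermediate file."""
    with tempfile.TemporaryDirectory() as workdir:
        tmp = os.path.join(workdir, "stripped.pdf")
        for argv in strip_commands(src, tmp, dst):
            subprocess.run(argv, check=True)


# Usage: strip_metadata("report.pdf", "report-clean.pdf")
```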

Step 3: Strip embedded image metadata

Photos embedded in a PDF carry their own EXIF data — including GPS coordinates if they were taken on a phone. Standard PDF metadata stripping doesn't always catch this.

To clean embedded images:

  • Extract images: pdfimages -all file.pdf img-.
  • Strip metadata from each: exiftool -all= *.jpg.
  • Rebuild the PDF, or use a tool that handles this in one pass.

For documents with sensitive embedded photography (real-estate, journalism, anything taken at a location you'd rather not disclose), this step is essential.
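
For a sense of what "strip metadata from each" means for a JPEG, here is a rough sketch that drops the segments where metadata usually lives: APP1 (EXIF and XMP), APP13 (Photoshop/IPTC) and COM (comments). It assumes a well-formed baseline file with no padding bytes between segments; strip_jpeg_metadata is an illustrative name, and exiftool remains the more thorough option.

```python
import struct

# Marker bytes for APP1 (EXIF/XMP), APP13 (Photoshop/IPTC) and COM (comments).
STRIP_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return the JPEG with metadata segments removed; pixel data is untouched."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker should be")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: everything from here on is image data, copy as-is.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in STRIP_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    out += data[pos:]
    return bytes(out)
```

Segments such as APP0 (JFIF) and APP2 (ICC colour profiles) are kept on purpose, since removing them can change how the image renders.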

Step 4: Remove tracked changes and comments

If you used Word's track changes or someone left review comments, those may persist into the PDF:

  • In Word, accept or reject all changes before exporting.
  • Clear comments before export.
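  • In the PDF, remove all annotations with a PDF editor or a redaction tool. Mainline qpdf does not appear to document a --clear-annotations option (some forks may); its documented --flatten-annotations=all option merges annotation appearances into the page content instead.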
  • For review markup that should not appear in the final PDF, flatten the document — this makes annotations permanent or removes them, depending on the tool.

Step 5: Check for hidden layers

PDFs support "Optional Content Groups" — layers that can be hidden or shown. A document might have a "draft" layer hidden by default but still extractable.

Tools to check:

  • Acrobat: View → Show/Hide → Navigation Panes → Layers lists all layers.
  • mutool show file.pdf trailer/Root/OCProperties prints the optional content dictionary, including the list of layers (OCGs).

Flatten or remove unwanted layers before sending.
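
Inside the file, each layer is an object typed /OCG with a /Name. A quick sketch for listing those names, assuming the objects are stored uncompressed (running the file through qpdf --qdf --object-streams=disable first makes that more likely); list_layer_names is a made-up helper.

```python
import re


def list_layer_names(pdf_bytes: bytes) -> list[str]:
    """Return the /Name of every object typed /OCG found in the raw bytes."""
    names = []
    for obj in pdf_bytes.split(b"endobj"):
        # \b keeps /OCG from matching the /OCGs array key in the catalog.
        if re.search(rb"/Type\s*/OCG\b", obj):
            match = re.search(rb"/Name\s*\(([^()]*)\)", obj)
            names.append(match.group(1).decode("latin-1") if match else "(unnamed)")
    return names
```

A layer called something like "Draft" or "Internal" in the result is a good reason to open the Layers pane and look before sending.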

Step 6: Sanitise vs strip

Many PDF tools have a "Sanitise" option that goes further than basic metadata stripping. Sanitisation typically:

  • Strips metadata.
  • Removes JavaScript.
  • Removes embedded files and attachments.
  • Removes form data.
  • Removes comments and tracked changes.
  • Removes signatures (which is a feature, not a bug — keep an unsanitised copy if you need the signature).

For maximum privacy, sanitise. Caveats:

  • It's destructive; keep the original.
  • It may remove things you didn't realise were in the file (which is often what you want).
  • Forms and signatures will be removed; don't sanitise documents where these matter.
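
To decide whether a file needs a full sanitise, it helps to know which of these features it contains. The sketch below looks for the PDF name tokens that typically accompany each one. It is a heuristic only: it assumes uncompressed objects, can produce false positives, and the labels and the hidden_content_report name are invented for this example.

```python
# PDF name tokens that usually signal each feature (heuristic, not exhaustive).
MARKERS = {
    "javascript": (b"/JavaScript", b"/JS"),
    "embedded files": (b"/EmbeddedFile",),
    "form fields": (b"/AcroForm",),
    "annotations": (b"/Annots",),
    "signatures": (b"/ByteRange",),
}


def hidden_content_report(pdf_bytes: bytes) -> dict[str, bool]:
    """Map each feature label to True if any of its tokens appears in the file."""
    return {
        label: any(token in pdf_bytes for token in tokens)
        for label, tokens in MARKERS.items()
    }


# Usage:
# with open("file.pdf", "rb") as handle:
#     print(hidden_content_report(handle.read()))
```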

Method: a privacy-focused PDF export

The cleanest approach is to never write the metadata in the first place. When exporting from Word:

  • Go to File → Info → Check for Issues → Inspect Document.
  • Select all categories.
  • Remove anything found.
  • Then export to PDF.

Word's document inspector catches most issues, including comments, tracked changes, hidden text, document properties, and embedded files. Use it before any external delivery.

For LibreOffice: open File → Properties, clear the fields, then export to PDF.

For LaTeX: control metadata via \hypersetup{pdfauthor=, pdftitle=, ...} and avoid leaking author info via the \author{} command.

Verification

After stripping:

  • Re-run exiftool file.pdf and confirm the fields are clean.
  • Open in a fresh PDF reader and check the Properties dialog.
  • For especially sensitive documents, check with more than one tool; sometimes one tool reports fields that another ignores.
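
The exiftool check can be turned into a pass/fail gate. This sketch assumes you run exiftool -j file.pdf, which prints a JSON array with one object per file, and then flags any watched tag that still has a value. The watch list is an illustrative starting point rather than a complete inventory, and leftover_fields is a hypothetical name.

```python
import json

# Tag names as exiftool reports them; extend to suit your own documents.
WATCHED = {"Author", "Title", "Subject", "Keywords", "Creator", "Producer",
           "CreateDate", "ModifyDate", "CreatorTool", "MetadataDate", "XMPToolkit"}


def leftover_fields(exiftool_json: str) -> dict:
    """Return watched tags that still carry a value in `exiftool -j` output."""
    leftovers = {}
    for record in json.loads(exiftool_json):
        for tag, value in record.items():
            if tag in WATCHED and value not in ("", None):
                leftovers[tag] = value
    return leftovers
```

An empty dictionary means none of the watched tags survived. A leftover Producer is sometimes just the stamp of the tool that rewrote the file, so read the value before worrying about it.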

When to do this

A practical rule:

  • For internal sharing: don't worry about it.
  • For external sharing with known recipients: a quick metadata strip.
  • For public posting: a full sanitise.
  • For sensitive material (anonymous tips, whistleblowing, public records, anything that might be forensically analysed): a full sanitise, followed by a "rebuild" pipeline (PDF → text/images → new PDF), which strips almost everything that could survive a sanitise.
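
One way to sketch the image branch of a rebuild pipeline is to render each page to PNG and assemble a new PDF from the pictures. This assumes poppler's pdftoppm and the third-party img2pdf tool are installed; the 150 dpi default and the function names are arbitrary choices for the example. The result has no selectable text, and the rebuilding tool may stamp its own producer and dates, so verify the output as well.

```python
import glob
import os
import subprocess
import tempfile


def rasterise_command(src: str, prefix: str, dpi: int = 150) -> list[str]:
    """pdftoppm writes one PNG per page, named <prefix>-<page number>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", src, prefix]


def rebuild_command(pages: list[str], dst: str) -> list[str]:
    """img2pdf wraps the page images in a fresh PDF."""
    return ["img2pdf", *pages, "-o", dst]


def rebuild_pdf(src: str, dst: str, dpi: int = 150) -> None:
    """Render src to images in a throwaway directory, then build dst from them."""
    with tempfile.TemporaryDirectory() as workdir:
        prefix = os.path.join(workdir, "page")
        subprocess.run(rasterise_command(src, prefix, dpi), check=True)
        # Sorting relies on pdftoppm zero-padding the page numbers.
        pages = sorted(glob.glob(prefix + "-*.png"))
        subprocess.run(rebuild_command(pages, dst), check=True)


# Usage: rebuild_pdf("tip.pdf", "tip-rebuilt.pdf")
```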

Conclusion

Metadata stripping is fast, free, and worth doing for any PDF that leaves your trusted circle. Use a browser tool, exiftool, or your PDF tool's "Sanitise" function. Verify with exiftool afterwards. Docento.app handles metadata stripping in the browser without uploads, with the file staying on your device. For broader privacy considerations, see privacy in browser PDF editing.
