How to Recover a Corrupted or Broken PDF

A PDF that won't open causes disproportionate panic. It might be the contract you signed, the report due in an hour, or the photos from a trip, and now the reader says the file is "damaged and could not be repaired." Before paying for a recovery service, try the standard repair sequence. Most "broken" PDFs aren't really broken; they're just inconvenient.

Why PDFs break

PDFs are surprisingly robust, but they have failure modes:

Truncated downloads: a network blip leaves the file shorter than it should be.
Interrupted saves: a crash mid-save leaves an incomplete file on disk.
Cloud sync conflicts: two devices wrote to the same file at the same time.
Storage corruption: bad sectors, failing SD cards, USB sticks unplugged before flush.
Wrong file type: the file says .pdf but is actually empty or HTML (common when something downloaded an error page instead of the real PDF).
Encryption mismatch: the file is encrypted in a way the reader doesn't support.
Tool bugs: rare, but some PDF generators produce technically broken files that strict readers reject.

Step 1: Diagnose first

Before reaching for recovery tools, gather information:

File size: 0 bytes means the file is empty; nothing can be recovered from the file itself, only from a backup. A few hundred bytes usually means a truncated download or a saved error message.
First few bytes: real PDFs start with %PDF-. If the file starts with <html> or <!DOCTYPE html> instead, you downloaded an error page, not a PDF.
Last few bytes: real PDFs end with %%EOF, possibly followed by a newline. A missing %%EOF usually means the file is truncated.
pdfinfo file.pdf (from the Poppler utilities): prints metadata if the file is at least basically valid; errors loudly if not.

A 30-second diagnosis often points directly at the fix.
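
The size, header, and trailer checks can be scripted. The following is a minimal Python sketch, not a validator: diagnose_pdf is a name invented for this example, and the 1024-byte window it searches at each end of the file is an assumption chosen to tolerate a little stray padding.

```python
import os


def diagnose_pdf(path, window=1024):
    """Return a list of plain-language findings about the file at `path`."""
    size = os.path.getsize(path)
    if size == 0:
        return ["empty file (0 bytes): restore from a backup"]

    # Read only the start and end of the file; large PDFs stay off the heap.
    with open(path, "rb") as f:
        head = f.read(window)
        f.seek(max(0, size - window))
        tail = f.read()

    findings = []
    if b"%PDF-" not in head:
        lowered = head.lower()
        if b"<html" in lowered or b"<!doctype html" in lowered:
            findings.append("looks like HTML: probably a saved error page")
        else:
            findings.append("no %PDF- header: probably not a PDF")

    # %%EOF may be followed by a newline, so search the tail instead of
    # demanding an exact suffix.
    if b"%%EOF" not in tail:
        findings.append("no %%EOF marker near the end: probably truncated")

    return findings or ["header and trailer look plausible"]
```

A clean result here only means the two ends of the file look right; damage in the middle still needs pdfinfo or one of the repair tools to detect.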

Step 2: Try a different reader

PDFs that fail in Acrobat sometimes open fine in Edge, Chrome, Preview, or another reader. If the file opens anywhere, your problem is "this reader doesn't like this file," not "the file is broken."

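Once you have it open in some reader, the easiest "repair" is to print to PDF, which generates a fresh, clean copy that any reader can handle. Printing typically flattens the document, so expect to lose form fields, bookmarks, and links.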

Step 3: Re-download if you can

If the PDF came from a website or email:

Re-download. Truncated downloads are far more common than people think: a flaky connection, a server hiccup, a mobile network drop, or a cancelled background download can all produce a partial file.
If the source is a webpage, open the PDF through its "view" link and save it with Ctrl-S rather than choosing "open with default app."
Compare file sizes if you can. If your copy is smaller than the source, it's truncated.
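
When the source is a URL, the size comparison can sometimes be automated. This is a rough Python sketch under two assumptions: the server answers HEAD requests, and it reports a Content-Length that matches the bytes on disk (neither is guaranteed). The names remote_size and looks_truncated are invented for this example.

```python
import os
import urllib.request


def remote_size(url, timeout=10):
    """Ask the server for Content-Length via a HEAD request; None if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def looks_truncated(path, expected_bytes):
    """True if the local file is smaller than the size the source reported."""
    if expected_bytes is None:
        return False  # the server gave nothing to compare against
    return os.path.getsize(path) < expected_bytes
```

Usage would look like looks_truncated("report.pdf", remote_size("https://example.com/report.pdf")), where both the filename and the URL are placeholders.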

Step 4: Run a repair tool

Several free tools attempt repair:

qpdf: qpdf --check input.pdf reports problems; qpdf input.pdf output.pdf rewrites the file, often fixing minor issues silently.
mutool: mutool clean input.pdf output.pdf rebuilds the file, dropping unrecoverable parts.
Ghostscript: gs -o output.pdf -sDEVICE=pdfwrite input.pdf rewrites and often repairs.

Try each in turn. They use different parsers and different repair strategies; one may succeed where another fails.
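
Trying each tool in turn can be scripted. The sketch below is one possible Python wrapper, assuming the tools are on PATH under the names qpdf, mutool, and gs (on Windows the Ghostscript binary usually has a different name, such as gswin64c). The functions repair_commands and try_repairs are invented for this example; the run and which parameters exist only so the loop can be exercised without the tools installed.

```python
import os
import shutil
import subprocess


def repair_commands(src, dst):
    """The three rewrite commands from the list above, in the same order."""
    return [
        ["qpdf", src, dst],
        ["mutool", "clean", src, dst],
        ["gs", "-o", dst, "-sDEVICE=pdfwrite", src],
    ]


def try_repairs(src, dst, run=subprocess.run, which=shutil.which):
    """Return the name of the first tool that leaves a non-empty dst, else None."""
    for command in repair_commands(src, dst):
        tool = command[0]
        if which(tool) is None:
            continue  # tool not installed; move on to the next one
        if os.path.exists(dst):
            os.remove(dst)  # discard leftovers from a failed attempt
        run(command, stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL, check=False)
        # Judge by the output file rather than the exit code: a tool can
        # report warnings on a damaged input and still write a usable file.
        if os.path.exists(dst) and os.path.getsize(dst) > 0:
            return tool
    return None
```

A non-empty output is only a weak signal of success. Open the result and page through it, because a rewritten file can be structurally valid while missing pages the tool could not recover.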

Step 5: Browser-based repair

Browser tools that include a "repair" function typically run a similar sequence in WebAssembly. Docento.app attempts repair in the browser without uploading the file. This is useful when you don't want to install command-line tools or when the file might contain sensitive content.

Step 6: Extract what you can

If the file is too damaged to fully repair, you may still be able to recover content:

Text extraction: pdftotext input.pdf - may produce text from a partially damaged file.
Image extraction: pdfimages input.pdf img- extracts embedded images, sometimes even when the page structure is broken.
Object dump: qpdf --qdf input.pdf output.pdf rewrites the file in QDF form, a normalized layout of the PDF's objects that can be inspected in a text editor.

Even when the result isn't a usable PDF, the recovered content may be enough to rebuild manually.
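
The text and image extraction can be run together and judged by what lands on disk. This is a small Python sketch, assuming pdftotext and pdfimages are installed and that out_dir starts out empty; salvage is a name invented for this example. It writes text to a file instead of standard output and uses img as the image prefix.

```python
import subprocess
from pathlib import Path


def salvage(src, out_dir, run=subprocess.run):
    """Run pdftotext and pdfimages on src; return the non-empty files produced."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    commands = [
        ["pdftotext", str(src), str(out / "text.txt")],
        ["pdfimages", str(src), str(out / "img")],
    ]
    for command in commands:
        try:
            # Exit codes are ignored on purpose: partial output from a
            # damaged file is still worth keeping.
            run(command, check=False,
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        except FileNotFoundError:
            continue  # tool not installed; try the next one
    return sorted(p for p in out.iterdir() if p.stat().st_size > 0)
```

An empty return value means neither tool recovered anything, which is a reasonable point to move on to backups.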

Step 7: Check backups

This is what backups exist for:

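macOS Time Machine: open the file's folder in Finder, choose Browse Time Machine Backups from the Time Machine menu-bar icon (Enter Time Machine on older macOS versions), step back to an earlier snapshot, select the file, and click Restore.
Windows File History: right-click → "Restore previous versions." This only works if File History or restore points were enabled beforehand.
Cloud storage version history: Dropbox, Google Drive, OneDrive, and iCloud keep recent versions or recently deleted files to varying degrees; check the web interface, and expect retention periods to differ by service and plan.
Version control: if the PDF was tracked in git, git log -- file.pdf lists the commits that changed it, and git restore --source=<commit> -- file.pdf brings back an earlier version.
Email: if you ever sent the PDF, the copy in your sent folder is probably intact, provided you sent it before the damage occurred.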

Often the fastest recovery is just finding the previous version somewhere.

Step 8: Recover from the source

If the PDF was generated from another file, regenerate:

Word document → re-export to PDF.
LaTeX → recompile.
Web page → reprint to PDF.
Database report → re-run the report.

This is the cleanest "recovery": the new PDF is fresh and never had the problem.

When recovery isn't possible

Some failures are unrecoverable:

The file is genuinely 0 bytes and no other copy exists.
The storage medium is dead and there's no backup.
The file was encrypted and the password is lost. With a strong password, modern AES-256 encryption can't practically be brute-forced.
The original was deleted from cloud storage long enough ago that version history expired.

In these cases, focus on what you can rebuild: contact the original author, check your sent folders, or reconstruct the content from notes.

Preventing the next time

A few habits avoid most "broken PDF" problems:

Wait for a save to finish before closing your laptop lid.
Keep cloud sync running for important files. Free-tier sync (Dropbox, Drive, OneDrive) typically gets you some version history with no extra effort.
Don't edit PDFs directly on a thumb drive. Slow flushes and accidental ejection cause corruption.
Verify downloads by opening once before relying on them.
For critical documents, store an extra copy in a different location.
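
Making the extra copy is also a chance to confirm that the copy is readable and identical. The following Python sketch shows one way to do that with a SHA-256 comparison; sha256_of and copy_and_verify are names invented for this example, and the check is only as good as the read-back (an operating system may serve the second read from cache instead of from the device).

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then confirm both files hash identically."""
    shutil.copy2(src, dst)
    original = sha256_of(src)
    if sha256_of(dst) != original:
        raise OSError(f"copy at {dst} does not match {src}")
    return original  # keep this digest to re-check either copy later
```

Storing the returned digest next to the backup makes it possible to detect silent corruption later by hashing the file again.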

Conclusion

Most "broken" PDFs aren't broken; they're truncated, mis-saved, or rejected by a strict reader. Try a different reader, re-download if possible, run qpdf or mutool clean, and check version history. Docento.app supports browser-based repair without uploads. If recovery fails, focus on the source: regenerate, restore from backup, or reconstruct. For more on the surrounding workflow, see "why won't my PDF open" and "troubleshooting PDF fonts."