qpdf Introduction: The PDF Structural Surgeon

qpdf is the structural tool of the PDF CLI toolchain. While Ghostscript renders and compresses and pdftk is the friendly verb-based utility, qpdf operates at the level of PDF objects: splitting, merging, encrypting, linearizing, repairing, transforming. If you have ever needed precise control over a PDF's internal structure, qpdf is the right tool. This guide is an introduction.

What qpdf is

qpdf, by Jay Berkenbilt, is a free, open-source command-line PDF transformation tool. Unlike Ghostscript (which interprets and re-renders) or pdftk (which uses iText), qpdf works directly on the PDF object model. This makes it fast and lossless: operations preserve the original content exactly and only rearrange the structure.

What qpdf does well:

Split and merge PDFs
Encrypt and decrypt
Linearize for fast web view
Repair corrupted PDFs
Object-level inspection and modification
Conversion between PDF versions
JSON dump of PDF structure
Various structural transformations

What it does not do:

Re-render or rasterize content
Downsample or substantially compress images (its --optimize-images option only recompresses some images as JPEG)
OCR scanned PDFs
Edit text or form fields

For most structural needs, qpdf is the right tool.

Installing qpdf

Debian / Ubuntu:

sudo apt install qpdf

Fedora:

sudo dnf install qpdf

macOS:

brew install qpdf

Windows:

Download from the qpdf releases page on GitHub. Or use Chocolatey: choco install qpdf.

After installation, the qpdf command is available.

Basic command structure

qpdf [options] input.pdf [output.pdf]

qpdf writes its result to a new file by default; pass --replace-input instead of an output name to overwrite the input in place. The basic pattern:

qpdf input.pdf output.pdf

This produces a structural copy. With no operations specified, qpdf re-emits the file with a clean internal structure (often slightly smaller than the input).

Common operations

Split / extract pages:

qpdf input.pdf --pages input.pdf 1-5 -- output.pdf

The --pages syntax names the source file and a page range. The -- ends the page selection. See how to extract pages from a PDF.

Multiple sources:

qpdf --empty --pages file1.pdf 1-3 file2.pdf 1 file1.pdf 4-z -- combined.pdf

Start with an empty PDF and assemble pages from multiple sources.

Merge:

qpdf --empty --pages file1.pdf file2.pdf file3.pdf -- combined.pdf

When no page range follows a file name, qpdf takes all of that file's pages, so this concatenates the three files in order. See how to combine PDF files.
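
When a script drives the merge, it can help to build the argument list in one place. The following is a minimal Python sketch, not part of qpdf itself: merge_args is a hypothetical helper name, and it appends an explicit 1-z range after every source so that a file name can never be mistaken for a page range.

```python
from __future__ import annotations

from typing import Sequence


def merge_args(sources: Sequence[str], output: str) -> list[str]:
    """Build the argv for: qpdf --empty --pages <src> 1-z ... -- <output>."""
    if not sources:
        raise ValueError("at least one source PDF is required")
    args = ["qpdf", "--empty", "--pages"]
    for src in sources:
        # 1-z means "first page through last page", i.e. the whole file.
        args += [src, "1-z"]
    return args + ["--", output]
```

To run it, pass the list to subprocess.run(merge_args(["file1.pdf", "file2.pdf"], "combined.pdf"), check=True). Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.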

Burst into individual pages:

qpdf --split-pages input.pdf out-%d.pdf

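Produces out-1.pdf, out-2.pdf, and so on. qpdf zero-pads the number to the width of the highest page number, so a 12-page input yields out-01.pdf through out-12.pdf.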

Delete pages:

qpdf --empty --pages input.pdf 1-3,5-z -- without-page-4.pdf

See how to delete pages from a PDF.
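
qpdf has no delete verb; deleting means selecting every page you want to keep. For more than one or two deletions, computing that complement by hand is error-prone. Here is a small Python sketch of the idea; keep_ranges is a hypothetical helper, and it assumes you already know the page count (for example from qpdf --show-npages input.pdf).

```python
from __future__ import annotations


def keep_ranges(total_pages: int, delete: set[int]) -> str:
    """Return a page range string covering every page not in `delete`."""
    bad = [p for p in delete if not 1 <= p <= total_pages]
    if bad:
        raise ValueError(f"pages out of range: {sorted(bad)}")
    keep = [p for p in range(1, total_pages + 1) if p not in delete]
    if not keep:
        raise ValueError("cannot delete every page")

    parts = []
    start = prev = keep[0]
    for p in keep[1:]:
        if p == prev + 1:
            prev = p  # still inside a consecutive run
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = p
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)
```

For a 10-page file, keep_ranges(10, {4}) gives the range to place after the file name in --pages. The helper writes the final page as a number rather than z, which is equivalent when the page count is correct.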

Rotate pages:

qpdf --rotate=+90:5 input.pdf rotated.pdf

Rotates page 5 by +90°. Use --rotate=+180:1-z to rotate all pages 180°.

See how to rotate a PDF page.

Encrypt:

qpdf --encrypt user_pw owner_pw 256 -- input.pdf encrypted.pdf

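The third argument (40, 128, or 256) is the key length in bits. 256 always uses AES; 128 uses RC4 unless you add --use-aes=y. Permission restrictions such as --print=none and --modify=none go between the key length and the --. See PDF permissions explained and AES-128 vs AES-256 PDF encryption.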
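
The ordering of the encryption arguments is easy to get wrong, so a script may want to assemble them in one function. This is a hedged Python sketch: encrypt_args is a hypothetical name, and the accepted values for --print and --modify are the ones I understand qpdf to accept for 256-bit encryption, so check them against your installed version.

```python
from __future__ import annotations

PRINT_VALUES = {"full", "low", "none"}
MODIFY_VALUES = {"all", "annotate", "form", "assembly", "none"}


def encrypt_args(src: str, dst: str, user_pw: str, owner_pw: str, *,
                 print_perm: str = "full", modify_perm: str = "all") -> list[str]:
    """Build the argv for a 256-bit (AES) qpdf encryption run."""
    if print_perm not in PRINT_VALUES:
        raise ValueError(f"unknown print permission: {print_perm}")
    if modify_perm not in MODIFY_VALUES:
        raise ValueError(f"unknown modify permission: {modify_perm}")
    return [
        "qpdf", "--encrypt", user_pw, owner_pw, "256",
        # Permission flags belong inside the --encrypt ... -- section.
        f"--print={print_perm}", f"--modify={modify_perm}",
        "--", src, dst,
    ]
```

Keep in mind that passwords passed on a command line can be visible to other users in the process list, so this pattern suits trusted machines best.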

Decrypt:

qpdf --decrypt --password=user_pw input.pdf decrypted.pdf

Removes encryption (requires the password). See how to remove a password from a PDF.

Linearize:

qpdf --linearize input.pdf linearized.pdf

Reorganizes for fast web view. See linearized PDF and fast web view.

Repair damaged PDFs:

qpdf input.pdf repaired.pdf

qpdf's basic copy operation rebuilds the file's cross-reference table, often fixing minor corruption. For damaged-but-readable PDFs, this works wonders. See how to recover a corrupted PDF.

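More aggressive repair: qpdf attempts recovery by default and has no switch to push it further. If the basic copy fails, try another tool such as Ghostscript or MuPDF's mutool clean. The related flag --suppress-recovery does the opposite: it disables recovery, which helps when you want to find out whether a file is damaged at all:

qpdf --suppress-recovery damaged.pdf out.pdf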

Check a PDF:

qpdf --check input.pdf

Reports structural integrity, encryption, linearization, and any warnings.
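
In scripts, the exit status of --check is usually more useful than its text output. The qpdf manual documents 0 for success, 2 for errors, and 3 for warnings without errors. The Python sketch below maps those codes to labels; classify_exit and check_pdf are hypothetical helper names, and check_pdf assumes qpdf is on the PATH.

```python
import subprocess

# Exit statuses as documented by qpdf: 0 = clean, 2 = errors, 3 = warnings only.
EXIT_MEANINGS = {0: "ok", 2: "error", 3: "warning"}


def classify_exit(code: int) -> str:
    """Translate a qpdf exit status into a short label."""
    return EXIT_MEANINGS.get(code, "unknown")


def check_pdf(path: str) -> str:
    """Run `qpdf --check` on one file and classify the result."""
    result = subprocess.run(
        ["qpdf", "--check", path], capture_output=True, text=True
    )
    return classify_exit(result.returncode)
```

A batch job can then keep files labelled ok, log the ones labelled warning, and route the ones labelled error to a repair step.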

Inspect structure:

qpdf --json input.pdf

Dumps the entire PDF structure as JSON. Useful for programmatic inspection or debugging.
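
As an illustration of programmatic inspection, the Python sketch below reads the JSON and maps page numbers to page objects. It assumes the top-level pages array with object and pageposfrom1 keys that recent qpdf versions emit; the sample text is hand-written and heavily trimmed, not real qpdf output, and page_objects is a hypothetical helper name.

```python
from __future__ import annotations

import json

# Hand-written stand-in for `qpdf --json input.pdf` output.
# Real output has many more keys; these object IDs are made up.
SAMPLE = """
{
  "pages": [
    {"object": "3 0 R", "pageposfrom1": 1},
    {"object": "7 0 R", "pageposfrom1": 2}
  ]
}
"""


def page_objects(qpdf_json_text: str) -> dict[int, str]:
    """Map 1-based page numbers to the reference of each page object."""
    doc = json.loads(qpdf_json_text)
    return {page["pageposfrom1"]: page["object"] for page in doc.get("pages", [])}
```

With a real file, feed the function the standard output of qpdf --json input.pdf. The layout of the rest of the JSON differs between JSON format versions, so treat any other key names as something to verify against your qpdf release.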

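Convert between PDF versions:

qpdf --object-streams=disable input.pdf v1-4.pdf      # No object streams; parseable by PDF 1.4 tools
qpdf --object-streams=generate input.pdf v1-5+.pdf    # Object streams; requires PDF 1.5 or later

These options change the structure, not the declared version. To set the version in the file header explicitly, use --min-version or --force-version.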

Optimize the file structure:

qpdf --object-streams=generate --linearize input.pdf optimized.pdf

This generates an object-stream-compressed and linearized file, typically smaller and faster to load than the original.

Advanced operations

Replace pages from another file:

qpdf input.pdf --pages source.pdf 1-3 input.pdf 2-z -- output.pdf

Page 1 of input.pdf is replaced by pages 1-3 of source.pdf.
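
A replacement like this can be expressed with the --pages option as "pages before, replacement pages, pages after". The Python sketch below builds that argument list for any page position; replace_page_args is a hypothetical helper, and it needs the target's page count so it can leave out the before or after part when the replaced page is the first or last one.

```python
from __future__ import annotations


def replace_page_args(target: str, page: int, total_pages: int,
                      source: str, source_range: str, output: str) -> list[str]:
    """Build a qpdf argv that swaps one page of `target` for pages of `source`."""
    if not 1 <= page <= total_pages:
        raise ValueError("page is outside the target document")
    args = ["qpdf", target, "--pages"]
    if page > 1:
        args += [target, f"1-{page - 1}"]   # pages before the replaced one
    args += [source, source_range]          # the replacement pages
    if page < total_pages:
        args += [target, f"{page + 1}-z"]   # pages after the replaced one
    return args + ["--", output]
```

Naming the target as the primary input (rather than --empty) keeps its document-level information in the output.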

Add an overlay:

qpdf input.pdf --overlay overlay.pdf -- with-overlay.pdf

Draws each page of overlay.pdf on top of the corresponding page of the input (page 1 on page 1, and so on).

Add an underlay:

qpdf input.pdf --underlay underlay.pdf -- with-underlay.pdf

Adds an underlay file's pages under the input's pages. Useful for adding backgrounds or watermarks.

Show pages:

qpdf --show-pages input.pdf

Lists each page with the object ID of the page and of its content streams. Add --with-images to list the images on each page as well.

Show encryption:

qpdf --show-encryption input.pdf

Reports the encryption parameters: algorithm, key length, permissions.

Working with metadata

qpdf has no dedicated option for editing metadata fields. An ordinary copy preserves the metadata of the primary input file, while a file assembled with --empty starts without any document-level metadata. For metadata edits, combine with ExifTool. See how to edit PDF metadata.

Batch operations

qpdf is fast and scriptable:

# Linearize all PDFs in a folder (the output directory must exist)
mkdir -p linearized
for f in *.pdf; do
  qpdf --linearize "$f" "linearized/$f"
done

# Encrypt all PDFs with the same password
mkdir -p encrypted
for f in *.pdf; do
  qpdf --encrypt user_pw owner_pw 256 -- "$f" "encrypted/$f"
done

# Repair all PDFs
mkdir -p repaired
for f in *.pdf; do
  qpdf "$f" "repaired/$f"
done

For parallel processing:

parallel qpdf --linearize {} linearized/{} ::: *.pdf
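
Where GNU parallel is not available, the same fan-out can be sketched in Python with a thread pool, since each job simply waits on an external qpdf process. The names plan_jobs, linearize, and run_batch are hypothetical, and the sketch assumes qpdf is on the PATH.

```python
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def plan_jobs(pdf_paths: list[str], out_dir: str) -> list[tuple[Path, Path]]:
    """Pair each input PDF with an output path of the same name in out_dir."""
    return [(Path(p), Path(out_dir) / Path(p).name) for p in pdf_paths]


def linearize(job: tuple[Path, Path]) -> int:
    """Run qpdf --linearize for one (input, output) pair; return the exit code."""
    src, dst = job
    dst.parent.mkdir(parents=True, exist_ok=True)
    return subprocess.run(["qpdf", "--linearize", str(src), str(dst)]).returncode


def run_batch(pdf_paths: list[str], out_dir: str, workers: int = 4) -> dict[str, int]:
    """Linearize many files concurrently; map each input to its exit code."""
    jobs = plan_jobs(pdf_paths, out_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(linearize, jobs))
    return {str(src): code for (src, _), code in zip(jobs, codes)}
```

Calling run_batch(sorted(str(p) for p in Path(".").glob("*.pdf")), "linearized") processes the current folder and returns the exit codes, so failures can be reported per file instead of being lost in the terminal output.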

qpdf vs alternatives

qpdf shines for:

Structural operations. Split, merge, encrypt, linearize, repair.
Lossless transformations. No re-rendering; content preserved exactly.
Speed. Fast on large PDFs.
Repair. Often the first tool to try on damaged PDFs.

Use other tools for:

Compression and rendering: Ghostscript. See Ghostscript introduction.
Forms and stamps: pdftk. See pdftk introduction.
Text extraction: poppler-utils' pdftotext.
Image-level operations: ImageMagick + Ghostscript.
Programmatic deep manipulation: pikepdf (Python; same underlying engine).

A typical pipeline uses qpdf for structure, Ghostscript for compression, pdftk for forms, and pikepdf for custom Python work.

pikepdf: qpdf for Python

pikepdf is a Python library built on the same qpdf engine. It exposes a Pythonic API:

import pikepdf

with pikepdf.open("input.pdf") as pdf:
    # Inspect or modify the object model
    pdf.save("output.pdf", linearize=True)

For programmatic PDF manipulation in Python, pikepdf is the go-to. Many of the same operations available on the qpdf command line are accessible programmatically.

Common gotchas

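-- separator. Options that take a variable number of arguments, such as --pages, --encrypt, --overlay, and --underlay, must be terminated with --. If you leave it out, qpdf treats the file names that follow as part of the option.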

Page selection syntax. Uses ranges like 1-5, lists like 1,3,5, and special values like z (last page) and r3-r1 (last three pages). On recent versions, qpdf --help=page-ranges shows the full syntax; the Page Ranges section of the qpdf manual covers the same ground.
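
To make the range semantics concrete, here is a small Python sketch that expands a range string into page numbers. It is a simplified reimplementation for illustration, not qpdf's own parser: expand_range is a hypothetical name, and it covers only plain numbers, z, rN, a-b ranges, and comma lists, ignoring other modifiers qpdf supports.

```python
from __future__ import annotations


def _resolve(token: str, total: int) -> int:
    """Turn one page token (number, z, or rN) into a 1-based page number."""
    if token == "z":
        page = total                        # z is the last page
    elif token.startswith("r"):
        page = total + 1 - int(token[1:])   # r1 is the last page, r2 the one before
    else:
        page = int(token)
    if not 1 <= page <= total:
        raise ValueError(f"page {token!r} is outside 1..{total}")
    return page


def expand_range(spec: str, total: int) -> list[int]:
    """Expand a subset of qpdf's page range syntax for a `total`-page file."""
    pages: list[int] = []
    for item in spec.split(","):
        first, sep, last = item.partition("-")
        a = _resolve(first, total)
        b = _resolve(last, total) if sep else a
        step = 1 if a <= b else -1          # a descending range reverses the pages
        pages.extend(range(a, b + step, step))
    return pages
```

For a 10-page file, expand_range("r3-r1", 10) yields the last three pages, and a descending spec such as z-1 yields the pages in reverse order, which is how qpdf reverses a document.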

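Encryption revision. Encrypting with 256 produces AES-256 with revision 6 of the security handler, the variant standardized in PDF 2.0. Some older readers cannot open such files; for them, fall back to 128-bit AES with --encrypt user_pw owner_pw 128 --use-aes=y --. qpdf also accepts --force-R5, but that writes a deprecated, weaker 256-bit format and is best avoided. See AES-128 vs AES-256 PDF encryption.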

Unverified linearization. A successful exit does not by itself prove that the output is correctly linearized. Confirm with qpdf --check-linearization output.pdf, or with the broader --check.

Signature invalidation. Any qpdf transformation breaks digital signatures. Operate before signing, not after.

Memory on huge PDFs. qpdf is fast but loads the file into memory. For files larger than several GB, performance suffers.

Image extraction. qpdf cannot extract embedded images directly. Use Ghostscript or pdfimages (poppler-utils) for that.

OCR. qpdf does not OCR. Use OCRmyPDF or ABBYY for text recognition. See PDF OCR explained.

Practical recipe

For a typical qpdf workflow:

Inspect: qpdf --check input.pdf
Operate: qpdf --linearize input.pdf output.pdf
Verify: qpdf --check output.pdf

For batch jobs, wrap in a shell loop.
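
The three steps can also be wrapped in a single function. The Python sketch below is one possible shape, under stated assumptions: linearize_checked is a hypothetical name, exit codes 0 and 3 (warnings only) are both treated as a pass, and the run parameter exists only so the command runner can be swapped out in tests.

```python
import subprocess


def linearize_checked(src: str, dst: str, run=subprocess.run) -> bool:
    """Check the input, linearize it, then check the output.

    Returns True only if every step exits with 0 (clean) or 3 (warnings only).
    """
    steps = [
        ["qpdf", "--check", src],             # inspect
        ["qpdf", "--linearize", src, dst],    # operate
        ["qpdf", "--check", dst],             # verify
    ]
    for argv in steps:
        code = run(argv).returncode
        if code not in (0, 3):
            return False                      # stop at the first failing step
    return True
```

Whether warnings should count as a pass is a policy decision; a stricter pipeline would accept only 0.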

Takeaway

qpdf is the structural surgeon of the PDF toolchain: fast, lossless, and precise. For split, merge, encrypt, decrypt, linearize, repair, and inspect operations, it is hard to beat. It does not render, compress images, or OCR; those are jobs for other tools. Combined with Ghostscript, pdftk, and (in Python) pikepdf, qpdf forms the backbone of professional PDF processing pipelines. For browser-based one-off operations, Docento.app covers many similar tasks without installation. For related tools, see Ghostscript introduction, pdftk introduction, poppler-utils introduction, and MuPDF introduction.