qpdf is the structural tool of the PDF CLI toolchain. While Ghostscript renders and compresses and pdftk is the friendly verb-based utility, qpdf operates at the level of PDF objects: splitting, merging, encrypting, linearizing, repairing, transforming. If you have ever needed precise control over a PDF's internal structure, qpdf is the right tool. This guide is an introduction.
What qpdf is
qpdf, by Jay Berkenbilt, is a free open-source command-line PDF transformation tool. Unlike Ghostscript (which interprets and re-renders) or pdftk (which uses iText), qpdf works directly on the PDF object model. This makes it fast and lossless: operations preserve the original content exactly and only rearrange the structure.
What qpdf does well:
- Split and merge PDFs
- Encrypt and decrypt
- Linearize for fast web view
- Repair corrupted PDFs
- Object-level inspection and modification
- Conversion between PDF versions
- JSON dump of PDF structure
- Various structural transformations
What it does not do:
- Re-render or rasterize content
- Compress images (no image-level operations)
- OCR scanned PDFs
- Edit text or form fields
For most structural needs, qpdf is the right tool.
Installing qpdf
Debian / Ubuntu:
sudo apt install qpdf
Fedora:
sudo dnf install qpdf
macOS:
brew install qpdf
Windows:
Download from the qpdf releases page on GitHub. Or use Chocolatey: choco install qpdf.
After installation, the qpdf command is available.
Basic command structure
qpdf [options] input.pdf [output.pdf]
Most operations write to a new file; to overwrite the input instead, pass --replace-input and omit the output name. The basic pattern:
qpdf input.pdf output.pdf
This produces a structural copy. With no operations specified, qpdf re-emits the file with a clean internal structure (often slightly smaller than the input).
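When qpdf is driven from a script, the same pattern can be assembled as an argument list. The following is a minimal sketch, not part of qpdf itself: the helper names qpdf_command and run_qpdf are hypothetical, and it assumes the qpdf executable is on PATH when run_qpdf is actually called.

```python
import shutil
import subprocess


def qpdf_command(input_pdf, output_pdf, *options):
    """Return the argv list for: qpdf [options] input.pdf output.pdf."""
    return ["qpdf", *options, input_pdf, output_pdf]


def run_qpdf(argv):
    """Run a qpdf command and return its exit code.

    Raises FileNotFoundError if the executable is not on PATH.
    """
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError("qpdf executable not found on PATH")
    return subprocess.run(argv, check=False).returncode


# Plain structural copy, as in the command above.
print(qpdf_command("input.pdf", "output.pdf"))
# The same copy with one option placed before the file names.
print(qpdf_command("input.pdf", "output.pdf", "--linearize"))
```

Passing a list to subprocess.run avoids shell quoting problems with file names that contain spaces.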
Common operations
Split / extract pages:
qpdf input.pdf --pages input.pdf 1-5 -- output.pdf
The --pages syntax names the source file and a page range. The -- ends the page selection. See how to extract pages from a PDF.
Multiple sources:
qpdf --empty --pages file1.pdf 1-3 file2.pdf 1 file1.pdf 4-z -- combined.pdf
Start with an empty PDF and assemble pages from multiple sources.
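For scripted assembly, the --pages argument list can be generated from a list of (file, range) pairs. This is a sketch under stated assumptions: pages_command is a hypothetical helper, and it only builds the command line without running it. Note that qpdf's name for the last page is z.

```python
def pages_command(selections, output_pdf, primary="--empty"):
    """Build a qpdf --pages command line.

    selections: list of (file, page_range) tuples. A page_range of None
    means "every page of that file".
    primary: the primary input, "--empty" by default.
    """
    argv = ["qpdf", primary, "--pages"]
    for pdf, page_range in selections:
        argv.append(pdf)
        if page_range is not None:
            argv.append(page_range)
    # "--" terminates the page selection; the output file follows it.
    argv += ["--", output_pdf]
    return argv


cmd = pages_command(
    [("file1.pdf", "1-3"), ("file2.pdf", "1"), ("file1.pdf", "4-z")],
    "combined.pdf",
)
print(" ".join(cmd))
```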
Merge:
qpdf --empty --pages file1.pdf file2.pdf file3.pdf -- combined.pdf
With no page range after a file name, qpdf takes every page of that file. See how to combine PDF files.
Burst into individual pages:
qpdf --split-pages input.pdf out-%d.pdf
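Produces one file per page: out-1.pdf, out-2.pdf, and so on. qpdf zero-pads the number to the width of the page count, so a 12-page input yields out-01.pdf through out-12.pdf.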
Delete pages:
qpdf --empty --pages input.pdf 1-3,5-z -- without-page-4.pdf
See how to delete pages from a PDF.
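Because qpdf deletes pages by selecting the ones to keep, a script has to turn "pages to delete" into a range of pages to retain. The sketch below shows one way to compute that range; keep_ranges is a hypothetical helper, and the total page count is assumed to be known already (for example from qpdf --show-npages).

```python
def keep_ranges(total_pages, delete):
    """Return a qpdf page-range string covering every page not in `delete`."""
    doomed = set(delete)
    kept = [p for p in range(1, total_pages + 1) if p not in doomed]
    if not kept:
        raise ValueError("cannot delete every page")

    # Collapse consecutive page numbers into (first, last) runs.
    runs = []
    start = prev = kept[0]
    for p in kept[1:]:
        if p != prev + 1:
            runs.append((start, prev))
            start = p
        prev = p
    runs.append((start, prev))

    # A single-page run is written as "n", a longer run as "a-b".
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


print(keep_ranges(10, [4]))
```

The result uses explicit page numbers rather than z, which is equivalent when the page count is correct.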
Rotate pages:
qpdf --rotate=+90:5 input.pdf rotated.pdf
Rotates page 5 by +90°. Use --rotate=+180:1-z to rotate all pages 180°.
Encrypt:
qpdf --encrypt user_pw owner_pw 256 -- input.pdf encrypted.pdf
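The third argument is the key length in bits: 40, 128, or 256. 256 always uses AES; 128 uses RC4 unless you add --use-aes=y. Permission restrictions such as --print=none and --modify=none go between the key length and the closing --. See PDF permissions explained and AES-128 vs AES-256 PDF encryption.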
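The argument order matters here: passwords, key length, optional restrictions, then the -- terminator. As a rough illustration, the sketch below builds such a command; encrypt_command is a hypothetical helper, and the restriction flags shown are the ones named in this guide.

```python
def encrypt_command(input_pdf, output_pdf, user_pw, owner_pw,
                    key_bits=256, restrictions=()):
    """Build a qpdf --encrypt command line.

    restrictions: option strings such as "--print=none", placed between
    the key length and the "--" that closes the encryption options.
    """
    if key_bits not in (40, 128, 256):
        raise ValueError("key length must be 40, 128, or 256")
    return [
        "qpdf", "--encrypt", user_pw, owner_pw, str(key_bits),
        *restrictions,
        "--", input_pdf, output_pdf,
    ]


cmd = encrypt_command(
    "input.pdf", "encrypted.pdf", "user_pw", "owner_pw",
    restrictions=("--print=none", "--modify=none"),
)
print(" ".join(cmd))
```

Passwords given on the command line are visible to other processes on the same machine, so treat this form as suitable for local scripts only.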
Decrypt:
qpdf --decrypt --password=user_pw input.pdf decrypted.pdf
Removes encryption (requires the password). See how to remove a password from a PDF.
Linearize:
qpdf --linearize input.pdf linearized.pdf
Reorganizes for fast web view. See linearized PDF and fast web view.
Repair damaged PDFs:
qpdf input.pdf repaired.pdf
qpdf's basic copy operation rebuilds the file's cross-reference table, often fixing minor corruption. For damaged-but-readable PDFs, this works wonders. See how to recover a corrupted PDF.
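Recovery is enabled by default, so no extra flag is needed for it. To make qpdf report a damaged file as an error instead of repairing it (useful when validating input), turn recovery off:
qpdf --suppress-recovery --check damaged.pdf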
Check a PDF:
qpdf --check input.pdf
Reports structural integrity, encryption, linearization, and any warnings.
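In scripts, the exit status is usually more convenient than the printed report. qpdf documents exit status 0 for success, 2 for errors, and 3 for warnings without errors. The sketch below maps those codes to labels; classify_exit and check_pdf are hypothetical helper names, and check_pdf assumes qpdf is on PATH.

```python
import subprocess

# Exit statuses documented by qpdf.
QPDF_EXIT_MEANING = {0: "ok", 2: "error", 3: "warning"}


def classify_exit(code):
    """Translate a qpdf exit status into a short label."""
    return QPDF_EXIT_MEANING.get(code, "unknown")


def check_pdf(path):
    """Run qpdf --check on one file; return (label, combined output text)."""
    result = subprocess.run(
        ["qpdf", "--check", path], capture_output=True, text=True
    )
    return classify_exit(result.returncode), result.stdout + result.stderr


print(classify_exit(3))
```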
Inspect structure:
qpdf --json input.pdf
Dumps the entire PDF structure as JSON. Useful for programmatic inspection or debugging.
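As an illustration of programmatic inspection, the sketch below reads the per-page entries from a JSON dump. The sample text is hand-written and heavily abbreviated, and the key names ("pages", "object", "pageposfrom1") reflect my understanding of qpdf's JSON layout; treat them as assumptions and confirm against qpdf --json-help for your version. page_objects is a hypothetical helper.

```python
import json

# Hand-written, abbreviated stand-in for real `qpdf --json` output.
SAMPLE = """
{
  "version": 2,
  "pages": [
    {"object": "3 0 R", "pageposfrom1": 1, "contents": ["4 0 R"]},
    {"object": "5 0 R", "pageposfrom1": 2, "contents": ["6 0 R"]}
  ]
}
"""


def page_objects(json_text):
    """Return (page number, page object reference) pairs from a JSON dump."""
    doc = json.loads(json_text)
    return [(p["pageposfrom1"], p["object"]) for p in doc.get("pages", [])]


print(page_objects(SAMPLE))
```

For real files, the JSON text would come from capturing the standard output of qpdf --json input.pdf.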
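Convert between PDF versions:
qpdf --object-streams=disable input.pdf v1-4.pdf # No object streams; readable by PDF 1.4 tools
qpdf --object-streams=generate input.pdf v1-5+.pdf # Object streams; requires PDF 1.5 or later
Object streams are a PDF 1.5 feature, so these flags control structural compatibility. To set the version in the file header directly, use --min-version or --force-version.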
Optimize the file structure:
qpdf --object-streams=generate --linearize input.pdf optimized.pdf
This generates an object-stream-compressed and linearized file, typically smaller and faster to load than the original.
Advanced operations
Replace pages from another file:
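qpdf input.pdf --pages source.pdf 1-3 input.pdf 2-z -- output.pdf
Page 1 of input.pdf is replaced by pages 1-3 of source.pdf. qpdf has no dedicated replace option, so the output is assembled with --pages: the replacement pages first, then the rest of the original.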
Add an overlay:
qpdf input.pdf --overlay overlay.pdf -- with-overlay.pdf
Adds an overlay file's pages on top of the input's pages.
Add an underlay:
qpdf input.pdf --underlay underlay.pdf -- with-underlay.pdf
Adds an underlay file's pages under the input's pages. Useful for adding backgrounds or watermarks.
Show pages:
qpdf --show-pages input.pdf
Lists each page with its object number and the object numbers of its content streams. Add --with-images to list each page's images as well.
Show encryption:
qpdf --show-encryption input.pdf
Reports the encryption parameters: algorithm, key length, permissions.
Working with metadata
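qpdf has no dedicated option for editing metadata strings. Its transformations carry the primary input's document-level metadata through unchanged; when you start from --empty, there is no primary input, so that metadata is not carried over. For metadata edits, combine with ExifTool. See how to edit PDF metadata.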
Batch operations
qpdf is fast and scriptable:
# Create the output folders first; qpdf does not create them
mkdir -p linearized encrypted repaired
# Linearize all PDFs in a folder
for f in *.pdf; do
  qpdf --linearize "$f" "linearized/$f"
done
# Encrypt all PDFs with the same password
for f in *.pdf; do
  qpdf --encrypt user_pw owner_pw 256 -- "$f" "encrypted/$f"
done
# Repair all PDFs
for f in *.pdf; do
  qpdf "$f" "repaired/$f"
done
For parallel processing:
parallel qpdf --linearize {} linearized/{} ::: *.pdf
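Where GNU parallel is not available, the same fan-out can be done from Python. This is a minimal sketch, assuming qpdf is on PATH when the jobs are run; plan_jobs and run_jobs are hypothetical helper names. Threads are enough here because the work happens in the qpdf child processes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def plan_jobs(src_dir, dst_dir, *options):
    """Return one qpdf argv list per *.pdf file in src_dir, sorted by name."""
    src, dst = Path(src_dir), Path(dst_dir)
    return [
        ["qpdf", *options, str(p), str(dst / p.name)]
        for p in sorted(src.glob("*.pdf"))
    ]


def run_jobs(jobs, dst_dir, workers=4):
    """Run the planned commands concurrently; return their exit codes in order."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)

    def run(argv):
        return subprocess.run(argv, check=False).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, jobs))


# Example (not executed here):
#   jobs = plan_jobs(".", "linearized", "--linearize")
#   codes = run_jobs(jobs, "linearized")
```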
qpdf vs alternatives
qpdf shines for:
- Structural operations. Split, merge, encrypt, linearize, repair.
- Lossless transformations. No re-rendering; content preserved exactly.
- Speed. Fast on large PDFs.
- Repair. Often the first tool to try on damaged PDFs.
Use other tools for:
- Compression and rendering: Ghostscript. See Ghostscript introduction.
- Forms and stamps: pdftk. See pdftk introduction.
- Text extraction: poppler-utils' pdftotext.
- Image-level operations: ImageMagick + Ghostscript.
- Programmatic deep manipulation: pikepdf (Python; same underlying engine).
A typical pipeline uses qpdf for structure, Ghostscript for compression, pdftk for forms, and pikepdf for custom Python work.
pikepdf: qpdf for Python
pikepdf is a Python library built on the same qpdf engine. It exposes a Pythonic API:
import pikepdf
with pikepdf.open("input.pdf") as pdf:
    # Inspect or modify the object model
    pdf.save("output.pdf", linearize=True)
For programmatic PDF manipulation in Python, pikepdf is the go-to. Many of the same operations available on the qpdf command line are accessible programmatically.
Common gotchas
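-- separator. Options that take a variable number of arguments (--pages, --encrypt, --overlay, --underlay) must be terminated with --. Leaving it out makes qpdf read the following file names as part of the option.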
Page selection syntax. Uses ranges like 1-5, lists like 1,3,5, and special values like z (last page) and r3-r1 (last three pages). Use qpdf --help=page-ranges to see the full syntax.
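To make the range notation concrete, the sketch below expands a range string into page numbers. It is a simplified illustration, not qpdf's own parser: resolve_pages is a hypothetical helper that handles only plain numbers, z, rN, comma lists, and ascending or descending ranges, and it ignores the rest of the syntax.

```python
def resolve_pages(spec, total):
    """Expand a simplified qpdf-style page range into 1-based page numbers."""

    def position(token):
        if token == "z":
            return total                       # z is the last page
        if token.startswith("r"):
            return total - int(token[1:]) + 1  # r1 is the last page, r2 the one before
        return int(token)

    pages = []
    for group in spec.split(","):
        first, _, last = group.partition("-")
        start = position(first)
        end = position(last) if last else start
        if not (1 <= start <= total and 1 <= end <= total):
            raise ValueError(f"page out of range in {group!r}")
        # A range whose end precedes its start counts downward.
        step = 1 if end >= start else -1
        pages.extend(range(start, end + step, step))
    return pages


print(resolve_pages("r3-r1", 10))
```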
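Encryption revision. Encrypting with 256 uses AES-256 with encryption revision 6, the scheme standardized in PDF 2.0. Some older readers do not support revision 6; for those, fall back to 128-bit AES with --encrypt user_pw owner_pw 128 --use-aes=y --. qpdf also has --force-R5, but it produces a deprecated format and is best avoided. See AES-128 vs AES-256 PDF encryption.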
Linearization fails silently. Some PDFs cannot be linearized for structural reasons. The command may succeed even though the result is not actually linearized. Verify the output with --check or --check-linearization.
Signature invalidation. Any qpdf transformation breaks digital signatures. Operate before signing, not after.
Memory on huge PDFs. qpdf is fast but loads the file into memory. For files larger than several GB, performance suffers.
Image extraction. qpdf cannot extract embedded images directly. Use Ghostscript or pdfimages (poppler-utils) for that.
OCR. qpdf does not OCR. Use OCRmyPDF or ABBYY for text recognition. See PDF OCR explained.
Practical recipe
For a typical qpdf workflow:
- Inspect:
qpdf --check input.pdf
- Operate:
qpdf --linearize input.pdf output.pdf
- Verify:
qpdf --check output.pdf
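The three steps can also be chained in a script that stops as soon as qpdf reports an error (exit status 2). The following is a sketch under stated assumptions: check_operate_verify and default_runner are hypothetical names, the runner is injectable so the logic can be exercised without qpdf installed, and warnings (exit status 3) are allowed to continue.

```python
import subprocess


def default_runner(argv):
    """Run a command and return its exit code (assumes qpdf is on PATH)."""
    return subprocess.run(argv, check=False).returncode


def check_operate_verify(input_pdf, output_pdf,
                         options=("--linearize",), runner=default_runner):
    """Inspect, operate, verify; return the (step, exit code) pairs that ran."""
    steps = [
        ("inspect", ["qpdf", "--check", input_pdf]),
        ("operate", ["qpdf", *options, input_pdf, output_pdf]),
        ("verify", ["qpdf", "--check", output_pdf]),
    ]
    results = []
    for name, argv in steps:
        code = runner(argv)
        results.append((name, code))
        if code == 2:  # qpdf reported errors; skip the remaining steps
            break
    return results


# Dry run with a stand-in runner that prints instead of executing.
def echo_runner(argv):
    print(" ".join(argv))
    return 0


check_operate_verify("input.pdf", "output.pdf", runner=echo_runner)
```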
For batch jobs, wrap in a shell loop.
Takeaway
qpdf is the structural surgeon of the PDF toolchain: fast, lossless, and precise. For split, merge, encrypt, decrypt, linearize, repair, and inspect operations, it is hard to beat. It does not render, compress images, or OCR; those are jobs for other tools. Combined with Ghostscript, pdftk, and (in Python) pikepdf, qpdf forms the backbone of professional PDF processing pipelines. For browser-based one-off operations, Docento.app covers many similar tasks without installation. For related tools, see Ghostscript introduction, pdftk introduction, poppler-utils introduction, and MuPDF introduction.