Docento.app

How to Convert HTML to PDF: From One-Liners to Production Pipelines

April 28, 2026 · 7 min read

Generating a PDF from HTML is one of the most common server-side document tasks in modern applications. Invoices, receipts, contracts, reports, certificates: anything templated and then printed almost always starts as HTML and ends as PDF. This guide walks through the tools that get you there, from a single command-line incantation to a production-grade pipeline.

Why HTML is a great PDF source

HTML has several advantages as a source format for PDF:

  • Templating. Every modern template engine or framework (Jinja, Handlebars, React, Vue) produces HTML easily.
  • Styling. CSS handles fonts, colors, layout, and print-specific rules (@media print, @page).
  • Testing. You can render the HTML in a browser and iterate visually before generating the PDF.
  • Reusability. The same HTML can be served as a web page and converted to PDF.
  • Tooling. A vast ecosystem of libraries, command-line tools, and SaaS services.

A typical pattern: build a templated HTML page that renders nicely in the browser, then convert it to PDF for download.

Tools that produce PDFs from HTML

The market has consolidated around a handful of strong options.

Chromium / Headless Chrome. The dominant choice. Modern Chromium has a built-in "Print to PDF" engine that handles full CSS, modern HTML, web fonts, and JavaScript-rendered content. Several wrappers expose this:

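  • Puppeteer (Node.js): the first-class headless Chromium driver. page.pdf({ path: 'output.pdf', format: 'A4' }) writes the PDF.
  • Playwright (multi-language): newer, multi-browser, with a similar API, although its page.pdf() works only with Chromium.
  • chrome --headless --print-to-pdf=out.pdf https://example.com: the direct CLI (the binary may be named google-chrome, chromium, or similar depending on the platform).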

This is the default recommendation for most new projects.

wkhtmltopdf. An older standalone tool built on an embedded (Qt) WebKit engine, still widely used, especially in legacy systems. CLI: wkhtmltopdf input.html output.pdf. It renders most CSS but lags well behind Chrome on modern layouts: CSS Grid and modern Flexbox support are limited or missing. Active development wound down years ago, so keep it for stable legacy systems and use Chromium for new work.

WeasyPrint (Python). A Python-based HTML/CSS rendering engine (it relies on native libraries such as Pango for text layout). CLI: weasyprint input.html output.pdf. Its strong CSS support, including paged media, footnotes, and page counters, makes it excellent for highly styled reports and books. It is slower than Chromium on very large documents.

Prince (commercial). A long-established, specialized HTML-to-PDF engine. Best in class for CSS paged media (running headers, footnotes, complex pagination). It is pricey, but if you produce books or formal reports from HTML, it is worth evaluating.

Vivliostyle. An open-source renderer in the same paged-media-focused spirit as Prince, particularly common in book-publishing pipelines.

LaTeX bypass. For some documents, HTML → Markdown → LaTeX → PDF via Pandoc produces nicer-looking PDFs than direct HTML conversion. Worth considering for highly typographic outputs. See how to convert a PDF to LaTeX for the reverse direction.

A minimal Node.js example

The smallest useful Puppeteer-based converter:

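const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Load an HTML string directly instead of navigating to a URL.
    await page.setContent('<h1>Invoice 1042</h1><p>Total: $250</p>');
    // printBackground keeps CSS background colors and images in the PDF.
    await page.pdf({ path: 'invoice.pdf', format: 'A4', printBackground: true });
  } finally {
    // Close the browser even if rendering throws, so no Chromium process leaks.
    await browser.close();
  }
})();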

A real invoice generator fills an HTML template before calling setContent. Background printing matters: by default, Chromium omits background colors and images when printing; printBackground: true keeps them.
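The template-filling step can be as simple as string interpolation, provided every dynamic value is HTML-escaped so customer data cannot inject markup into the page Chromium renders. The sketch below uses plain Node.js with no template library; `escapeHtml`, `renderInvoiceHtml`, and the invoice object's shape (amounts in cents) are hypothetical choices for illustration, not part of Puppeteer.

```javascript
// Hypothetical helpers for illustration; not part of Puppeteer.

// Escape the five characters that are significant in HTML text and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build the invoice page from a plain object with an assumed shape:
// { number, lines: [{ description, amount }] }, where amounts are in cents.
function renderInvoiceHtml(invoice) {
  const totalCents = invoice.lines.reduce((sum, line) => sum + line.amount, 0);
  const money = (cents) => `$${(cents / 100).toFixed(2)}`;
  const rows = invoice.lines
    .map((line) => `<tr><td>${escapeHtml(line.description)}</td><td>${money(line.amount)}</td></tr>`)
    .join('');
  return `<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>Invoice ${escapeHtml(invoice.number)}</title></head>
<body>
  <h1>Invoice ${escapeHtml(invoice.number)}</h1>
  <table><tbody>${rows}</tbody></table>
  <p>Total: ${money(totalCents)}</p>
</body></html>`;
}
```

The returned string is what you would hand to page.setContent() in the minimal example. Keeping money in integer cents avoids floating-point rounding surprises in the total.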

Print-specific CSS

The trick to good PDFs from HTML is print-specific CSS. The same page that looks great in the browser may not look great as a PDF unless you tell the renderer how to handle page boundaries.

@page {
  size: A4;
  margin: 20mm;
}

@media print {
  body { font-size: 11pt; }
  h1 { page-break-after: avoid; }
  table { page-break-inside: avoid; }
  .pagebreak { page-break-before: always; }
  thead { display: table-header-group; }
  tfoot { display: table-footer-group; }
}

This handles:

  • Page size and margins
  • Avoiding awkward page breaks right after a heading or in the middle of a table
  • Forcing a new page where you want one
  • Repeating table headers across page breaks (one of the most useful tricks)

For headers, footers, and page numbers, CSS Paged Media (@page margin boxes with running content) is supported by WeasyPrint and Prince, and only partially by Chromium. For Chromium specifically, Puppeteer's displayHeaderFooter option plus headerTemplate / footerTemplate gives you direct control.
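A minimal sketch of those Puppeteer options follows. displayHeaderFooter, headerTemplate, footerTemplate, margin, format, and printBackground are real page.pdf() options, and Chromium fills elements carrying the classes title, date, url, pageNumber, and totalPages; the `buildPdfOptions` helper, the margins, and the styling are illustrative assumptions.

```javascript
// Hypothetical helper that returns an options object for Puppeteer's page.pdf().
function buildPdfOptions({ format = 'A4' } = {}) {
  // In practice the page's own CSS does not reach the header/footer templates,
  // so style them inline and set an explicit font size.
  const style = 'font-size:9px; width:100%; padding:0 20mm;';
  return {
    format,
    printBackground: true,
    displayHeaderFooter: true,
    // Chromium injects the document title into an element with class="title".
    headerTemplate: `<div style="${style}"><span class="title"></span></div>`,
    // ...and the current page / total page count into pageNumber / totalPages.
    footerTemplate:
      `<div style="${style} text-align:right;">` +
      'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
    // Headers and footers are drawn inside the page margins, so leave room for them.
    margin: { top: '25mm', bottom: '25mm', left: '20mm', right: '20mm' },
  };
}
```

Usage: await page.pdf({ path: 'report.pdf', ...buildPdfOptions() }). If the header or footer still does not appear, the top or bottom margin is usually too small to hold it.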

Web fonts and PDF

Custom fonts work fine in HTML-to-PDF conversion as long as the renderer can fetch them. A few details:

  • Self-host the fonts. External font services such as Google Fonts add a network dependency at render time, and requests from a headless server can be slow, rate-limited, or blocked.
  • @font-face declarations. Use them in your CSS; the renderer fetches and embeds the font.
  • Wait for fonts before printing. In Puppeteer: await page.evaluateHandle('document.fonts.ready') before calling page.pdf(). Skipping this is the most common cause of "fonts swap mid-render" bugs.
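One way to make self-hosting airtight is to embed the font file in the stylesheet as a data: URI, so the renderer never fetches anything. The trade-off is size, since base64 inflates the font by roughly a third. A minimal sketch, assuming WOFF2 font files; `fontFaceCss` is a hypothetical helper name.

```javascript
// Hypothetical helper: turn raw font bytes into an @font-face rule with the
// font inlined as a base64 data: URI (WOFF2 assumed).
function fontFaceCss(family, fontBytes, { weight = 400, style = 'normal' } = {}) {
  const base64 = Buffer.from(fontBytes).toString('base64');
  return `@font-face {
  font-family: '${family}';
  src: url(data:font/woff2;base64,${base64}) format('woff2');
  font-weight: ${weight};
  font-style: ${style};
}`;
}
```

At startup you might read the file once with fs.readFileSync('./fonts/Inter-Regular.woff2') (a hypothetical path), build the CSS string, and inject it into a <style> tag in every template. You would still wait for document.fonts.ready before printing.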

JavaScript-rendered content

A page that builds its content with React, Vue, or similar needs to finish rendering before PDF generation. With Puppeteer:

await page.goto('https://example.com/invoice/1042', { waitUntil: 'networkidle0' });
await page.pdf({ ... });

networkidle0 waits until there have been no network connections for at least 500 ms, a reasonable (though not guaranteed) signal that client-side rendering has settled.

For very dynamic pages, also wait for a specific element:

await page.waitForSelector('#invoice-total');
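Putting the waits together in the order this section describes gives a small helper like the one below. It is a sketch, not a definitive implementation: `renderUrlToPdf` is a hypothetical name, `page` is assumed to be a Puppeteer Page (only its goto, waitForSelector, evaluateHandle, and pdf methods are used), and the timeouts are illustrative values.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page.
async function renderUrlToPdf(page, url, readySelector, pdfOptions = {}) {
  // 1. Navigate and wait until the network has been quiet for 500 ms.
  await page.goto(url, { waitUntil: 'networkidle0', timeout: 30000 });
  // 2. Network idle does not prove the app has rendered; wait for a known element.
  await page.waitForSelector(readySelector, { timeout: 10000 });
  // 3. Make sure web fonts have finished loading before printing.
  await page.evaluateHandle('document.fonts.ready');
  // 4. Print, letting the caller override the defaults.
  return page.pdf({ format: 'A4', printBackground: true, ...pdfOptions });
}
```

Usage: const pdf = await renderUrlToPdf(page, 'https://example.com/invoice/1042', '#invoice-total'). Because the helper only depends on those four methods, it is easy to unit-test with a fake page object.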

Production deployment patterns

For a real production pipeline:

  1. Build the HTML server-side. Templated rendering with your usual framework.
  2. Serve it on an internal URL that the PDF generator can reach.
  3. Run a headless Chromium instance as a service (Docker container, AWS Lambda layer, dedicated worker process).
  4. Generate the PDF via Puppeteer/Playwright, save to S3 (or similar) and return the URL.
  5. Stream or download the PDF to the user.

This separates rendering from delivery and lets you scale the PDF generator independently.
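Steps 3 and 4 above can be sketched as a single worker function. Assumptions, all hypothetical: `browser` is a long-lived Puppeteer Browser, `storage` is an object with a put(key, bytes) method that resolves to a download URL (for example, a thin wrapper you write around your S3 client), and the /documents/:id route is an invented internal URL scheme.

```javascript
// Hypothetical worker for steps 3-4 of the pipeline.
async function renderAndStore(browser, storage, { documentId, internalBaseUrl }) {
  // One tab per job; the browser itself stays up between jobs.
  const page = await browser.newPage();
  try {
    // Step 2's internal URL (route shape is an assumption).
    const url = `${internalBaseUrl}/documents/${encodeURIComponent(documentId)}`;
    await page.goto(url, { waitUntil: 'networkidle0', timeout: 30000 });
    const pdf = await page.pdf({ format: 'A4', printBackground: true });
    // Step 4: persist the bytes and return a URL for delivery in step 5.
    return await storage.put(`pdfs/${documentId}.pdf`, pdf);
  } finally {
    // Always release the tab, even when navigation or printing fails.
    await page.close();
  }
}
```

Passing the browser and storage in, rather than creating them inside, is what lets the generator scale independently: a job queue can call this function from as many workers as your memory budget allows.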

For cloud-native deployment, several SaaS services do this for you:

  • DocRaptor (uses Prince under the hood)
  • PDFCrowd
  • HTML/CSS to PDF API (DocSpace, PDFShift, others)

These are convenient for low-volume or simple needs. For high-volume or sensitive content, self-host.

Performance considerations

Chromium is heavy. A few tactics:

  • Reuse the browser instance. Launching Chromium takes 1-2 seconds. Keep a long-running instance and pool pages.
  • Disable images you do not need. Block image requests at the network level if the PDF does not require them.
  • Set a reasonable timeout. A bad PDF generation should fail in seconds, not minutes.
  • Limit concurrent renders. Each headless render uses 100-300 MB of RAM. Cap concurrency so you do not OOM.
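Three of these tactics fit in a few lines of plain Node.js. The sketch below is illustrative, not a definitive implementation: `blockImages`, `withTimeout`, and `createLimiter` are hypothetical names; only request.resourceType(), request.abort(), request.continue(), and page.setRequestInterception() are real Puppeteer calls.

```javascript
// 1. Block images. Attach to a Puppeteer page with:
//      await page.setRequestInterception(true);
//      page.on('request', blockImages);
//    With interception on, every request must be aborted or continued.
function blockImages(request) {
  if (request.resourceType() === 'image') {
    request.abort();
  } else {
    request.continue();
  }
}

// 2. Fail fast: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// 3. Cap concurrency: at most `max` tasks run at once; the rest wait in FIFO order.
function createLimiter(max) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task) // also turns a synchronous throw into a rejection
      .then(resolve, reject)
      .finally(() => {
        active--;
        next();
      });
  };
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}
```

Usage: const limit = createLimiter(4); then wrap each job as limit(() => withTimeout(renderJob(), 20000, 'render')), where renderJob is your own rendering function. Note that the race does not cancel the losing render, so after a timeout you should close that job's page yourself.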

For very high-volume needs (millions of PDFs per day), look at PDFKit (Node.js) and PDFlib (multi-language), which build PDFs directly through a programmatic API without going through a browser engine. They are much faster but much less flexible, since you position content yourself instead of relying on HTML layout.

Common gotchas

Headers and footers do not appear. Puppeteer's page.pdf() defaults to displayHeaderFooter: false. Pass displayHeaderFooter: true plus headerTemplate / footerTemplate HTML, and leave enough top and bottom margin for them to fit.

Page numbers in CSS do not work. counter(page) works in WeasyPrint and Prince, but Chromium's support is limited. Use Puppeteer's footerTemplate with the pageNumber and totalPages classes instead.

Tables breaking across pages awkwardly. Apply page-break-inside: avoid to rows or wrap small tables in containers with that rule.

Images not loading. Check that the URL is reachable from the server doing the rendering. Local files need absolute file:// URLs; relative paths in HTML passed to page.setContent have no file-system location to resolve against.
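Building absolute file:// URLs by hand is error-prone (Windows drive letters, spaces, special characters), so Node's standard library is the safer route. A small sketch; `assetUrl` is a hypothetical helper name. One caveat: if the page itself was loaded over http(s), Chromium may refuse file:// subresources, in which case serving the assets over HTTP is the simpler fix.

```javascript
const path = require('node:path');
const { pathToFileURL } = require('node:url');

// Hypothetical helper: resolve a local asset path against the current working
// directory and return a percent-encoded, absolute file:// URL for the template.
function assetUrl(localPath) {
  return pathToFileURL(path.resolve(localPath)).href;
}
```

For example, a template could use `<img src="${assetUrl('assets/logo.png')}">` so the image resolves the same way regardless of how the HTML reached the browser.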

SVG renders blurry in the PDF. Increase the SVG's intrinsic size (its width and height attributes) in the HTML, not just its rendered size: when Chromium rasterizes an SVG, it does so based on the intrinsic dimensions.

Emoji not rendering. Headless Chrome on a server may lack an emoji font. Install one such as Noto Color Emoji (the fonts-noto-color-emoji package on Debian/Ubuntu), or load an emoji font like Twemoji with @font-face.

Post-processing the PDF

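After generating, you may want to:

  • Add bookmarks so readers can navigate long documents
  • Merge the output with other PDFs
  • Sign the finished document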

All of these can run in the same pipeline, often using libraries like pikepdf or qpdf. See qpdf introduction.

Takeaway

Converting HTML to PDF is one of the most reliable, well-supported document workflows in modern stacks. Headless Chromium (via Puppeteer or Playwright) is the right default for most projects. WeasyPrint and Prince shine for highly typographic or paged-media outputs. Write print-specific CSS, wait for fonts and dynamic content to settle, and pool browser instances for performance. To polish the result by adding bookmarks, merging, or signing the generated PDF, Docento.app handles those steps without installing any tooling.
