Open a PDF and find that the text is showing up as boxes, gibberish, or a font that looks completely wrong, and you've hit one of the more frustrating PDF problems. The good news: it's almost always fixable. The bad news: there are several different things that look identical on screen but need different fixes.
What you're actually seeing
The same "broken fonts" symptom comes from at least four different problems:
- Missing glyphs (boxes or empty squares): the font is there, but doesn't include the specific characters needed.
- Substitute font (looks similar but slightly off): the original font isn't embedded; the reader picked a substitute.
- Mojibake (random Unicode characters or symbols): encoding mismatch between the PDF's character map and the font.
- Tofu (squares with hex codes inside): characters that exist in Unicode but aren't supported by any installed font.
Diagnosing which one you're hitting determines the fix.
Step 1: Identify the actual font
Open the PDF in a reader that shows font info:
- Acrobat Reader: File → Properties → Fonts lists every font, whether it's embedded, and which substitute is being used if not.
- Browser-based viewers: some show font info in a document properties panel.
- Command line: pdffonts file.pdf (part of the Poppler utilities, alongside pdfinfo) lists every font in the document and its embedding status.
Three things to check for each font:
- Font name.
- "Embedded" or "Embedded subset": yes means the font travels with the file; no means the reader is using a system font.
- Type: Type1, TrueType, CIDType0, etc. Different types have different failure modes.
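To read that information programmatically, here is a rough Python sketch that parses text in the column layout printed by Poppler's pdffonts tool and flags fonts that are not embedded. The sample text, the FontRow type, and the function name are illustrative assumptions; the parser also assumes that font names and encoding names contain no spaces.

```python
from typing import List, NamedTuple


class FontRow(NamedTuple):
    name: str
    font_type: str
    embedded: bool
    subset: bool


def parse_pdffonts(output: str) -> List[FontRow]:
    """Parse the table printed by `pdffonts file.pdf` (assumed layout)."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        # Skip blank lines, the header row, and the dashed separator.
        if len(tokens) < 8 or tokens[0] == "name" or set(tokens[0]) == {"-"}:
            continue
        # The last five tokens are: emb, sub, uni, object number, generation.
        emb, sub = tokens[-5], tokens[-4]
        # Between the name and the encoding sits the type, which may
        # contain spaces (for example "Type 1").
        font_type = " ".join(tokens[1:-6])
        rows.append(FontRow(tokens[0], font_type, emb == "yes", sub == "yes"))
    return rows


# Illustrative sample, not output from a real file.
SAMPLE = """\
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
ABCDEF+HelveticaNeue-Light           TrueType          WinAnsi          yes yes no       8  0
Times-Roman                          Type 1            Standard         no  no  no      12  0
"""

not_embedded = [row.name for row in parse_pdffonts(SAMPLE) if not row.embedded]
print(not_embedded)  # ['Times-Roman']
```

Any font that lands in not_embedded is a candidate for the substitution problems covered in the next step.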
Step 2: If the font isn't embedded
The most common cause of "wrong-looking fonts." The PDF references a font like "Helvetica Neue Light" but doesn't include it. Your system doesn't have that font, so the reader falls back to a substitute.
Fixes:
- Install the missing font. If you have rights to the font, install it system-wide and reopen the PDF.
- Re-export from the source with "embed all fonts" enabled. This is the right long-term fix.
- Use a reader with better font substitution. Acrobat has the most sophisticated substitution engine; some open-source readers fall back more crudely.
Word, LibreOffice, and InDesign all offer font-embedding options in their PDF export or save settings, though the exact name and location vary by application. Always enable embedding for any document leaving your machine.
Step 3: If glyphs are missing (boxes)
The font is there but doesn't include the character. Common causes:
- The PDF uses a "subset" font. Embedding only the characters used in the document is normal and saves space. But if a tool re-edits the PDF and adds new characters, those characters are missing from the subset. Re-export with full embedding.
- The font genuinely doesn't have the character. A font tuned for English may not include Cyrillic or Greek characters. Switch fonts or add a fallback.
- Combining marks (accents, diacritics) sometimes render as separate boxes. Modern readers handle this correctly; older ones don't.
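A quick way to spot the subset case from the font name alone: embedded subsets are conventionally named with a tag of six uppercase letters followed by a plus sign, such as ABCDEF+Helvetica. The sketch below separates that tag from the base name; the helper name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Subset fonts carry a six-uppercase-letter tag and "+" before the real name.
SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_tag(font_name: str) -> Tuple[Optional[str], str]:
    """Return (tag, base_name); tag is None when the name has no subset tag."""
    match = SUBSET_TAG.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name


print(split_subset_tag("ABCDEF+Helvetica"))  # ('ABCDEF', 'Helvetica')
print(split_subset_tag("Helvetica"))         # (None, 'Helvetica')
```

A tagged name means only some glyphs travel with the file, so characters added by a later edit are the first suspects when boxes appear.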
Step 4: If text is gibberish (mojibake)
Mojibake — text that should be "Hello" but appears as something like "誶è´∞" — usually means an encoding mismatch:
- The PDF's character map (CMap) is wrong. Re-export from the source.
- The font and the text were encoded in different systems. Common in older PDFs created with legacy regional encodings rather than Unicode, especially from Asia or Eastern Europe.
- Copy-paste extraction looks wrong even though the page renders fine. The encoding is fine for visual rendering but broken for text extraction. Run OCR to produce a clean text layer.
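The mechanism is easy to reproduce outside a PDF. As an analogy only (in a PDF the mismatch sits in the font encoding or CMap, not in a byte stream), this Python snippet decodes UTF-8 bytes with the wrong codec and then reverses the mistake:

```python
original = "café"

# Encode as UTF-8, then decode with the wrong codec (Latin-1).
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©

# When the wrong step is known, it can be reversed without loss.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # café
```

The text was never damaged; it was read with the wrong table. That is why re-exporting with a correct character map fixes the problem at the source.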
Step 5: If the document looks fine but extraction is broken
A specific case worth highlighting: the page renders correctly, but text extraction or Ctrl-F search produces gibberish. This is the encoding-rendering mismatch: the PDF has its own internal mapping from glyph codes to Unicode characters (the ToUnicode CMap), and that mapping is broken or missing even though the visuals are correct.
This is especially common in:
- PDFs generated by older typesetting systems (LaTeX with custom encodings, FrameMaker).
- PDFs from some non-Western publishers.
- PDFs that have been through multiple conversion pipelines.
The fix: OCR the document to produce a fresh text layer, even though it's "already digital." See making a PDF searchable.
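Before committing to OCR, it can help to measure how bad the existing text layer is. The following is a heuristic sketch, not a definitive test: it assumes the text has already been extracted with some tool, and counts characters that rarely appear in real prose (the Unicode replacement character, private-use, unassigned, and control characters). The function names and the 0.2 threshold are assumptions chosen for illustration.

```python
import unicodedata

# Categories that rarely appear in real prose: private use, unassigned, control.
SUSPICIOUS_CATEGORIES = {"Co", "Cn", "Cc"}


def garbled_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like extraction debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(chars)


def looks_garbled(text: str, threshold: float = 0.2) -> bool:
    """Heuristic only: the default threshold is an arbitrary starting point."""
    return garbled_ratio(text) >= threshold


print(looks_garbled("Hello, world"))              # False
print(looks_garbled("\ue001\ue002\ue003 \ufffd"))  # True
```

This catches text mapped to private-use or replacement characters. It will not catch a mapping that swaps one ordinary letter for another, so a quick manual copy-paste check is still worthwhile.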
Step 6: If specific characters consistently fail
If certain characters always render as boxes — for example, the euro sign, certain ligatures, or non-Latin characters — the problem is font support:
- Verify the font actually contains the character. Use FontForge or a font inspector.
- Switch to a more comprehensive font in the source (Noto fonts have very wide coverage).
- Add a fallback font chain in the source, so missing characters fall back to a font that has them.
Re-export and the new PDF should render correctly.
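The fallback-chain idea can be sketched in a few lines of Python: for each character, take the first font in an ordered list that covers it. The font names and coverage sets here are hypothetical; real coverage would come from a font inspector such as FontForge.

```python
from typing import Dict, List, Optional, Set, Tuple


def pick_fonts(text: str, chain: List[Tuple[str, Set[str]]]) -> Dict[str, Optional[str]]:
    """Map each character to the first font in `chain` that covers it.

    Characters that no font covers map to None; those would render as boxes.
    """
    result: Dict[str, Optional[str]] = {}
    for ch in text:
        result[ch] = next((name for name, chars in chain if ch in chars), None)
    return result


# Hypothetical coverage sets, for illustration only.
chain = [
    ("BodyFont", set("abcdefghijklmnopqrstuvwxyz ")),
    ("FallbackFont", set("€αβγ")),
]

print(pick_fonts("a€", chain))  # {'a': 'BodyFont', '€': 'FallbackFont'}

missing = [ch for ch, font in pick_fonts("aж", chain).items() if font is None]
print(missing)  # ['ж']
```

Any character left in missing needs either a broader primary font or another entry at the end of the chain.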
When the source is gone
The hardest case: you don't have the source document, and the PDF has missing fonts. Options:
- Install the missing fonts if you can identify and obtain them. The font name in the PDF properties is your starting point.
- Use a reader with strong substitution (Acrobat usually).
- Convert to outlines. Some tools (Inkscape, Acrobat Pro, Ghostscript with -dNoOutputFonts) can convert text to vector paths. The text becomes uneditable but renders identically everywhere.
- OCR and rebuild. Run OCR on the rendered text, then rebuild a new PDF from the OCR output.
None of these are ideal. The lesson: always embed fonts when you create a PDF.
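For the Ghostscript route, here is a minimal sketch of two invocations: one that asks the pdfwrite device to embed fonts, and one that converts text to outlines. The switches used are documented Ghostscript options, but -dNoOutputFonts needs a reasonably recent version, and Ghostscript can only embed a font it can find on the system. The function name and file names are placeholders.

```python
import subprocess
from typing import List


def gs_command(src: str, dst: str, outlines: bool = False) -> List[str]:
    """Build a Ghostscript command line that rewrites `src` into `dst`."""
    cmd = ["gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite", f"-sOutputFile={dst}"]
    if outlines:
        # Emit text as vector paths instead of fonts.
        cmd.append("-dNoOutputFonts")
    else:
        # Ask pdfwrite to embed (subsetted) fonts where it can locate them.
        cmd += ["-dEmbedAllFonts=true", "-dSubsetFonts=true"]
    cmd.append(src)
    return cmd


print(" ".join(gs_command("input.pdf", "outlined.pdf", outlines=True)))

# To actually run it (requires Ghostscript on PATH):
# subprocess.run(gs_command("input.pdf", "embedded.pdf"), check=True)
```

Building the argument list separately from running it makes the command easy to inspect before it touches any files. Write to a new output path and keep the original PDF.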
Preventive habits
A short list to avoid this whole problem:
- Always embed all fonts in PDF exports. Always.
- Stick to widely-supported fonts for documents going to recipients you don't know — system fonts, Google Fonts, Adobe Fonts that ship with Acrobat.
- Test PDFs on a different machine before sending. Open on a phone if possible — phones often have different font sets and reveal substitution issues.
- For documents that need permanence (legal, archival), use PDF/A which mandates embedded fonts.
- For mathematical or scientific content, use a font like STIX or Cambria Math that has full math character coverage.
Reader-specific differences
Some font display problems are reader-specific:
- Edge and Chrome sometimes substitute fonts differently than Acrobat.
- Mobile readers have smaller font sets; substitution is more aggressive.
- PDF.js (Firefox) had historic issues with certain CFF fonts, mostly resolved.
- Older versions of any reader may lack support for newer font features.
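If a PDF with fully embedded fonts renders correctly in one reader but not another, the document is most likely fine and the reader is the issue. Update or switch. If the fonts are not embedded, the difference is more likely each reader choosing a different substitute, and the fix is in Step 2.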
Conclusion
Font problems in PDFs almost always trace back to fonts that aren't embedded, broken character mappings, or readers with limited substitution. Diagnose with pdffonts or the reader's properties dialog, fix at the source by embedding fonts, and OCR as a last resort when the source is gone. Docento.app handles font analysis and re-embedding in the browser without uploads. For related troubleshooting, see why won't my PDF open and recovering corrupted PDFs.