PDF Manipulation - Merging and Splitting

Library choice

A common modern option is pypdf (the successor to PyPDF2).

Merge PDFs

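pdf_merge.py
from pypdf import PdfWriter

# PdfMerger was removed in pypdf 5.0; PdfWriter.append() replaces it.
writer = PdfWriter()
writer.append("a.pdf")  # all pages of a.pdf
writer.append("b.pdf")  # followed by all pages of b.pdf

writer.write("merged.pdf")
writer.close()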
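
Merging often involves only part of each file. The sketch below is one possible way to do that with pypdf's PdfWriter.append(), whose pages argument accepts a zero-based (start, stop) tuple. The helper names to_page_slice and merge_ranges, and the file names in the usage comment, are made up for illustration.

```python
def to_page_slice(first: int, last: int) -> tuple[int, int]:
    """Convert a 1-based inclusive page range to a 0-based (start, stop) pair."""
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {first}-{last}")
    return (first - 1, last)


def merge_ranges(parts: list[tuple[str, int, int]], output: str) -> None:
    """Merge one page range from each input file into a single output file.

    `parts` holds (path, first_page, last_page) with 1-based inclusive pages.
    """
    # Deferred import so to_page_slice can be used without pypdf installed.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for path, first, last in parts:
        # pages=(start, stop) selects a zero-based, half-open range.
        writer.append(path, pages=to_page_slice(first, last))
    with open(output, "wb") as f:
        writer.write(f)
    writer.close()


# Hypothetical usage: pages 1-2 of a.pdf followed by pages 3-5 of b.pdf.
# merge_ranges([("a.pdf", 1, 2), ("b.pdf", 3, 5)], "merged_ranges.pdf")
```

Keeping the range conversion in its own function makes the off-by-one handling easy to check separately from the PDF work.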

Split a PDF

pdf_split.py
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")

# Write every page to its own file: page_1.pdf, page_2.pdf, ...
for i, page in enumerate(reader.pages, start=1):
    writer = PdfWriter()  # fresh writer so each output holds one page
    writer.add_page(page)
    with open(f"page_{i}.pdf", "wb") as f:
        writer.write(f)
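
One file per page is not always what you want. As a variation on the loop above, here is a minimal sketch that splits a PDF into chunks of a fixed number of pages. The function names chunk_ranges and split_in_chunks, and the "chunk" output prefix, are assumptions for this example, not part of pypdf.

```python
def chunk_ranges(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return 0-based (start, stop) page ranges covering total_pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_in_chunks(path: str, chunk_size: int, prefix: str = "chunk") -> list[str]:
    """Write consecutive groups of chunk_size pages to prefix_1.pdf, prefix_2.pdf, ..."""
    # Deferred import so chunk_ranges can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(path)
    outputs = []
    ranges = chunk_ranges(len(reader.pages), chunk_size)
    for n, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        name = f"{prefix}_{n}.pdf"
        with open(name, "wb") as f:
            writer.write(f)
        outputs.append(name)
    return outputs


# Hypothetical usage: split input.pdf into files of at most 4 pages each.
# split_in_chunks("input.pdf", 4)
```

The last chunk is simply shorter when the page count is not a multiple of the chunk size.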

Notes

  • Encrypted PDFs must be decrypted with their password before their pages can be merged or split.
  • Layout and text extraction is a different problem (covered on the next page).
