Merging and Splitting Documents on Linux
Split, merge, slice, join: these knives can do it all!
✍️ Jacob Mulquin
As part of my job, I deal with documents sent in by people using all kinds of devices, with varying levels of skill. This leads to some interesting situations: a document might arrive as photos of printouts, as multiple scanned pages, as screenshots, or as a document that was printed and then scanned to email. You get the idea... there's a lot of variation.
At work we use Windows, but I find the Windows tools for splitting and merging lacking. You usually have to load bulky GUIs that lock features behind paywalls. No thanks! (I'm aware that the following tools are open source and available on Windows, but let's not let facts get in the way of some Linux evangelism.)
Thanks to the wonders of open source and many dedicated developers, there are numerous tools at your disposal for merging and splitting documents. Thank you, community!
Merging documents
The first option is the convert program, part of ImageMagick. I use it frequently when I have numerous JPG images that need to be turned into a single PDF. If someone has sent through absurd 9MB images of each page, you can lower the -quality value significantly to shrink the output (the command below keeps it at 100).
convert -quality 100 -rotate 0 *.jpg output.pdf
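One thing to watch with *.jpg: the shell expands globs in lexical order, so page10.jpg comes before page2.jpg and ends up earlier in the PDF. Below is a minimal sketch that orders the files with GNU sort -V (version sort) before handing them to convert. It assumes the page number is the only number that differs between file names; the function names sorted_jpgs and merge_jpgs and the default quality of 60 are illustrative choices, not part of the workflow above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge the JPGs in a directory into one PDF,
# in natural page order. Assumes GNU sort (for -V) and ImageMagick's convert.

# Print the .jpg files in DIR (default: current directory), one per line,
# in version order so that page2.jpg sorts before page10.jpg.
sorted_jpgs() {
    local dir="${1:-.}"
    find "$dir" -maxdepth 1 -type f -name '*.jpg' | sort -V
}

# Usage: merge_jpgs OUTPUT [DIR] [QUALITY]
merge_jpgs() {
    local output="$1" dir="${2:-.}" quality="${3:-60}"
    local files=()
    # Read one path per line into the array (paths must not contain newlines).
    mapfile -t files < <(sorted_jpgs "$dir")
    if (( ${#files[@]} == 0 )); then
        echo "merge_jpgs: no .jpg files in $dir" >&2
        return 1
    fi
    convert -quality "$quality" "${files[@]}" "$output"
}
```

With the file sourced, merge_jpgs output.pdf ./scans 60 would merge everything in ./scans at quality 60, in page order.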
I find that using convert on PDF input can sometimes lead to bad image quality, because ImageMagick rasterizes each PDF page (at 72 DPI unless you pass -density). In cases where convert doesn't do the job, the gs (Ghostscript) program saves the day:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -dDetectDuplicateImages -dCompressFonts=true -r150 -sOutputFile=gs.pdf *.pdf
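One catch with the command above: if gs.pdf is left over from an earlier run, *.pdf matches it too, so it ends up as both an input and the output. Here is a minimal sketch of a wrapper that takes the output and the inputs explicitly and refuses to run when the output is one of the inputs. The name gs_merge and the RESOLUTION variable (defaulting to the 150 used above) are hypothetical; the Ghostscript flags are the ones from the original command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Ghostscript merge command above.
# Usage: gs_merge OUTPUT INPUT...   (set RESOLUTION to override -r150)
gs_merge() {
    if (( $# < 2 )); then
        echo "usage: gs_merge OUTPUT INPUT..." >&2
        return 2
    fi
    local output="$1"; shift
    local input
    # Refuse to run if the output is also an input, e.g. a leftover
    # gs.pdf matched again by *.pdf. -ef catches different spellings
    # of the same existing file; == catches the literal name.
    for input in "$@"; do
        if [[ "$input" -ef "$output" || "$input" == "$output" ]]; then
            echo "gs_merge: output $output is also an input" >&2
            return 1
        fi
    done
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/default \
       -dNOPAUSE -dQUIET -dBATCH -dDetectDuplicateImages -dCompressFonts=true \
       -r"${RESOLUTION:-150}" -sOutputFile="$output" "$@"
}
```

Running gs_merge gs.pdf *.pdf a second time then stops with an error instead of reading and writing the same file.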
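Another option is pdfunite, part of Poppler. It's one I haven't used much, but Poppler also provides the excellent pdftotext program, which I have used in the past. The last argument is the output file: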
pdfunite *.pdf output.pdf
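Whichever tool does the merge, it's worth checking afterwards that no pages went missing. A minimal sketch, assuming Poppler's pdfinfo is installed (it prints a "Pages:" line for each file); the function names pdf_pages and check_merge are hypothetical. It compares the merged file's page count with the sum of the inputs' page counts.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: does OUTPUT have as many pages as all INPUTs combined?
# Relies on pdfinfo (from Poppler), which prints a "Pages:" line.

# Print the page count of one PDF (empty if pdfinfo can't read it).
pdf_pages() {
    pdfinfo "$1" 2>/dev/null | awk '/^Pages:/ { print $2 }'
}

# Usage: check_merge OUTPUT INPUT...
check_merge() {
    local output="$1"; shift
    local expected=0 input pages actual
    for input in "$@"; do
        pages=$(pdf_pages "$input")
        if [[ -z "$pages" ]]; then
            echo "check_merge: cannot read page count of $input" >&2
            return 1
        fi
        expected=$(( expected + pages ))
    done
    actual=$(pdf_pages "$output")
    if [[ "$actual" != "$expected" ]]; then
        echo "check_merge: $output has ${actual:-no} pages, expected $expected" >&2
        return 1
    fi
}
```

For example, check_merge output.pdf a.pdf b.pdf returns success only if output.pdf has exactly as many pages as a.pdf and b.pdf together.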
Splitting a document
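With convert, note that ImageMagick numbers pages from zero, so [0-1] selects pages 1 and 2 (quote the argument so the shell doesn't treat the brackets as a glob):
convert 'input.pdf[0-1]' output.pdf
With Ghostscript, -dFirstPage and -dLastPage count from one:
gs -dNOPAUSE -dQUIET -dBATCH -sOutputFile="output.pdf" -dFirstPage=1 -dLastPage=2 -sDEVICE=pdfwrite "input.pdf"
pdfseparate, pdfunite's companion in Poppler, writes each page to its own file, replacing %d with the page number (add -f and -l to limit it to a range of pages):
pdfseparate input.pdf output-%d.pdf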
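If Poppler is all you have installed, pdfseparate and pdfunite can be combined to pull a page range out into a single document. A minimal sketch; the function name extract_range and its argument order are hypothetical. It relies on pdfseparate naming each file after the original page number, then rejoins the files in an explicit order. The Ghostscript command above does the same job in one step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract pages FIRST..LAST of INPUT into a single
# OUTPUT using only Poppler's pdfseparate and pdfunite.
# Usage: extract_range INPUT FIRST LAST OUTPUT
extract_range() {
    local input="$1" first="$2" last="$3" output="$4"
    local re='^[1-9][0-9]*$'
    if ! [[ $first =~ $re && $last =~ $re ]] || (( first > last )); then
        echo "extract_range: bad page range $first-$last" >&2
        return 1
    fi
    local tmpdir status page
    local pages=()
    tmpdir=$(mktemp -d) || return 1
    # pdfseparate replaces %d with the original page number,
    # so this writes page-FIRST.pdf ... page-LAST.pdf.
    if ! pdfseparate -f "$first" -l "$last" "$input" "$tmpdir/page-%d.pdf"; then
        rm -rf "$tmpdir"
        return 1
    fi
    if (( first == last )); then
        # Only one page: nothing to join.
        mv "$tmpdir/page-$first.pdf" "$output"
    else
        # List the files explicitly; a glob would sort page-10 before page-2.
        for (( page = first; page <= last; page++ )); do
            pages+=("$tmpdir/page-$page.pdf")
        done
        pdfunite "${pages[@]}" "$output"
    fi
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

For example, extract_range input.pdf 5 7 pages-5-7.pdf would write pages 5 to 7 into one file.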
I threw together a script to help me split PDFs using a pattern, very originally named pdfsplit.sh.
You pass it an input file and a pattern, much like you can with GUI programs. Say I want pages 1-2 as one document, page 3 on its own, page 4 on its own, and finally pages 5-7 as one document: I would pass it "1-2, 3, 4, 5-7".
If you know of a program that can do this, please let me know.
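#!/usr/bin/env bash
# ./pdfsplit.sh input.pdf pattern [output-prefix]
# Written by Jacob Mulquin (https://mulquin.com), 2022
# Splits input.pdf into one file per comma-separated range in pattern,
# e.g. "1-2, 3, 4, 5-7". Each output is named <prefix>-<first page>.pdf.
if [[ $# -lt 2 ]]; then
    echo "usage: $0 input.pdf pattern [output-prefix]" >&2
    exit 1
fi
INPUT_FILE="$1"
# Strip all whitespace so "1-2, 3" and "1-2,3" behave the same
PATTERN="$(echo "$2" | tr -d '[:space:]')"
# Default prefix: the input file name without its .pdf extension
if [[ -n "$3" ]]; then
    OUTPUT_PREFIX="$3"
else
    OUTPUT_PREFIX="${INPUT_FILE%.pdf}"
fi
for i in $(echo "$PATTERN" | tr "," "\n")
do
    # A single page is a range that starts and ends on the same page
    FIRST_PAGE=$i
    LAST_PAGE=$i
    if [[ "$i" == *"-"* ]]; then
        FIRST_PAGE=$(echo "$i" | cut -d- -f1)
        LAST_PAGE=$(echo "$i" | cut -d- -f2)
    fi
    gs -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$OUTPUT_PREFIX-$FIRST_PAGE.pdf" -dFirstPage="$FIRST_PAGE" -dLastPage="$LAST_PAGE" -sDEVICE=pdfwrite "$INPUT_FILE"
done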
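The pdfsplit.sh script trusts the pattern as typed, so an entry like "5-" or "7-5" goes straight to Ghostscript. A minimal sketch of a validator for the same "1-2, 3, 4, 5-7" syntax; the name parse_pattern and its "FIRST LAST" output format are hypothetical. It prints one pair per range and fails on the first malformed or backwards entry.

```shell
#!/usr/bin/env bash
# Hypothetical validator for the pdfsplit.sh pattern syntax, e.g. "1-2, 3, 4, 5-7".
# Prints one "FIRST LAST" pair per range, or fails on the first bad entry.
parse_pattern() {
    # Remove all whitespace, as pdfsplit.sh does
    local pattern="${1//[[:space:]]/}"
    # A page number, optionally followed by -page number
    local re='^([1-9][0-9]*)(-([1-9][0-9]*))?$'
    local entries=()
    local entry first last
    IFS=',' read -ra entries <<< "$pattern"
    if (( ${#entries[@]} == 0 )); then
        echo "parse_pattern: empty pattern" >&2
        return 1
    fi
    for entry in "${entries[@]}"; do
        if ! [[ $entry =~ $re ]]; then
            echo "parse_pattern: bad entry '$entry'" >&2
            return 1
        fi
        first=${BASH_REMATCH[1]}
        # A single page starts and ends on the same page
        last=${BASH_REMATCH[3]:-$first}
        if (( first > last )); then
            echo "parse_pattern: range $entry runs backwards" >&2
            return 1
        fi
        printf '%s %s\n' "$first" "$last"
    done
}
```

In the script, RANGES=$(parse_pattern "$2") || exit 1 could run before the loop, so that a typo stops it before Ghostscript writes any files; the loop would then read FIRST_PAGE and LAST_PAGE from $RANGES with while read -r.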