25 September 2009

The mighty PDF

PDF documents are a perfect export target in many applications because they're portable and look the same everywhere. So perfect, in fact, that there are virtual printers (like PDFCreator for Windows, and the ones built into Linux and the BSDs) which turn any printable file into a PDF.

But they are difficult to work with when you want to change or combine them.

I've had a folder sitting on my otherwise unburdened desktop for a year. It contained a bunch of scanned images which I wanted to aggregate into a nice and shiny single PDF file. Yeah. My google-fu had some slow reflexes this time, and my first tries with FPDF, a PHP library, were unsuccessful. The images were in the format its manual demanded, but I could only helplessly bang my head as the scripts failed with unsupported-format errors.

Sometimes my determination falters at moments like this and I put off trying to resolve the problem. But I had to push!

After some well-deserved rest, my google-fu found what I was looking for: a solution using Ghostscript. By the way, this solution works for me on Ubuntu Linux, but elsewhere YMMV.

Here's the script I ended up using (warning: it overwrites the original images):

#!/bin/bash
for filename in *
do
    # Only process files whose MIME type marks them as images
    if file -bi "$filename" | grep -q "image"
    then
        echo "  Converting $filename"
        # Reduce image size by half (this overwrites the original image)
        convert "$filename" -resize 50% "$filename"
        # Convert image to single-page PDF
        convert "$filename" "${filename%.*}.pdf"
    fi
done
# Merge all single-page PDFs into combined.pdf
gs -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=combined.pdf -dBATCH *.pdf
The bash script relies on ImageMagick (the convert command) and Ghostscript (the gs command). The easiest way to use it is to copy it to a file in the folder with the images destined to comprise the PDF, mark the file as executable and run it! The final PDF will contain all the single-page PDFs in lexical order of their file names.
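
One thing to watch with that lexical order: numbered scans may not end up where you expect. Here is a small sketch of the difference, using made-up file names (scan1.pdf and friends are purely hypothetical) and assuming GNU sort, whose -V switch sorts numbers inside names "naturally":

```shell
#!/bin/bash
# Hypothetical file names, used only to illustrate the ordering.
names=(scan1.pdf scan2.pdf scan10.pdf)

# Plain lexical order: compares character by character, so "10" sorts before "2".
lexical=$(printf '%s\n' "${names[@]}" | LC_ALL=C sort | paste -sd' ' -)

# Version ("natural") order; -V is a GNU sort extension.
natural=$(printf '%s\n' "${names[@]}" | sort -V | paste -sd' ' -)

echo "lexical: $lexical"   # lexical: scan1.pdf scan10.pdf scan2.pdf
echo "natural: $natural"   # natural: scan1.pdf scan2.pdf scan10.pdf
```

If your scans are numbered without leading zeros, renaming them to scan01, scan02 and so on before running the script is probably the simplest way to get the pages in the intended order.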

For the intricacies of the Ghostscript switches seen above, I will let the devil speak for itself:

$ gs -h
GPL Ghostscript 8.64 (2009-02-03)
Copyright (C) 2009 Artifex Software, Inc.  All rights reserved.
Usage: gs [switches] [file1.ps file2.ps ...]
Most frequently used switches: (you can use # in place of =)
 -dNOPAUSE           no pause after page   | -q       `quiet', fewer messages
 -g<width>x<height>  page size in pixels   | -r<res>  pixels/inch resolution
 -sDEVICE=<devname>  select device         | -dBATCH  exit after last file
 -sOutputFile=<file> select output file: - for stdout, |command for pipe,
                                         embed %d or %ld for page #
Input formats: PostScript PostScriptLevel1 PostScriptLevel2 PostScriptLevel3 PDF
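
Read against that help text, the merge command breaks down switch by switch. The following is only a sketch under a few assumptions of mine: the output name combined.pdf is a placeholder, the inputs and gs_cmd arrays are names I made up for illustration, and skipping a leftover combined.pdf is my own precaution rather than something the original script does. The command is printed first and only run when gs is installed and there is something to merge.

```shell
#!/bin/bash
# Placeholder name for the merged file.
output="combined.pdf"

# Collect the single-page PDFs, skipping the result of an earlier run so
# the old combined.pdf is not fed back in as an input.
inputs=()
for f in *.pdf; do
    [ -e "$f" ] || continue            # the glob matched nothing
    [ "$f" = "$output" ] && continue
    inputs+=("$f")
done

# Build the command in an array so names with spaces stay intact.
gs_cmd=(
    gs
    -dNOPAUSE                 # no pause after each page
    -dBATCH                   # exit after the last file
    -sDEVICE=pdfwrite         # select the PDF-writing device
    "-sOutputFile=$output"    # select the output file
)

# Show what would be run.
echo "${gs_cmd[*]} ${inputs[*]}"

# Run it only if Ghostscript is available and there are input files.
if command -v gs >/dev/null 2>&1 && [ "${#inputs[@]}" -gt 0 ]; then
    "${gs_cmd[@]}" "${inputs[@]}"
fi
```

Without -dNOPAUSE and -dBATCH, gs waits for a keypress after each page and then drops into its interactive prompt, which is not what you want in a script.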
