LinuxQuestions.org
Create Images from PDF and recreate PDF from Images...

Posted 04-19-2016 at 07:06 AM by Michael Uplawski
Updated 09-12-2017 at 12:33 PM by Michael Uplawski (a different title for the second script; wording improved)

You might think WTF. But that would be a waste of spontaneity.
Even if the thought has never hit you, this procedure has been useful many times.

Background

My own reasons for converting PDF files to images and then back to PDF have been:
  • PDFs containing unnecessarily huge images
  • PDFs with OpenType fonts which were not correctly embedded
  • PDF content that I did not want to see copied (not easily, at least)
  • PDFs which were originally made from images, that I wanted to modify

These four example cases boil down to two principal tasks:
  1. Modify PDF-content
  2. Remove all fonts from a PDF
| Beware that the procedure I present does not prevent content from being “stolen” from a PDF file.
| Even if no font, and thus no text, remains in the resulting PDF file, graphics tools can easily extract whatever part of the document
| may appear useful in another context. As far as I know, there is no way to prevent this other than by encrypting the PDF file as a whole. File encryption
| as it is integrated in PDF tools, by contrast, can only be a suggestion to the reader application. Having the means to display a PDF file, these programs can choose
| not to care about the encryption...

My tools and the tools that are called by my tools

Okay, I have scripts. You do not need them if you know the libtiff tools and Ghostscript. Both are usually installed on a contemporary Linux system; in case of doubt, make sure that you can execute
Code:
user@machine:~$ tiff2pdf
LIBTIFF, Version 4.0.6
Copyright (c) 1988-1996 Sam Leffler
Copyright (c) 1991-1996 Silicon Graphics, Inc.

usage:  tiff2pdf [options] input.tiff
options:
 -o: output to file name
 -j: compress with JPEG
 -z: compress with Zip/Deflate
 -q: compression quality
 -n: no compressed data passthrough
 -d: do not compress (decompress)
 -i: invert colors
 -u: set distance unit, 'i' for inch, 'm' for centimeter
 -x: set x resolution default in dots per unit
 -y: set y resolution default in dots per unit
 -w: width in units
 -l: length in units
 -r: 'd' for resolution default, 'o' for resolution override
 -p: paper size, eg "letter", "legal", "A4"
 -F: make the tiff fill the PDF page
 -f: set PDF "Fit Window" user preference
 -e: date, overrides image or current date/time default, YYYYMMDDHHMMSS
 -c: sets document creator, overrides image software default
 -a: sets document author, overrides image artist default
 -t: sets document title, overrides image document name default
 -s: sets document subject, overrides image image description default
 -k: sets document keywords
 -b: set PDF "Interpolate" user preference
 -h: usage
and

Code:
user@machine:~$ gs
GPL Ghostscript 9.19 (2016-03-23)
Copyright (C) 2016 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
GS>
Type quit to leave the Ghostscript interactive mode.

If you have tiff2pdf, you also have tiffcp and do not have to try out that command. But we will use it... or, my scripts will.
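
For those who prefer the bare command line, the whole round trip can be sketched in three commands. This is only a minimal sketch, not the scripts presented further down: the function name pdf_roundtrip, the temporary file names and the default of 300 dpi are assumptions chosen for illustration.

```shell
#!/bin/bash
# Sketch: PDF -> one TIFF per page -> one multi-page TIFF -> image-only PDF.
# pdf_roundtrip is a hypothetical helper name chosen for this example.
# Arguments: input PDF, output PDF, resolution in dpi (assumed default: 300).
pdf_roundtrip() {
   local in="$1" out="$2" res="${3:-300}"
   local tmp
   tmp=$(mktemp -d) || return 1

   # 1. Render every page to a 24-bit colour TIFF (page_0001.tif, ...).
   gs -q -dNOPAUSE -dBATCH -sDEVICE=tiff24nc -r"$res" \
      -sOutputFile="$tmp/page_%04d.tif" "$in" &&
   # 2. Concatenate the single pages into one multi-page TIFF.
   tiffcp "$tmp"/page_*.tif "$tmp/all.tif" &&
   # 3. Wrap the images in a PDF, compressed with Zip/Deflate.
   tiff2pdf -z -o "$out" "$tmp/all.tif"
   local rc=$?

   rm -rf "$tmp"
   return $rc
}
```

A call would look like pdf_roundtrip input.pdf output.pdf 200. The resulting PDF contains no fonts at all, only one image per page.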

If you do not want to worry about all the options to tiffcp, tiff2pdf and Ghostscript, how about a graphical user-interface for creating images from PDF files, and another one for creating a PDF again from the resulting images? How about this screenshot (see the attached image pdf_tiff_sc.png)?

You need YAD to create the graphical user-interfaces. I give you my program icon, if you like it; otherwise choose whatever you want to replace PDFImage in the scripts (it must be a file from /usr/share/pixmaps, I guess).
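
Since four programs are involved (yad, gs, tiffcp and tiff2pdf), it may help to check for all of them in one go before trying the scripts. A minimal sketch; missing_tools is a hypothetical helper name, not part of my scripts.

```shell
#!/bin/bash
# missing_tools: print every program from the argument list
# that cannot be found in the PATH, one name per line.
missing_tools() {
   local tool
   for tool in "$@"; do
      command -v "$tool" >/dev/null 2>&1 || echo "$tool"
   done
}
```

Used as missing=$(missing_tools yad gs tiffcp tiff2pdf), an empty result means that everything is in place; otherwise the variable lists what still has to be installed.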

And finally, the scripts:
Code:
#!/bin/bash
# pdf_to_image, prints PDF to one of the supported image formats.
# Needs Ghostscript and YAD.
# Calls Ghostscript to create image-files from PDF
#
# This script ©2016 Michael Uplawski <michael.uplawski@uplawski.eu>
# Use at your own risk, modify as you please
#
# Ghostscript is ©2015 Artifex Software, Inc.  All rights reserved.

GS=$(command -v gs)

# verification and error-handling: Ghostscript not found
if [ -z "$GS" ]
then
   yad --image "dialog-error" --title="Error" --window-icon="" --text="Ghostscript is needed but cannot be found! Make sure that it is installed!"
   exit 1
fi
DEF_FORMAT="TIF"
FORMATS="^TIF;JPEG;PGM;PNG;PCX;BMP"
SEP=";"
PAPER=a4

DONE=0
while [ "$DONE" -eq 0 ]
do

   # Create the YAD user-interface, retrieve values.
   VALUES=($(yad --title="pictures from PDF" --image="PDFImage" --window-icon="PDFImage" --form --separator=' ' --item-separator="$SEP" --field="PDF-file(s)":MFL ' ' --field="Output format":CB "$FORMATS" --field="Resolution":NUM "400;180..600;10" ))

   echo "VALUES is ${VALUES[@]}"
   # verification and error-handling
   if [ ! "${VALUES}" ]
   then
      exit 0
   fi

   if [ 1 -ge "${#VALUES[@]}" ]
   then
      yad --image "dialog-error" --window-icon="PDFImage" --title="No PDF files" --text="No files have been selected. Aborting."
      exit 2
   fi

   PDFS=($(echo ${VALUES[0]} | tr "$SEP" ' '))
   # echo "PDFS is ${PDFS} (${#PDFS[@]})"
   FORMAT=${VALUES[1]}
   RES=${VALUES[2]}

   # No PDF files (or just anything) found in the array.
   if [ 0 -eq "${#PDFS[@]}" ]
   then
      yad --image "dialog-error" --window-icon="PDFImage" --title="no pdf files" --text="No files have been selected. Aborting."
      exit 2
   fi

   case "$FORMAT" in
      TIF)
         TYPE=tif
         DEV=tiff24nc
         ;;
      JPEG)
         TYPE=jpg
         DEV=jpeg
         ;;
      PGM)
         TYPE=pgm
         DEV=pgm
         ;;
      PNG)
         TYPE=png
         DEV=png48
         ;;
      PCX)
         TYPE=pcx
         DEV=pcx24b
         ;;
      BMP)
         TYPE=bmp
         DEV=bmp256
         ;;
   esac
   result=""
   for inf in "${PDFS[@]}"
   do
      # echo "PDF: $inf"
      if [ ! -f "$inf" ]
      then
         yad --image "dialog-error" --window-icon="PDFImage" --title="unsuitable file" --text="$inf is not a valid PDF-file."
         continue
      fi
      DIR=`dirname "$inf"`
      # echo "DIR is $DIR"
      ext=${inf/*./}
      if [ "$DIR" == '' ]
      then
         DIR='.'
      fi
      OUTPUT="$DIR"/`basename "$inf" ."$ext"`_%04d."$TYPE"
      OP="$DIR"/`basename "$inf" ."$ext"`"*"."$TYPE"
      # echo "creating files $OUTPUT"   
      # /usr/bin/gs -SDEVICE="$DEV" -r"$RES" -sPAPERSIZE="$PAPER" -sOutputFile="$OUTPUT" -dNOPAUSE -dBATCH -- "$inf"
      result="$result\n---------\n$inf:\n---------\n"$("$GS" -SDEVICE="$DEV" -r"$RES"x"$RES" -sPAPERSIZE="$PAPER" -sOutputFile="$OUTPUT" -dNOPAUSE -- "$inf")
      result="$result\n"$(ls -gGh $OP | cut -d' ' -f3-13)
      DONE=1
   done

   if [ $DONE -gt 0 ]
   then
      # All done ... maybe
      echo -e "$result" | yad --image "dialog-information" --title="pdf2$TYPE Conversion done" --window-icon="PDFImage" --text-info --text="All done:\n" --width="600" --height="400"
   else
      yad --image "dialog-warning" --title="problem" --window-icon="PDFImage" --text="Verify your choices. Could not convert a single file!"

   fi
done
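
The script above lets the shell split the output of YAD at every blank, so a file name that contains a space falls apart. One possible way around this, assuming the dialog were started with --separator='|' instead of a blank (and --item-separator=';' as before), is to split the line explicitly. parse_form is a hypothetical helper name, and the sample line is made up for illustration.

```shell
#!/bin/bash
# parse_form: split one line of form output such as
#   /a/one file.pdf;/b/two.pdf|TIF|400|
# into the array PDFS and the variables FORMAT and RES.
# Assumption: neither '|' nor ';' occurs in a file name.
parse_form() {
   local line="$1" files rest
   # Fields are separated by '|'; the trailing separator ends up in rest.
   IFS='|' read -r files FORMAT RES rest <<< "$line"
   # The file list itself is separated by ';'.
   IFS=';' read -r -a PDFS <<< "$files"
}

parse_form '/a/one file.pdf;/b/two.pdf|TIF|400|'
printf '%s\n' "${PDFS[@]}"
```

The loop for inf in "${PDFS[@]}" then sees each file name in one piece, and an empty selection shows up as ${#PDFS[@]} being 0.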
Code:
#!/bin/bash
# tiff_to_pdf - creates PDF from Tiff(s).
# ©2015-2016 Michael Uplawski <michael.uplawski@uplawski.eu>
# Needs YAD, libtiff
# -----------> 3 Variables for you to set <-------------
AUTHOR="Der wo das PDF macht <pdf_macher@email_provider.tld>"
PAGE="A4"
READER="/usr/bin/acroread"
# <--------------- Nothing to do below this ----------->

ISEP=";"
FSEP="|"

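# Capture the form output as one string; an array would be split at every blank
# and "$values" would then hold the first word only.
values=$(yad --title="Tiff to PDF" --window-icon="PDFImage" --image="PDFImage" --form --separator="$FSEP" --item-separator="$ISEP" --field='Tiff-Files':MFL "" --field='resulting PDF-File':SFL "" --field='open PDF?':CHK TRUE --field="PDF open command" "$READER")
echo "$values"
# Dialog cancelled: nothing to do.
if [ -z "$values" ]
then
   exit 0
fi
READER=$(echo "$values" | cut -d"$FSEP" -f4)
OPEN=$(echo "$values" | cut -d"$FSEP" -f3)
PDF=$(echo "$values" | cut -d"$FSEP" -f2)

# The file list is split at blanks below, so file names must not contain spaces.
files=$(echo "$values" | cut -d"$FSEP" -f1 | tr "$ISEP" " ")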

for f in ${files}
do
   if [ ! -f "$f" ]
   then
      echo "$f is not a valid file"
   fi
done
TFILE=`mktemp`
echo "TFILE is $TFILE"
# $files is a plain string; it is left unquoted on purpose, to be split into file names.
/usr/bin/tiffcp $files "$TFILE"
/usr/bin/tiff2pdf -z -a"$AUTHOR" -p"$PAGE" "$TFILE" -o "$PDF"
if [ -f "$PDF" ]
then
   if [ TRUE == "$OPEN" ]
   then
      # Run the reader directly; backticks would try to execute its output as a command.
      "$READER" "$PDF"
   fi
fi

unlink "$TFILE"

To use the scripts as I put them here, save each one under a file name of your choice (not identical to that of another existing executable file). Then make them executable and make sure that the directory you put them in is found in the PATH. Most people use a directory bin in their home directory for such scripts; others create a sub-directory of /usr/local and add it to the PATH variable.
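
The installation steps just described can be sketched like this. Both function names are hypothetical, and ~/bin is only the default that I assume here.

```shell
#!/bin/bash
# dir_in_path: succeed if directory $1 is an entry of the PATH-style list $2.
dir_in_path() {
   case ":$2:" in
      *":$1:"*) return 0 ;;
      *) return 1 ;;
   esac
}

# install_script: copy script $1 into directory $2 (assumed default: ~/bin),
# make it executable and print a note if that directory is not in the PATH.
install_script() {
   local src="$1" dest="${2:-$HOME/bin}"
   mkdir -p "$dest" &&
   cp "$src" "$dest/" &&
   chmod +x "$dest/$(basename "$src")" || return 1
   dir_in_path "$dest" "$PATH" ||
      echo "Note: add $dest to your PATH, e.g. in ~/.profile" >&2
   return 0
}
```

For example, install_script pdf_to_image copies a script saved under that name to ~/bin and tells you if that directory still has to be added to the PATH.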

Shoot at own discretion or comment, if you must.
Attached Images
pdf_tiff_sc.png (PNG, 20.8 KB)

Comments

    Since version 2016, the SoftMaker Office suite uses FontForge (if installed) to convert OTF fonts prior to generating PDF files.
    I have tested this a few times with their free version (FreeOffice) and can confirm that the quality of the results has improved. I can now print from Evince what was not printable before.

    In an attempt to comprehend more of the OTF trouble, I noticed that SoftMaker Office embeds fonts in two encodings simultaneously, WinAnsi *and* Identity-H, when the “export” function is used to create the PDF. If I print to a file instead, only Identity-H is used. This, too, lets Evince handle the resulting files a lot better, and they are also smaller.

    But, at the same time, some PDF documents still cannot be printed on paper when they contain OTF fonts.

    I still use the above procedure or the convert tool (ImageMagick) to first create images from the affected PDF files. This always works!
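
    For the ImageMagick variant, a call along the following lines should do. This is a sketch under assumptions: pdf_to_images_im is a hypothetical helper name, 300 dpi is an arbitrary default, and newer ImageMagick versions prefer the command magick over convert. ImageMagick relies on Ghostscript to read PDF, and some distributions restrict PDF processing in the ImageMagick policy configuration.

```shell
#!/bin/bash
# pdf_to_images_im: render each page of PDF $1 to <prefix>_NNNN.tif.
# Arguments: PDF file, output prefix, resolution in dpi (assumed default: 300).
pdf_to_images_im() {
   local pdf="$1" prefix="$2" res="${3:-300}"
   # -density is given before the input file so that it applies to rasterizing.
   convert -density "$res" "$pdf" "${prefix}_%04d.tif"
}
```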
    Posted 05-17-2016 at 05:13 AM by Michael Uplawski
 

  


