software:unpaper_test

Differences

This shows you the differences between two versions of the page.

software:unpaper_test [2009/05/01 01:59] adminsoftware:unpaper_test [2015/04/22 21:51] (current) – [findblack] admin
Line 1: Line 1:
-====== Unpaper test =====+====== Unpaper test ======
 To better understand the large number of options in unpaper, I ran several tests that show the differences between settings.\\ I scanned a lot of sheet music and processed the scans manually with another application to remove skew. All processing was done in 8-bit gray at a resolution of 300 dpi.\\ These tests should show which unpaper settings give a decent automatic conversion to black and white, including despeckling and noise removal. To better understand the large number of options in unpaper, I ran several tests that show the differences between settings.\\ I scanned a lot of sheet music and processed the scans manually with another application to remove skew. All processing was done in 8-bit gray at a resolution of 300 dpi.\\ These tests should show which unpaper settings give a decent automatic conversion to black and white, including despeckling and noise removal.
  
Line 34: Line 34:
  str1="0.$nmb"  str1="0.$nmb"
  
- unpaper -b $str1 $donot -t pbm /tmp/$ptmpf-1.pgm /tmp/$ptmpf.pbm+ unpaper -b $str1 $donot -t pbm $ptmpf-1.pgm $ptmpf.pbm
  
  # convert inputfile option annotation outputfile  # convert inputfile option annotation outputfile
- convert /tmp/$ptmpf.pbm -gravity NorthWest -annotate 0 "unpaper -$str1 $donot" /tmp/$ptmpf_ca.pbm + convert $ptmpf.pbm -gravity center -background lightblue -font Open-Sans -pointsize 60 caption:"$str1 \n $donot" -composite $ptmpf_ca.pbm 
- convert -monochrome -density 300 -units PixelsPerInch /tmp/$ptmpf_ca.pbm ps:- | ps2pdf13 -sPAPERSIZE=a4 - "/tmp/$ptmpf-$nmb.pdf" + convert -monochrome -density 300 -units PixelsPerInch $ptmpf_ca.pbm ps:- | ps2pdf13 -sPAPERSIZE=a4 - "$ptmpf-$nmb.pdf" 
- rm /tmp/$ptmpf.pbm /tmp/$ptmpf_ca.pbm + rm $ptmpf.pbm $ptmpf_ca.pbm 
- pdflst="$pdflst /tmp/$ptmpf-$nmb.pdf"+ pdflst="$pdflst $ptmpf-$nmb.pdf"
  nmb=$(($nmb+1))  nmb=$(($nmb+1))
 done done
Line 83: Line 83:
  
 ===== Scripting with unpaper ===== ===== Scripting with unpaper =====
 +==== gray2black ====
 Automatic conversion from pdf files, containing gray images, to black and white images with unpaper and its noise reduction and cleaning up features can be done with the following script: Automatic conversion from pdf files, containing gray images, to black and white images with unpaper and its noise reduction and cleaning up features can be done with the following script:
 <code bash> <code bash>
Line 101: Line 102:
 # #
 # Example: # Example:
-# Process gray images in mozart.pdf: gray2black mozart.pdf -p 4 test.pdf+# Process gray images in mozart.pdf: gray2black mozart.pdf -b 0.23 test.pdf
  
 # This script uses imagemagick to convert and center an image on an # This script uses imagemagick to convert and center an image on an
Line 132: Line 133:
  
 # Only resize if absolutely necessary. Values for resizing are for left, right, top and bottom border. # Only resize if absolutely necessary. Values for resizing are for left, right, top and bottom border.
-resize=0 
 hborder=33 hborder=33
 vborder=33 vborder=33
Line 175: Line 175:
  exit 1  exit 1
  else  else
- # convert first page to a pgm file with filename /tmp/$ptmpf-*.pgm + # convert first page to a pgm file with filename $ptmpf-*.pgm 
- pdftoppm -gray -r 300 "$arg1" /tmp/$ptmpf+ pdftoppm -gray -r 300 "$arg1" $ptmpf
  
  # Check whether the document needs resizing. Iterating through all pages makes sure all pages get uniform resizing later.  # Check whether the document needs resizing. Iterating through all pages makes sure all pages get uniform resizing later.
  
- for file in $(ls --sort=time -r /tmp/$ptmpf-*.pgm); do + resize=0 
- dimension="$(identify -format "%[fx:w]x%[fx:h]" $file)"+ # Iterate through files. We need to sort files in a numerical order because pdftoppm generates files without a leading zero.  
 + # pdftoppm puts a '-' delimiter before the page number, so cut away the part before it. 
 + for filenum in $(ls $ptmpf-*.pgm | cut -d'-' -f2 | sort -n); do 
 + dimension="$(identify -format "%[fx:w]x%[fx:h]" $ptmpf'-'$filenum)"
  #  width: "${dimension%%x*}" -> From the end removes the longest part of dimension that matches x* and returns the rest.  #  width: "${dimension%%x*}" -> From the end removes the longest part of dimension that matches x* and returns the rest.
  #  height: "${dimension##*x}" -> From the beginning removes the longest part of dimension that matches *x and returns the rest.  #  height: "${dimension##*x}" -> From the beginning removes the longest part of dimension that matches *x and returns the rest.
Line 193: Line 196:
  fi  fi
  done  done
- 
  # Resize document if necessary  # Resize document if necessary
  if [ $resize -eq 1 ]; then  if [ $resize -eq 1 ]; then
Line 213: Line 215:
  echo Some images exceed the maximum size. Now scaling to "$scale"%.  echo Some images exceed the maximum size. Now scaling to "$scale"%.
  # iterate through all files and resize all with the same percentage, use 8 bits per pixel  # iterate through all files and resize all with the same percentage, use 8 bits per pixel
- for file in $(ls --sort=time -r /tmp/$ptmpf-*.pgm); do + for filenum in $(ls $ptmpf-*.pgm | cut -d'-' -f2 | sort -n); do 
- mogrify -depth 8 -resize "$scale"% $file + mogrify -depth 8 -resize "$scale"% "$ptmpf-$filenum" 
- echo resizing file: "$file"+ echo resizing file: "$ptmpf-$filenum"
  done  done
  fi  fi
  # apply unpaper onto each page  # apply unpaper onto each page
- for file in $(ls --sort=time -r /tmp/$ptmpf-*.pgm); do + for filenum in $(ls $ptmpf-*.pgm | cut -d'-' -f2 | sort -n); do 
- # Parameter expansion: ${param%word} From the end removes the smallest part of param that matches word and returns the rest. (p.72) + # Parameter expansion: ${param%word} From the end removes the smallest part of param that matches word and returns the rest. (p.72) 
- unpaper -b $b_threshold $filter $donot -t pbm $file ${file%.pgm}.pbm + unpaper -b $b_threshold $filter $donot -t pbm "$ptmpf-$filenum" "$ptmpf-"${filenum%.pgm}.pbm 
- rm "$file"+ rm "$ptmpf-$filenum"
  done  done
  
  # center pbm page on an a4 canvas, convert to pdf  # center pbm page on an a4 canvas, convert to pdf
  pdflst=""  pdflst=""
- for file in $(ls --sort=time -r /tmp/$ptmpf-*.pbm); do + for filenum in $(ls $ptmpf-*.pbm | cut -d'-' -f2 | sort -n); do 
- convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center "$file" - miff:- | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "${file%.pbm}.pdf" + convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center "$ptmpf-$filenum" - miff:- | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "$ptmpf-"${filenum%.pbm}.pdf 
- rm "$file" + rm "$ptmpf-$filenum" 
- pdflst="$pdflst ${file%.pbm}.pdf"+ pdflst="$pdflst $ptmpf-"${filenum%.pbm}.pdf
  done  done
  
Line 241: Line 243:
 </code> </code>
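
The numeric page ordering and the suffix stripping used in the loops of gray2black can be tried in isolation. This is only a minimal sketch: the prefix `scan` and the temporary directory are hypothetical stand-ins for `$ptmpf`, and the empty files merely imitate the `prefix-N.pgm` names described in the script comments.

```shell
#!/bin/bash
# Hypothetical prefix; the script above uses $ptmpf instead.
prefix="scan"
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Simulate page files whose numbers have no leading zeros.
for n in 1 2 10 11; do
    : > "$prefix-$n.pgm"
done

# A plain glob would list them lexically: 1, 10, 11, 2.
# Cutting the field after '-' and sorting numerically restores page order.
ordered=""
for filenum in $(ls "$prefix"-*.pgm | cut -d'-' -f2 | sort -n); do
    # ${filenum%.pgm} removes the shortest trailing match of ".pgm"
    ordered="$ordered ${filenum%.pgm}"
done
echo "page order:$ordered"

# Clean up the scratch directory.
cd / && rm -r "$workdir"
```

Note that `cut -d'-' -f2` only works as intended when the prefix itself contains no `-` character, and the word splitting in the `for` loop assumes file names without spaces.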
  
 +==== findblack ====
 Find a value for black threshold with following script: Find a value for black threshold with following script:
 <code bash> <code bash>
Line 286: Line 289:
 # Default page number # Default page number
 defpage=1 defpage=1
-# Store input-file name temporarily, because it's deleted by shift command. +
-arg1=$1+
 # Find number of parameters passed # Find number of parameters passed
 if [ "$#" -le 1 ]; then if [ "$#" -le 1 ]; then
Line 293: Line 295:
  exit 1  exit 1
 fi fi
 +
 +# Store input-file name temporarily, because it's deleted by shift command.
 +arg1=$1
  
 # Check OPTIONS # Check OPTIONS
Line 335: Line 340:
  fi  fi
  # convert first page to a pgm file with filename tmpfile_PID-1.pgm  # convert first page to a pgm file with filename tmpfile_PID-1.pgm
- pdftoppm -gray -r 300 -f $defpage -l $defpage "$arg1" /tmp/$ptmpf+ pdftoppm -gray -r 300 -f $defpage -l $defpage "$arg1" $ptmpf
  # check for successful conversion to .pgm  # check for successful conversion to .pgm
- if [ ! -f "/tmp/$ptmpf-$trail$defpage.pgm" ]; then+ if [ ! -f "$ptmpf-$trail$defpage.pgm" ]; then
  echo "$_fbcvrt"  echo "$_fbcvrt"
- rm "/tmp/$ptmpf-$trail$defpage.pgm"+ rm "$ptmpf-$trail$defpage.pgm"
  exit 1  exit 1
  else  else
Line 355: Line 360:
    
  # process unpaper  # process unpaper
- unpaper -b $str1 $donot -t pbm /tmp/$ptmpf-$trail$defpage.pgm /tmp/$ptmpf.pbm+ unpaper -b $str1 $donot -t pbm $ptmpf-$trail$defpage.pgm $ptmpf.pbm
    
  # Create white A4 size canvas -> center pbm file into canvas -> convert pbm to ps -> convert ps to pdf  # Create white A4 size canvas -> center pbm file into canvas -> convert pbm to ps -> convert ps to pdf
Line 362: Line 367:
  # Sending result to next command with pipe: |  # Sending result to next command with pipe: |
  # Using this 'technique' we don't need to create temporary files.  # Using this 'technique' we don't need to create temporary files.
- convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center /tmp/$ptmpf.pbm - miff:- | convert - $pstr "$ptxt $str1" - | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "/tmp/$ptmpf-$nmb.pdf"+ convert -size $pdim xc:white miff:- | composite -density $pres -units PixelsPerInch -compose atop -gravity Center $ptmpf.pbm - miff:- | convert - $pstr "$ptxt $str1" - | convert -monochrome -density 300 -units PixelsPerInch - ps:- | ps2pdf13 -sPAPERSIZE=a4 - "$ptmpf-$nmb.pdf"
    
- pdflst="$pdflst /tmp/$ptmpf-$nmb.pdf"+ pdflst="$pdflst $ptmpf-$nmb.pdf"
  nmb=$(($nmb+$s))  nmb=$(($nmb+$s))
- rm /tmp/$ptmpf.pbm+ rm $ptmpf.pbm
  done  done
- rm /tmp/$ptmpf-$trail$defpage.pgm+ rm $ptmpf-$trail$defpage.pgm
  # Merge all pdf files in $pdflst into one single file with filename $2  # Merge all pdf files in $pdflst into one single file with filename $2
  gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=$2 $pdflst  gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -sOutputFile=$2 $pdflst
Line 377: Line 382:
 exit 0 exit 0
 </code> </code>
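
The idea behind the threshold sweep in findblack can be sketched without running unpaper at all. The bounds and step below are hypothetical, and the zero padding with `printf` is an addition made here for illustration; the script itself builds the value as `"0.$nmb"`. The sketch only prints the commands it would run, using the `-b` and `-t pbm` options that appear in the script.

```shell
#!/bin/bash
# Hypothetical sweep bounds and step, expressed in hundredths.
start=5
end=25
s=10

thresholds=""
nmb=$start
while [ "$nmb" -le "$end" ]; do
    # Pad to two digits so that 5 becomes 0.05 rather than 0.5
    # (an assumption of this sketch, not taken from the script).
    str1="$(printf '0.%02d' "$nmb")"
    thresholds="$thresholds $str1"
    # Print the command instead of executing it; page.pgm is a placeholder.
    echo "unpaper -b $str1 -t pbm page.pgm page-$nmb.pbm"
    nmb=$((nmb + s))
done
```

Replacing `echo` with the real call would produce one black and white page per threshold, which can then be compared side by side.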
 +
 ===== Bugs ===== ===== Bugs =====
 [[software:unpaper_test:bugs|See the following page]] [[software:unpaper_test:bugs|See the following page]]
software/unpaper_test.1241135963.txt.gz · Last modified: 2009/05/01 01:59 by admin