Scanning Books

Random notes for now until this becomes more important.

Most dewarpers assume a linear or quadratic model of page curvature. However, reality is more complicated…
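
As a rough illustration of the quadratic model, here is a minimal Python sketch. It is not taken from any particular dewarper. It assumes you already have sample points (x, y) along one curved text baseline, from some detection step not shown here. It fits y = ax² + bx + c by least squares and then reports how far each column must shift to flatten that line. The function names and the reference height are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least 3 (x, y) pairs of equal length")
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    # Normal equations for the coefficient vector (a, b, c).
    m = [[s4, s3, s2],
         [s3, s2, s1],
         [s2, s1, n]]
    rhs = [t2, t1, t0]
    d = _det3(m)
    if d == 0:
        raise ValueError("degenerate x values (need 3 distinct x)")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        mc = [row[:col] + [rhs[i]] + row[col + 1:] for i, row in enumerate(m)]
        coeffs.append(_det3(mc) / d)
    return tuple(coeffs)


def column_shifts(xs, coeffs, ref_y):
    """Vertical shift per column that moves the fitted curve onto y = ref_y."""
    a, b, c = coeffs
    return [ref_y - (a * x * x + b * x + c) for x in xs]
```

Cramer's rule keeps the sketch dependency-free. With NumPy, `numpy.polyfit(xs, ys, 2)` does the same fit more robustly. The point of the note still stands: one quadratic per line is often not enough for real pages.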

Spreads automates the entire process (including the tedious research into which tools to use) quite well: it picks all the right programs and is super easy to set up on Ubuntu. Possibly optional packages (not sure they are needed): zlib1g-dev libpng12-dev libtiff4-dev libjpeg62-dev libxrender-dev

sudo apt-get install ruby-dev libmagickwand-dev libboost-all-dev 
sudo gem install pdfbeads

Smartphone App

Really nice smartphone paper (blurring, adaptive binarization, maybe dewarping?). Another nice paper by the same authors references their previous one.
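
This sketch does not reproduce the paper's own binarization method. As a stand-in, it shows a generic local-mean adaptive threshold in Python. Each pixel is compared with the mean of its (2·radius+1)² neighborhood, which is computed cheaply from an integral image, so uneven phone lighting doesn't wipe out the text. The image is assumed to be a list of rows of 0-255 grayscale values. `radius` and `offset` are hypothetical tuning parameters.

```python
def adaptive_threshold(img, radius=7, offset=10):
    """Binarize a grayscale image (list of rows of 0-255 ints).

    A pixel becomes black (0) if it is darker than the mean of its
    (2*radius+1)^2 window minus offset, otherwise white (255). Windows
    are clipped at the image border.
    """
    h, w = len(img), len(img[0])
    # Integral image with an extra zero row/column: ii[y][x] is the sum of
    # img over rows < y and columns < x.
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Window sum in O(1) from four integral-image lookups.
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(0 if img[y][x] < mean - offset else 255)
        out.append(row)
    return out
```

A single global threshold struggles when one side of the page is much brighter than the other. The local mean follows that brightness gradient instead.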

Camera Calibration

I followed the OpenCV camera calibration tutorial and displayed this chessboard at 110% zoom on my ThinkPad screen.
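
For context, here is the lens model that the OpenCV calibration estimates, written out in plain Python. The model has a camera matrix (fx, fy, cx, cy) plus the distortion coefficients k1, k2, p1, p2, k3. This sketch only projects a 3D point through that model. It does not perform the calibration itself, which OpenCV does with `cv2.findChessboardCorners` and `cv2.calibrateCamera`.

```python
def project_point(X, Y, Z, fx, fy, cx, cy,
                  k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Project a camera-frame 3D point to pixel coordinates using the
    pinhole model with radial (k1, k2, k3) and tangential (p1, p2) distortion."""
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    # Normalized image coordinates.
    x, y = X / Z, Y / Z
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    # Apply focal lengths and principal point.
    return fx * x_d + cx, fy * y_d + cy
```

Calibration fits these parameters so that the projected chessboard corners match the detected ones. `cv2.undistort` then removes the lens distortion from new photos. That step corrects the lens only, not page curl, which is why dewarping is still a separate step.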

Construction

A cheap/free Instructable: use a cardboard box, old picture frames (glass or plastic), and a camera on a tripod!

OCR Software

A great test that generates sample text and compares how several OCR programs perform on it.
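
One common way to score such a comparison is the character error rate (CER). CER is the edit (Levenshtein) distance between the known text and an engine's output, divided by the length of the known text. I'm not sure which metric the linked test uses. This minimal Python sketch assumes you already have each engine's output as a string. The engine names in the usage line are placeholders.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def rank_engines(reference, outputs):
    """Return (engine, CER) pairs sorted from best to worst."""
    scores = {name: char_error_rate(reference, text)
              for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Usage: `rank_engines(truth, {"engine_a": out_a, "engine_b": out_b})`. A CER of 0.0 means a perfect transcription.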

OpenOCR is a great service combining Google's Tesseract OCR with Docker, which I want to learn more about.

Tesseract

To install Tesseract from source

Basic usage: 'tesseract myscan.png out -l eng' writes the recognized text to out.txt. More options are in the README.
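
To drive the same command from a script, here is a minimal Python sketch using subprocess. It assumes the tesseract binary is on PATH and the English language data is installed. It uses the documented argument order: image, output base, then options. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def tesseract_command(image, outbase, lang="eng"):
    """Build the argument list: tesseract imagename outputbase -l lang."""
    return ["tesseract", str(image), str(outbase), "-l", lang]


def ocr_image(image, outbase, lang="eng"):
    """Run tesseract and return the text it wrote to <outbase>.txt."""
    subprocess.run(tesseract_command(image, outbase, lang), check=True)
    # With no config file given, tesseract writes plain text to <outbase>.txt.
    return Path(f"{outbase}.txt").read_text(encoding="utf-8")
```

Usage: `ocr_image("myscan.png", "out")`. Tesseract also accepts config names such as `hocr` after the options, which switches the output from plain text to hOCR.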

The optimal letter height in pixels (smaller images mean fewer pixels to scan and to process in other operations) is … 40 pixels? Black-and-white images probably speed it up too. You can also set a specific dictionary or train a "font" of characters.
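
The rescaling arithmetic is simple enough to sketch. Measure the height of a typical capital letter in pixels, then scale the page so its letters land near the target height. The 40 px default below is only the tentative guess from the note, not a documented Tesseract figure. The function name is hypothetical.

```python
def rescale_for_ocr(width, height, measured_letter_px, target_letter_px=40):
    """Return (scale factor, new width, new height) that brings letters of
    measured_letter_px pixels to roughly target_letter_px pixels."""
    if measured_letter_px <= 0:
        raise ValueError("measured letter height must be positive")
    factor = target_letter_px / measured_letter_px
    return factor, round(width * factor), round(height * factor)
```

For example, a 3000x4000 photo with 60 px letters would be shrunk to about 2000x2667. That cuts the pixel count by more than half.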

Leptonica does dewarping too, and handles it automagically just fine!

Bad experience with Tesseract?

Remaining Questions

Scraping from the Web

Goose is one standalone but bulky option. The Readability API may be better.
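
This is not what Goose or Readability do, since both use much smarter content scoring. As a crude, dependency-free stand-in, this Python sketch uses the standard library's `html.parser` to keep only the text inside `<p>` elements and normalize its whitespace. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the whitespace-normalized text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        # Text outside <p> (menus, scripts, footers) is ignored.
        if self._in_p:
            self._buf.append(data)


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```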

Camera Remote Shutter

A USB shutter (toggling the 5V line on and off, or simply plugging and unplugging the USB cable from a computer's USB port) works with CHDK. I'm not sure about other cameras.

Remote Control/Preview (ideal for face pulse stuff)

Many cameras have the PTP protocol built in for transferring files over USB, and some even support remote control (shutter, preview, etc.). Gphoto's remote control page has a great list. For Canon cameras, CHDK's PTP add-ons enable more flexibility than Gphoto.

My Setup

I have a Canon PowerShot A3000, but only browsing and downloading files are supported.
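
Browsing and downloading over PTP is what this camera supports. This minimal Python sketch shells out to the gphoto2 command-line tool for exactly those operations. It assumes gphoto2 is installed and the camera is connected and detected. The short action names are made up for the sketch, but the flags are standard gphoto2 options.

```python
import subprocess
from pathlib import Path

# Hypothetical short names mapped to real gphoto2 command-line flags.
GPHOTO2_FLAGS = {
    "detect": "--auto-detect",      # list connected cameras
    "list": "--list-files",         # browse files on the camera
    "download": "--get-all-files",  # copy every file into the working directory
}


def gphoto2_command(action):
    """Build the gphoto2 argument list for one of the actions above."""
    return ["gphoto2", GPHOTO2_FLAGS[action]]


def run_gphoto2(action, dest="."):
    """Run one gphoto2 action inside dest and return its text output."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    result = subprocess.run(gphoto2_command(action), cwd=dest,
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Usage: `run_gphoto2("download", dest="scans")`. On cameras that do expose remote capture, `gphoto2 --capture-image-and-download` would be the shutter equivalent.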

Scan 'em.