

Archiving Old Books

Looking through my bookshelf, I found a few titles that the Internet Archive, at least, doesn't have.

I've enjoyed using books and magazines that others have scanned and uploaded online, so it seemed only fair that I do the same. My printer has an automatic document feeder (ADF), so I figured I'd give it a go!

Preparation

Unless you want to scan individual pages on a flatbed, or with some kind of camera setup, you'll need to prepare your books. Essentially, this involves cutting off or removing the spine of each book.
This is clearly a destructive process, so it may not be something everyone wants to do.

There's lots of advice online on how to do this. I didn't want to spend a lot of time or money on this step.

Ready to Scan

Instead, I went down to my local Officeworks, which has a service for this. For $1 a book, they'll use their fancy guillotine to cut off the spine. I suspect thicker books might need to be split into a couple of sections to fit (I was advised they could do up to 250 80gsm pages at a time!).
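As a rough sketch of that arithmetic, the number of sections a thick book would need can be estimated from the 250-page figure quoted above. The function name is made up for illustration, and the sketch assumes the limit counts the same kind of "pages" as the book's page count and that the book's paper is close to 80gsm; heavier stock would lower the real limit.

```python
import math


def guillotine_sections(total_pages: int, max_pages_per_cut: int = 250) -> int:
    """Estimate how many sections a book must be split into for the guillotine.

    max_pages_per_cut defaults to the 250-page limit quoted by the store;
    treat it as an assumption rather than a hard specification.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    if max_pages_per_cut <= 0:
        raise ValueError("max_pages_per_cut must be positive")
    # Round up: a partial section still needs its own cut.
    return math.ceil(total_pages / max_pages_per_cut)


if __name__ == "__main__":
    for pages in (180, 250, 600):
        print(pages, "pages ->", guillotine_sections(pages), "section(s)")
```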

Scanning

I'm lucky enough to have an MFC at home with an automatic document feeder (a Konica Minolta Bizhub C35).

Scanner goes brrr


I used the built-in "Windows Fax and Scan" application to connect to the scanner via WIA, and scanned directly to TIFF at 600 DPI. Note that the books I scanned were all black-and-white, so I used black-and-white mode.

Preparing the PDF

I couldn't find a good free option for this. Instead, I used Foxit PDF Editor to convert the scanned TIFF images to PDF. It does deskewing and OCR for me, which is really handy.

I made sure to fix up the page numbers in the PDF to match the printed page numbers in the books (it's a pet hate of mine when PDFs don't do this), and added some missing metadata.
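To illustrate what matching page numbers means in practice, here is a minimal sketch that works out the label each scanned page should carry. It assumes a common layout that may not fit every book: front matter numbered in lowercase roman numerals, followed by a body numbered from 1. The function names and the `front_matter_pages` parameter are hypothetical; the actual relabelling would still be done in the PDF editor.

```python
def to_roman(number: int) -> str:
    """Convert a positive integer to a lowercase roman numeral."""
    if number <= 0:
        raise ValueError("roman numerals need a positive number")
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        while number >= value:
            parts.append(symbol)
            number -= value
    return "".join(parts)


def page_labels(total_pages: int, front_matter_pages: int) -> list[str]:
    """Return the printed label for each scanned page, in scan order."""
    if not 0 <= front_matter_pages <= total_pages:
        raise ValueError("front_matter_pages must be between 0 and total_pages")
    labels = []
    for index in range(total_pages):
        if index < front_matter_pages:
            # Front matter: i, ii, iii, ...
            labels.append(to_roman(index + 1))
        else:
            # Body: restart numbering at 1.
            labels.append(str(index - front_matter_pages + 1))
    return labels


if __name__ == "__main__":
    print(page_labels(total_pages=6, front_matter_pages=2))
```

Comparing a few of these labels against the numbers printed on the scanned pages is a quick way to spot a missing or duplicated page.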

Issues

I've only done two books so far, and it's been reasonably smooth. Lessons learned:

  • For books with glued spines, flip through every page and make sure each one is free. It'll save having to rescan pages that fed through the scanner together.

  • Archive.org doesn't like the output from Foxit PDF Editor. I had to run it through PDFtk to keep the Internet Archive happy :/

Uploads so far
