Proofreading as a hobby

I like checking up on Blackmask.com every now and then to see the latest public domain e-books that have been posted. I read e-books on my Clie using the fabulous iSilo and Blackmask thoughtfully provides the books in various formats (text, HTML, iSilo, Mobipocket, etc.) for reading on digital devices.

I think they’ve probably got the whole run of Doc Savage and most of The Shadow–pulp adventures seem to be their specialty–in addition to the run-of-the-mill stuff you see from Project Gutenberg: many a quaint and curious volume of forgotten lore: novels, poetry, antiquated reference books, old literary magazines, and other paper ephemera digitized for the new age.

If you’ve got an interest in an old author, Blackmask is a great first source to check–even my local public library doesn’t have all these Arnold Bennett books. If you can’t find what you want there, try Gutenberg; I’m not sure how much overlap exists between the two.

Whenever I visited Blackmask, I was always intrigued by the banner ads on their page that read “GO PROOF A PAGE, WE’LL WAIT RIGHT HERE FOR YOU”. So a few weeks ago, I clicked on one of these banners and was whisked to the Distributed Proofreaders site. DP is a volunteer-run group that proofs the OCR scans of old books and magazines that will eventually find their way to the Project Gutenberg site (and Blackmask).

The idea is that you volunteer to be a proofreader, working at your computer, on your own time, and you can proof as many pages as you want (they hope you do at least one a day). The scans can range from messy to clean, and there’s an extensive set of guidelines to adapt and interpret the scanned text so that it compiles nicely for electronic reading. (I printed out the one-page summary to keep in my “Fingertip” folder by my computer.)

Beginners can try out the books ranked as EASY; friendly mentors let you know where you can improve your technique; and as your number of proofed pages increases, other bits of the site become accessible to you, such as a random proofing guideline on your login page.

I very much like the new filtering option: when I log in, I now see only books in English of average difficulty. I didn’t realize there were so many other languages that were involved in this effort: Dutch, Spanish, Tagalog, as well as blends of English and other languages.

This proofing I’m doing is what’s called the “first round”; the big problems are cleaned up here, obvious errors fixed, standard formatting entered. So far, I’ve proofed about 40 pages. After I’ve proofed 100 pages, I’m eligible to do second-round proofs, working as another pair of eyes to ensure the first-rounders didn’t let certain niceties slip by them.

As I should have expected, there is an active and lively sub-culture on display at the forums. I recently discovered there are “index junkies,” who seek out the clean-up and codification of scanned indexes. These guys like a challenge. Another forum member likes to do the 2-column literary magazine scans (such as of the Civil War-era Atlantic magazines), because they require more hands-on work and are in need of closer proofing.

So far, I’ve shied away from some of the really complicated pages that blend italicized Latin and Greek words along with footnotes, annotations, glosses, illustrations, lists, and the like. I prefer to do whatever can be done in 30 minutes or so. I feel a good satisfaction at taming a chaotic page and making it look and read sensibly. And for a bookworm, there’s no better cause than to keep a book going.

If you don’t like reading on a computer screen, then this may not be something you want to do. But if you’re spending a ton of time at the PC anyway, it’s at least as interesting as reading RSS feeds, and I daresay a touch more useful.

Update: I neglected to mention that I use Netcaptor for my proofreading. Netcaptor is a tabbed browser based on the IE engine. When I proofread, I have one tab holding the scanned page and the OCR text beneath; one tab dedicated to the forum post discussing the book; and one tab dedicated to the big Guidelines page. I can also open other tabs if I need to Google a spelling or odd word. I have a Netcaptor group, “Proofreading,” that loads the basic tabs in an instant. You can use Mozilla as well, but I’m more comfortable with Netcaptor, as I’ve used it for years.

For complicated scans, I open Notetab Pro (a tabbed Windows-based text editor), copy the scanned text there, and do my editing.

DP also offers an especially ugly monospaced font that they encourage you to use when you proof. It’s heinous, but it helps flag misspelled words that would look familiar if you scanned them too fast.

Update, 17 May 2005: Since first writing this, I’ve not been back to Distributed Proofreaders for a few months. At the time I started, I was unemployed and had the time to devote to it. But then I did get a job, and then “life” happened on several fronts, taking my time and energy away from recreational things.

I listed out all the available activities I could do of a day or an evening, and I divided them into High Payoff and Low Payoff activities. Sadly, DP fell into the Low Payoff category. After classifying proofreading as a low-payoff activity, I rarely returned. Too bad.