Friday, June 20, 2008

AppleScript Versus Alameda County

You think your county has problems? Try California's Alameda County, where court rulings are available in an unruly Java applet or as a long series of TIFF images.

I know this because I was interested in reading the ruling in the case of the University of California's stadium project, which is being challenged by various Berkeley groups. (I'm a Cal football fan, so the subject of Memorial Stadium is near and dear to my heart. And I've been known to blog on the subject in my spare time, so I wanted to be up to speed on the ruling.)

Apparently the Alameda County courts are using technology so ancient that, rather than generate a PDF out of whatever computerized document system they used to generate the verdict, they printed out a copy, scanned it in, and posted the images straight from the scanner. (And then relied on a Java applet I couldn't get working on my Mac to display it.)

Ridiculous. So I made a PDF of the whole thing, allowing me to read it easily online or off by using Apple's Preview. I even made the text on the scanned-in pages searchable, so I could find out exactly which page of the 127-page ruling covered obscure topics like oak trees or the Alquist-Priolo Act. Here's what I did.

Because I couldn't find a link to the raw TIFF files (I found them later), I dug the format of the TIFF file names out of the Java applet's error messages. Then I conjured a simple AppleScript script that used Interarchy (my FTP client of choice, though just about anything that downloads files from the Internet would do) to fetch every page.
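For anyone without Interarchy, the same fetch-every-page idea can be sketched in a few lines of Python. This is not the script I ran, just a rough equivalent. The server address and the page001-style naming below are made-up placeholders, since the real pattern came out of the applet's error messages; only the 127-page count comes from the ruling itself.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical placeholders: the real address and file-name pattern are assumptions.
BASE_URL = "http://courts.example.invalid/ruling/"
PAGE_COUNT = 127  # the ruling runs to 127 pages


def page_urls(base_url=BASE_URL, count=PAGE_COUNT):
    """Build one URL per page, assuming names like page001, page002, ..."""
    return [f"{base_url}page{n:03d}" for n in range(1, count + 1)]


def download_all(dest_dir, urls=None):
    """Fetch each page into dest_dir, keeping the server's extension-less names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in (urls if urls is not None else page_urls()):
        # Save each file under the last path component of its URL.
        urlretrieve(url, dest / url.rsplit("/", 1)[-1])
```

Calling download_all("ruling_tiffs") would then fill a folder with one file per page, assuming the placeholder address were swapped for the real one.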

Once that was done, I had a folder full of TIFFs. But because the file names didn't end in .tif, my Mac didn't know what they were. Easy. I called up Automator:

I used Automator for a quick addition to the files' names.

This simple Automator action (which I didn't even save; I just ran it and then quit) added ".tif" to the end of every file I downloaded. Presto!
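If you'd rather not use Automator, the rename is easy to script too. Here's a minimal Python sketch of the same step; the function name and the choice to skip hidden files (such as .DS_Store) are my own assumptions, not anything Automator does for you.

```python
from pathlib import Path


def add_tif_extension(folder):
    """Append ".tif" to every regular file in folder that lacks it.

    Returns the new file names in sorted order.
    """
    renamed = []
    # sorted() reads the whole listing first, so renaming can't disturb the loop.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.name.startswith("."):
            continue  # skip subfolders and hidden files
        if path.suffix.lower() == ".tif":
            continue  # already has the extension
        target = path.with_name(path.name + ".tif")
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Running add_tif_extension on the download folder leaves already-named files alone, so it is safe to run twice.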

Then I launched Adobe Acrobat Professional (yes, I could have used Preview, but since I have Acrobat Pro at my beck and call, I thought I'd put it to use). I chose File: Create PDF: From Multiple Files, and dragged in all of my images. Once they were assembled into a big PDF, I chose Document: OCR Text Recognition: Recognize Text Using OCR. Acrobat rotated all the pages to be perfectly aligned (someone in the Alameda County office was a sloppy scanner!) and embedded computer-readable text in each page, which makes the scanned-in document searchable.

When I was all done, I posted my PDF on the Internet and passed the link on to a few people. I believe you'll find my PDF at the San Francisco Chronicle web site, and I know for a fact that my PDF is the one posted on the UC Berkeley news page.

Not bad for a half hour of work. If only the folks in Alameda County had done it themselves.
