click-know: AppleScript Versus Alameda County

Friday, June 20, 2008

AppleScript Versus Alameda County

You think your county has problems? Try California's Alameda County, where court rulings are available in an unruly Java applet or as a long series of TIFF images.

I know this because I was interested in reading the ruling in the case of the University of California's stadium project, which is being challenged by various Berkeley groups. (I'm a Cal football fan, so the subject of Memorial Stadium is near and dear to my heart. And I've been known to blog on the subject in my spare time, so I wanted to be up to speed on the ruling.)

Apparently the Alameda County courts are using technology so ancient that, rather than generate a PDF out of whatever computerized document system they used to generate the verdict, they printed out a copy, scanned it in, and posted the images straight from the scanner. (And then relied on a Java applet I couldn't get working on my Mac to display it.)

Ridiculous. So I made a PDF of the whole thing, allowing me to read it easily online or off by using Apple's Preview. I even made the text on the scanned-in pages searchable, so I could find out exactly which page of the 127-page ruling covered obscure topics like oak trees or the Alquist-Priolo Act. Here's what I did.

Because I couldn't find a link to the raw TIFF files (I found them later), I dug the format of the TIFFs' file names out of the Java applet's error messages. Then I conjured a simple AppleScript script that used Interarchy (my FTP client of choice, though just about anything that downloads files from the Internet would do) to fetch every page.
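For anyone without Interarchy, the same download loop can be sketched in Python. This is only an illustration under stated assumptions: the URL pattern below is a hypothetical placeholder (the real one came from the applet's error messages and isn't reproduced here), and `page_urls` and `download_pages` are names invented for this sketch.

```python
from pathlib import Path
from urllib.request import urlopen

# Hypothetical placeholder: the real URL pattern is not given in this post.
BASE_URL = "https://example.invalid/ruling/page{number:03d}"


def page_urls(page_count, pattern=BASE_URL):
    """Return one URL per page, numbered from 1 to page_count."""
    return [pattern.format(number=n) for n in range(1, page_count + 1)]


def download_pages(urls, folder):
    """Save each URL into folder, naming each file after the last URL segment."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    for url in urls:
        destination = target / url.rsplit("/", 1)[-1]
        with urlopen(url) as response:
            destination.write_bytes(response.read())
```

With a real pattern substituted in, `download_pages(page_urls(127), "ruling_pages")` would fetch all 127 pages of the ruling into one folder. As in the original run, the saved files would carry no extension.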

Once that was done, I had a folder full of TIFFs. But because the file names didn't end in .tif, my Mac didn't know what they were. Easy. I called up Automator:

I used Automator for a quick addition to the files' names.

This simple Automator action (which I didn't even save; I just ran it and then quit) added ".tif" to the end of the name of every file I downloaded. Presto!
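If Automator isn't handy, the same bulk rename takes a few lines of Python. This is a rough equivalent, not the Automator action itself; `add_extension` is a name invented for this sketch, and skipping hidden files such as `.DS_Store` is an assumption about what you'd want.

```python
from pathlib import Path


def add_extension(folder, extension=".tif"):
    """Append extension to each visible file in folder that lacks it.

    Returns the new file names in sorted order.
    """
    renamed = []
    # sorted() builds the full listing first, so renaming can't disturb the loop.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.name.startswith("."):
            continue  # skip subfolders and hidden files (an assumption)
        if path.name.endswith(extension):
            continue  # already has the extension
        new_path = path.with_name(path.name + extension)
        path.rename(new_path)
        renamed.append(new_path.name)
    return renamed
```

Running `add_extension("ruling_pages")` on the download folder would leave every page with a name the Mac recognizes as a TIFF.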

Then I launched Adobe Acrobat Professional (yes, I could have used Preview, but since I have Acrobat Pro at my beck and call, I thought I'd put it to use). I chose File: Create PDF: From Multiple Files, and dragged in all of my images. Once they were assembled into a big PDF, I chose Document: OCR Text Recognition: Recognize Text Using OCR. Acrobat rotated all the pages so they were perfectly aligned (someone in the Alameda County office was a sloppy scanner!) and embedded computer-readable text in each page, which makes the scanned-in document searchable.

When I was all done, I posted my PDF on the Internet and passed the link on to a few people. I believe you'll find my PDF at the San Francisco Chronicle web site, and I know for a fact that my PDF is the one posted on the UC Berkeley news page.

Not bad for a half hour of work. If only the folks in Alameda County had done it themselves.
