Tuesday, November 14, 2006

pluck - A Perl script to simplify saving web pages for offline browsing on a Palm

Plucker Desktop, the program that converts web pages and text files into a format readable on the Palm (using the Plucker viewer), reportedly does not work on Intel Macs. Whatever - I always found it a very clicky program and have wanted for years to install a command-line equivalent. Stymied by problems with compiling the Plucker Python distiller, I looked briefly at payware iSilo, but it, too, is not yet working on Intel. I dug around in the Plucker Python distiller and found a version of the Python scripts which can be run directly by just plopping them wherever.

Then I wrote a simple Perl script for processing those .webloc files that I drag off the [Safari|Shiira] address bar:

#!/usr/bin/perl -w
use strict;

# Folder of .webloc files waiting to be plucked. The glob pattern did not
# survive in this listing; the folder is inferred from the path stripped
# below, and *.webloc is an assumption.
my $todo = '/Users/Surly/plucker/to-pluck';

# Pass 1: rename files whose names contain whitespace, replacing the
# whitespace with underscores and dropping shell-hostile characters.
foreach my $file (glob "$todo/*.webloc") {
    next unless $file =~ /\s/;
    (my $newfile = $file) =~ s/\s/_/g;
    $newfile =~ s/['#&:;|?"()]//g;
    # rename() bypasses the shell, so the old name needs no escaping.
    rename $file, $newfile or warn "Could not rename $file: $!\n";
    print "mv $file $newfile\n";
}

# Pass 2: pull the URL out of each .webloc, pluck it, then archive the file.
foreach my $file (glob "$todo/*.webloc") {
    # The URL is stored in the file's resource fork.
    my $URL = `strings $file/rsrc | grep http | sed '/^.http/s//http/' | head -1`;
    $URL =~ s/\s+//g;
    $URL =~ s/<string>//;
    $URL =~ s/<\/string>//;
    print "$URL\n";

    (my $justfilename = $file) =~ s/^\Q$todo\E\///;    # WITH .webloc
    (my $pdbname = $justfilename) =~ s/\.webloc//;
    $pdbname = substr($pdbname, 0, 20);                # keep the database name short
    my $p = '/Users/Surly/Documents/Palm/Users/Surly/Files\\ to\\ Install/';
    # Single quotes keep the shell from interpreting & or ? in the URL.
    `/Users/Surly/bin/PyPlucker/Spider.py --maxdepth=1 --home-url='$URL' -f $p$pdbname`;
    `mv $file /Users/Surly/plucker/plucked/$justfilename`;
}

Yeah, it's ugly, but it works.

Then it was necessary to use Fink to install netpbm for processing some of the image files. This failed on JPEGs, and even though I had set up ImageMagick (twice, since Fink also did an install) and had set image_parser = ImageMagick in .pluckerrc, Spider.py was seemingly still defaulting to netpbm. Anyhow, I noticed that there were some more image processing settings under the [POSIX] section of the .pluckerrc file, so I installed libjpeg and then set djpeg_program = djpeg in .pluckerrc. And now it all seems to work.
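For reference, the change that finally got JPEGs working looks roughly like this in .pluckerrc. Only the section name and the key are taken from the notes above; the image_parser line is left out because I did not record which section it belongs in, and this assumes djpeg is on the PATH.

```
[POSIX]
djpeg_program = djpeg
```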

What's left to do: devise a scheme for handling the more complicated plucking jobs, such as those that require altering the default maxdepth value (1 seems _so much_ better than 2) or those that need special URL matching; set up Folder Actions to automate the plucking; and catch parsing errors so that the .webloc file is not moved when plucking fails.
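The error-catching item could be sketched along these lines. This is untested against Spider.py itself and assumes that it exits with a non-zero status when parsing fails, which I have not verified; the helper name pluck_and_archive is made up for illustration. The command is passed as a list so that nothing goes through the shell.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical helper: run a command given as a list (no shell quoting needed)
# and move the .webloc into $done_dir only if the command exits with status 0.
# Returns 1 on success, 0 on failure.
sub pluck_and_archive {
    my ($webloc, $done_dir, @command) = @_;

    # system() returns 0 on success, -1 if the command could not be started.
    my $status = system(@command);
    if ($status != 0) {
        warn "Plucking failed for $webloc (status $status); leaving it in place\n";
        return 0;
    }

    # move() places the file inside $done_dir when $done_dir is a directory.
    unless (move($webloc, $done_dir)) {
        warn "Could not move $webloc: $!\n";
        return 0;
    }
    return 1;
}
```

In the second loop of the script this would replace the two backtick lines, with the Spider.py path, '--maxdepth=1', "--home-url=$URL", '-f' and the output path passed as separate list items. Since no shell is involved, the spaces in "Files to Install" would then need no backslashes.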

The bottom line: Plucking on an Intel Mac is not that hard to set up.
