Saturday, August 08, 2009

Upgrading the Bayesian-filtered status reader

In a previous post, I described how I assembled a command-line program for Bayesian filtering and presentation of Facebook statuses. After a while, constantly pulling down the RSS feed (via curl) and seeing the same statuses for certain people, day after day until they expired, kind of wore on me, so I went looking for a simple feed reader. The command-line feed readers that appealed to me most (Canto and Newsbeuter) required newer versions of Python or a series of dependencies that Fink did not offer in sufficiently recent versions. So I tried rawdog, which is written in Python and only requires the easily installed feedparser. It seemed modifiable enough to do what I wanted, and I even went so far as to write a triumphant blog post about getting it to properly parse my Facebook feed and deliver it to my Bayesian filtering and display script. And then the next day, it went haywire and started reshowing me things I had already seen and dismissed.

So.

I decided to do everything myself. I was already partially parsing the RSS feed, so I went a little further: I parsed out each post's date and compared it with the date when I last looked through the posts. (This seemed daunting until I found out that Perl already has time functions built in.) Finally, I added code to write the output to a text file as well as to the screen. Now the command can work on its own or have its output easily combined with that of my separate Twitter Bayesian filter, which is a bit more sophisticated and does a nice job of increasing my local Twitter signal-to-noise ratio. I intend to post about it someday.
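For readers who want the gist of the date check without reading the script, here is a rough sketch of the idea. It is written in Python rather than Perl and is not the code from stati.pl; the function names (new_items, report), the assumption that each feed item carries a standard RSS pubDate element, and the output line format are all mine, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def new_items(rss_text, last_seen):
    """Return (title, published) pairs for items dated after last_seen.

    Hypothetical helper: assumes RSS 2.0 <item> elements with a <pubDate>
    in the usual RFC 822 style, and a timezone-aware last_seen.
    """
    root = ET.fromstring(rss_text)
    fresh = []
    for item in root.iter("item"):
        date_text = item.findtext("pubDate")
        if not date_text:
            continue  # no date to compare, so skip the item
        try:
            published = parsedate_to_datetime(date_text)
        except (TypeError, ValueError):
            continue  # unparseable date, skip rather than guess
        if published.tzinfo is None:
            # Dates without a usable offset are treated as UTC (an assumption).
            published = published.replace(tzinfo=timezone.utc)
        if published > last_seen:
            fresh.append((item.findtext("title", default=""), published))
    return fresh


def report(items, out_path):
    """Print each item and also write it to a text file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for title, published in items:
            line = f"{published:%Y-%m-%d %H:%M} {title}"
            print(line)
            out.write(line + "\n")
```

In use, you would feed new_items the downloaded feed text plus the time of your last look, pass the result to report, and then save the current time somewhere so the next run can use it as the new cutoff.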

The code for stati.pl is here.

2 comments:

masukomi said...

I was about to sit down and code something quite similar (Bayesian-filtered feed reader, was even thinking of basing it on rawdog too), but then I saw you'd already gotten most/all of it written (yay). But, alas, the link to the stati.pl source code doesn't work anymore (boo). :(

Any chance you could update the link, please?

Surly Teabag said...

Thanks for pointing out the broken link. I've fixed it.

If you ever get a feed reader working, please post or e-mail me about it. I would love to play with it.