Friday, March 02, 2012

Improving Twitter signal-to-noise ratio with TTYtter extensions

I've been using TTYtter as my primary interface to Twitter for almost a year, and it is still great. In fact, the new releases have made it even better. And the depth of configurability is amazing.

One Twitter user that I follow posts some very cool stuff, but also lots of side-chatter that doesn't interest me. Twitter lets users turn off retweets on a per-user basis, but not the messages to other users (indicated by the presence of the "@" symbol). TTYtter's extensibility makes this easily remedied with a short bit of Perl code that is run during the processing of each Twitter status.
$handle = sub {
    my $ref = shift;
    my $text = &descape($ref->{'text'});            # text of the status
    my $name = &descape($ref->{'user'}->{'name'});  # display name of the poster
    return 0 if (($text =~ /\@/) && ($name =~ /Mr\. Noisy/i));  # suppress it

    &defaulthandle($ref); # actually print the Twitter message
    return 1;
};

The two &descape lines extract the status text and the name of the person who posted it. The first return line checks whether the status is from Mr. Noisy and whether its text contains an "@". If so, the function returns 0 before the code that prints the message to the screen is executed.

Then when I run TTYtter, I include the "-exts=..." flag among the command line arguments:
ttytter -exts="/Users/surly/lib/denoise.pl"
And all of Mr. Noisy's Twitter postings that contain the "@" symbol are filtered out.
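
A rule like this can be sanity-checked outside TTYtter before it is loaded as an extension. The following is only a sketch: should_show is a hypothetical helper (not a TTYtter function), the statuses are hand-made mock data with the same nesting as the fields used above, and TTYtter's own &descape and &defaulthandle are deliberately left out so that the script runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the test inside $handle; not part of TTYtter.
# Returns 1 if the status should be shown, 0 if it should be suppressed.
sub should_show {
    my ($ref) = @_;
    my $text = defined $ref->{'text'} ? $ref->{'text'} : '';
    my $name = defined $ref->{'user'}{'name'} ? $ref->{'user'}{'name'} : '';
    return 0 if ($text =~ /\@/ && $name =~ /Mr\. Noisy/i);
    return 1;
}

# Mock statuses (made-up data) with the same nesting as a real status.
my @statuses = (
    { 'text' => 'A very cool link',     'user' => { 'name' => 'Mr. Noisy' } },
    { 'text' => '@pal see you at 8',    'user' => { 'name' => 'Mr. Noisy' } },
    { 'text' => '@pal thanks for that', 'user' => { 'name' => 'Someone Else' } },
);

foreach my $status (@statuses) {
    my $verdict = should_show($status) ? 'SHOW' : 'HIDE';
    print "$verdict  $status->{'user'}{'name'}: $status->{'text'}\n";
}
```

With these mock statuses, only the second one (an "@" message from Mr. Noisy) is hidden.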

Another option is to only include postings from a given user when they receive a certain number of retweets. Here, the lines that set $rtcount and $sname also show how to filter on the user's actual Twitter handle, the screen_name field (because it's the more common and convenient name to use):

$handle = sub {
    my $ref = shift;
    my $text = &descape($ref->{'text'});
    my $name = &descape($ref->{'user'}->{'name'});
    return 0 if (($text =~ /\@/) && ($name =~ /Mr\. Noisy/i));
    my $rtcount = &descape($ref->{'retweet_count'});          # number of retweets
    my $sname = &descape($ref->{'user'}->{'screen_name'});    # Twitter handle
    # anchored so that only the exact handle matches, not handles containing it
    return 0 if (($rtcount < 10) && ($sname =~ /^verboseguy$/i));

    &defaulthandle($ref); # actually print the Twitter message
    return 1;
};
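
If more than one account needs a retweet threshold, the rule can be driven by a small table instead of one regular expression per user. This is a sketch under assumptions, not part of TTYtter: %min_retweets and passes_retweet_rule are hypothetical names, the statuses are mock data, and retweet_count is assumed to be a plain integer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table: lowercase screen name => minimum retweets required.
my %min_retweets = (
    'verboseguy' => 10,
);

# Hypothetical helper, not part of TTYtter: 1 = show, 0 = suppress.
sub passes_retweet_rule {
    my ($ref) = @_;
    my $sname = defined $ref->{'user'}{'screen_name'} ? $ref->{'user'}{'screen_name'} : '';
    my $count = defined $ref->{'retweet_count'} ? $ref->{'retweet_count'} : 0;
    $sname = lc $sname;                             # handles are case-insensitive
    return 1 unless exists $min_retweets{$sname};   # no rule for this user
    return ($count >= $min_retweets{$sname}) ? 1 : 0;
}

# Mock statuses (made-up data).
my @statuses = (
    { 'retweet_count' => 3,  'user' => { 'screen_name' => 'VerboseGuy' } },
    { 'retweet_count' => 25, 'user' => { 'screen_name' => 'verboseguy' } },
    { 'retweet_count' => 0,  'user' => { 'screen_name' => 'quietperson' } },
);

foreach my $status (@statuses) {
    my $verdict = passes_retweet_rule($status) ? 'SHOW' : 'HIDE';
    print "$verdict  $status->{'user'}{'screen_name'} ($status->{'retweet_count'} retweets)\n";
}
```

Inside a real extension, the same test would sit in $handle before the call to &defaulthandle, just like the return 0 lines above.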


I worked out most of this from the TTYtter page on extensions and advanced topics. A partial list of the fields that you can extract and filter on is given in the Twitter documentation on the JSON data representation of a single status object. There is a lot of metadata that could conceivably be used for filtering.

Simpler filtering on the text of Twitter statuses can be carried out using the command-line -filter argument, but these extensions allow far more sophisticated ways of controlling the flow of Twitter information to your screen.
Other extensions that I like:
  • deshortify.pl - which deobfuscates all of those "t.co" hyperlinks.

Thursday, March 17, 2011

TTYtter: a turbo-charged console Twitter client

I finally got around to updating to a real Twitter client again. Twitter now requires that applications use OAuth authentication instead of a standard username/password login, rendering my old command-line Twitter programs obsolete.

But then I found the awesome Perl-based TTYtter. First, unlike a lot of Perl-based applications, you don't have to recursively install lots of other modules to get TTYtter to work. I just installed the single file Perl script, and it worked. The OAuth set-up is clearly described and easy. The benefits of this program are huge! It supports the new-fangled retweets, favoriting, direct-messaging, tracking arbitrary keywords, and searching (I visit search.twitter.com a lot less often). It can retrieve the entire thread for any given Twitter post. If your version of Lynx or cURL supports SSL, TTYtter can use SSL encryption on your Twitter traffic.

There is one Perl module that I would recommend you install though... Term::ReadLine::TTYtter fixes the readline behavior AND gives a cool dynamically-updating character count for what you have typed at the prompt.

TTYtter supports Growl notifications, so you can set it up to inform you when you get a direct message or just when there is a new update of any kind. It is highly configurable.

Mac users: Make a file called ".ttytterrc" in your home directory containing just this one line:

urlopen=open -g %U

and then when you use the /url command, TTYtter will open the corresponding URL in your default browser. The "-g" switch opens the page in the background, keeping your Terminal window in the foreground.

One downside to the high degree of configurability is that there is a lot to learn to fully exploit the powers of TTYtter. (I run the program with the following set of command-line switches: ttytter -ssl -readline -notifytype=growl -notifies=dm,reply -notifyquiet -timestamp -filter="/spn\.tw/||/\#FF/".) But the documentation is well-written, and what you get is worth the learning curve.

Friday, July 02, 2010

A history of Google from one user's perspective

I recently thought about some of the ways that Google has revolutionized the Internet. As I started doing a little research, I found that they had introduced more services than I had realized: services that were important at the time, that I still use today, and that I had been taking for granted. A lot of these memories were triggered by Google's own list of milestones.


1997: Search (Google search actually started in ~1996, but it wasn't until about 1997 that I discovered it.) Prior to Google, searching the Internet was done through Alta Vista which was crappy at finding what I wanted. I spent hours tweaking Boolean expressions, trying to get it to find the right thing. Google easily supplanted it.

2000: An important component in the cleanness and niceness of Google (Search) was the lack of annoying ads which threatened to overtake the rest of the Internet. Google's introduction of AdWords guaranteed a continued peaceful search experience while allowing Google to make money.

2001: Google saves the Usenet archives and makes them way more accessible as Google Groups (which name later came to mean primarily their mailing list/web forum offering, displacing Yahoo(!) Groups which is serviceable but a bit crappier and ad-ridden).

2002: Google News - A better interface for finding news articles (both recent and older stuff) without having to use some specific source or pay for the privilege.

2002: Froogle (now known by the less clever name "Google Shopping"). While never quite ideal, the interface made for a far more pleasant way of finding good prices for purchases on the Internet.

2003: Google acquires Blogger - the free blogging service which brought you this post.

2004: Google Local is introduced and eventually surpasses other Yellow Pages services in usefulness and intuitiveness.

2004: Gmail revolutionizes web mail which had previously just been a crappy e-mail solution for people without access to *real* e-mail (back when men were men, women were women, and e-mail was text).

2004: Google Scholar: Google indexes academic articles, making them way easier to find. What was it like before Google Scholar? Hell.

2005: Google Maps revolutionizes online maps, providing mouse control over the map for zooming and shifting, bigger and better maps, satellite views, and increasing features and usability. Mapquest was previously my default source for online maps (though I would sometimes use MapBlast for its wonderfully sparse LineDrive driving directions).

2006: Google buys YouTube. At the time, I did not like this because YouTube's interface and video resolution were much worse than available on Google. But YouTube has gradually improved under Google.

2006: Google launches Google Docs & Spreadsheets. While I don't use these for my own purposes, people send me information formatted in such forms frequently enough that having Gmail automatically open them in a usable online viewer makes my life easier.

2007: Google Street View makes it possible to become familiar with an area and find specific businesses and buildings without ever having to go there.

2007: GOOG-411 allows you to call up 800-GOOG-411 and get directory and business information by speaking location and desired business/location information. I like using it. It's good for when I am trying to accomplish something under a deadline (and don't have time to search from my phone by typing in search terms). [GOOG-411 is unfortunately being done away with in 2010. Microsoft bought TellMe and made BING-411, which is reputedly a good alternative.]

2009: Google Voice: Much of the credit is due to Grand Central, but under Google, this finally got text messaging capabilities and voice mail transcriptions and is just now way better. I don't know whether this will gain wide acceptance or use in the near term, but I believe that in ten years, the more sophisticated phone services which Google Voice allows (ringing many phones from one number, filtering incoming calls to specific phones or voice mail based on who is calling, etc.) will be standard telephone features.

2010: I expect that I will soon appreciate the power of the just released GoogleCL. It makes it possible to access a few Google services from the command line. Currently this is limited to Blogger, Calendar, Contacts, Docs, Picasa, and YouTube.


What this analysis confirms is that (with the exception of Google Voice) no single product that Google has released since 2005 has had the same value for me as Google Maps or Gmail. I'm not sure if this is because Google's new services have not been as vital or simply because I have been adopting fewer new services in the past few years.*

Google Print (started in 2003 and later becoming Google Books) and Google Reader (introduced in 2005) are two obvious contenders that I did not list above because I don't feel that I use them enough that they have impacted my life. Google Translate, I use frequently, but I never remember it surpassing Babelfish. It just became my default translation service somehow. Also Google Talk (2005), Google Calendar (2006), Android (2007), Chrome - Google's web browser (2008 - Mac version, 2010), and Google Wave (2009) have not found their way into my life yet. But even if they never do, the many little continuing improvements to Google Search and Gmail and other products, combined with Google still doing things the way that I would hope they would much of the time (such as offering command-line access to a user's Google data as mentioned above, as well as Google's new encrypted search engine: https://encrypted.google.com/) maintain Google's position in my esteem.



* A third possibility: There is a finite set of web sites that a typical user is going to use enough to be able to recall them and consider them useful. The longer I am on the Internet, the more of those slots are taken up by things that have been around for ~10 years. In the interest of shaking things up a little, let me recommend a new search engine I found when following up on Google's encrypted search:

The criticism of Google's encrypted search is that Google still retains the search records, so privacy is improved but could be better. An alternative is a new search engine called Duck Duck Go. It offers encrypted search AND complete privacy by not logging your IP address or browser information or retaining individual search histories of any type. It supports keyboard shortcuts, YubNub-like searching of other sites ([!g search terms] goes directly to the Google search results for those terms) and easily customizable results pages. It is worth checking out.
UPDATE: 2012: Google announces the imminent demise of Froogle (currently operating under the name "Google Product Search") which will be replaced by "Google Shopping" which is described as "a purely commercial model built on Product Listing Ads." Up until now, the Google Product Search results were unbiased by which companies gave advertising money to Google. This new version will seemingly put the onus for providing product, pricing, and possibly availability information on the retailers, but it seems to also mean that retailers who do not bother to advertise with Google will not have their products appear under the new Google Shopping system. How this compares to the old system for the user will remain to be seen.

My usage of Google services has shifted somewhat since I first wrote this post. I still use Search, Gmail, Maps, StreetView, Google Scholar, YouTube (and Google Video search), and Voice. I find that I rarely use the Usenet search functionality of Google Groups, though I do use its mailing list functionality. Sadly, I have so far not found a use for GoogleCL. Google Translate is now definitely better than Babelfish was. I've started using Google Reader regularly. Despite the way its Google-Plus-associated makeover has made it less responsive and given it an uglier design that wastes screen space, it has the same convenience advantage that Gmail used to lure me away from e-mail clients that were not Web-based. While the interface for Google Books is still not great (giving very little vertical reading space), the fact that it is the only place on the Internet where certain books are available has made it highly useful at times. Google Docs & Google Spreadsheets are useful for real-time collaboration on projects. And after years of waiting for it, in 2011, Google finally improved on TinEye's reverse image search with their own reverse image search, allowing you to upload an image and then finding the same image and/or the most similar images on the Internet.

So despite my complaints about the elimination of services like GOOG-411 and Froogle, this overview has shown me that Google's various services are getting better and becoming more useful (from my perspective).

Sunday, May 30, 2010

Command-line Bayesian Twitter reader with ad-blocking

UPDATE: The method below and the corresponding script recently stopped working since Twitter changed their login method to "OAuth". The actual necessary changes will be a) switching to an updated Twitter client like TTYtter (which looks pretty awesome) and b) maybe revising the parsing of the Twitter client output. I've just started working on this and don't know when or if I will release updates. The script still works for Facebook updates, as is.


Following up on my initial post on command-line Bayesian filtering and the subsequent update, I have now generalized the script to incorporate not just Facebook status updates but Twitter posts as well.

I've been refining this script over the last year. In addition to providing a convenient way of incorporating Bayesian filtering in the tracking of posts from Twitter and Facebook, it now has the ability to review a particular Twitter user's posting history (via the "rewind" option) and to whitelist or blacklist arbitrary terms, which can be useful in permanently filtering out undesirable subjects or ads. And it tries to auto-expand those annoying shortened URLs and display the actual title of the web page being referenced.

This script requires that your system has the following programs: links (terminal-based web browser), curl, dbacl (Bayesian decision engine), twyt (command-line Twitter interface).

(I actually use another command line Twitter client (twixer) for following and unfollowing other people's accounts. And, while Twyt should work for posting on Twitter, I often use a third program for posting (twerp) which I started using before I found Twyt.)

Typical usage of the Bayesian twi script:

Teapot:~ surly$ twi
0: [5325528218] StephenAtHome: wearing a mask is
so sweaty. i don’t know how those scooby-doo
villians did it. (Sun Nov 01 00:40:05 2009 via web)
o)k, b)ad, u)rgent, open l)ink, c)opy, add to q)uotes, r)ewind, <CR>=next


The number "0" is just an index that counts down to zero to help identify the status messages you will be processing in this session. The number in brackets comes from the twyt program and is a unique Twitter message ID.

The options presented after the post are:
o)k: Add the post to the "ok" data file (for stuff you want to see more of in the future) and go to the next item.
b)ad: Add the post to the "bad" data file (stuff you don't want to see) and go to the next item.
u)rgent: Add the post to the "urgent" data file and go to the next item. This is a category I came up with for messages that say things like "Party at my place tonight" or "Free T-shirts for the first ten respondents". All I am really doing with them at the moment is highlighting them in a brighter color, but my idea is that in some future revision, these might be actively fetched and brought to my attention. At the moment, so few messages qualify as "urgent" that it's still a half-baked idea.
open l)ink: Opens the first extracted hyperlink in Safari.
c)opy: Copies the text of the post to the OS X system clipboard.
add to q)uotes: Appends the post to a text file with my collection of quotes. In the future, I might cause this to also mark the post as a "favorite" on Twitter, but this is not a priority for me.
r)ewind: The rewind feature was designed to compensate for situations where your Bayesian filter causes posts to filter below your radar and then suddenly another one from the same person bubbles up to the surface, but you don't understand it because you are missing the context. Rewinding causes the program to start going backwards through the locally cached history of all posts (in the "all.d" file) searching for posts from the user in question. It will keep feeding you previous posts as you keep hitting keys (or until you type q for "quit"). This feature only works on Twitter accounts at the moment, due to the way it is extracting the name to grep for.
<CR>=next: You've got to hit the carriage return to advance to the next post. If you type "o" and hit enter, the post will be categorized as "ok". If you just hit enter, your Bayesian filter will learn nothing from this post. Initially, it's good to give your filter as much definite feedback as you can, but once you are happy with its performance, you can just hit enter all the way through if you like.

Importantly, but perhaps not obviously, any set of these commands can be entered for a given post, and they will all be executed. Example:
o)k, b)ad, u)rgent, open l)ink, c)opy, add to q)uotes, r)ewind, <CR>=next

olq
will categorize the item as "ok", open the associated hyperlink, and add the item to the quotes file. There is nothing stopping you from categorizing the item as ok, bad, and urgent, but this will almost certainly just confuse the Bayesian filter.
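
The multi-letter behaviour can be pictured as splitting the reply into single letters and running each matching action in the order typed. The sketch below is not the actual twi code; the %action_for table and the actions_for_reply helper are hypothetical names, and the actions are reduced to labels so that the example stays self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping from command letter to a label for the action.
my %action_for = (
    'o' => 'mark ok',
    'b' => 'mark bad',
    'u' => 'mark urgent',
    'l' => 'open link',
    'c' => 'copy',
    'q' => 'add to quotes',
    'r' => 'rewind',
);

# Split a reply such as "olq" into letters and return the matching actions
# in the order typed. Unknown letters are ignored; an empty reply (just
# hitting return) yields no actions, i.e. "next".
sub actions_for_reply {
    my ($reply) = @_;
    my @actions;
    foreach my $letter (split //, lc $reply) {
        push @actions, $action_for{$letter} if exists $action_for{$letter};
    }
    return @actions;
}

my @todo = actions_for_reply('olq');
print join(', ', @todo), "\n";   # prints: mark ok, open link, add to quotes
```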

Finally you will be prompted to "Categorize which other posts as ok?"


When the program is finished processing all the new status messages, it will post an accuracy rating for the current session (basically, if you don't mark any of the "ok" messages as bad or correct any of the "bad" ones to be ok, it thinks it has 100% accuracy). You can use this as a rough metric for how good the current training level is.
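
One way to read that accuracy rating: a post counts as an error only when it was explicitly marked "ok" or "bad" and that mark disagrees with what the filter predicted, while posts that were simply skipped count as correct. The sketch below follows that reading; it is an assumption about the arithmetic rather than the script's actual code, and session_accuracy and the field names predicted and marked are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: percentage of posts whose predicted category was not
# corrected by the user. Each post is a hash reference with a 'predicted'
# category and an optional 'marked' category.
sub session_accuracy {
    my (@posts) = @_;
    return unless @posts;              # no posts, no rating
    my $wrong = 0;
    foreach my $post (@posts) {
        my $marked = $post->{'marked'};
        next unless defined $marked;                        # skipped with <CR>
        next unless $marked eq 'ok' || $marked eq 'bad';    # ignore other marks
        $wrong++ if $marked ne $post->{'predicted'};
    }
    return 100 * (scalar(@posts) - $wrong) / scalar(@posts);
}

# Mock session (made-up data): four posts, one correction.
my @session = (
    { 'predicted' => 'ok',  'marked' => 'ok'  },
    { 'predicted' => 'ok',  'marked' => 'bad' },   # the filter got this one wrong
    { 'predicted' => 'bad' },                      # skipped, counts as correct
    { 'predicted' => 'bad', 'marked' => 'bad' },
);

printf "%.0f%%\n", session_accuracy(@session);   # prints: 75%
```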

Limitations: The current version of Twyt does not seem to be picking up the newfangled "Retweet" posts. Based on the messages that I am missing, this is something of a net advantage.

The twi script can be found here.

Saturday, March 27, 2010

A Palm user tries to convert to the iPod touch

I finally bought an iPod touch. I had wanted one for two years. I really started to want one last year, when Apple started allowing developers to write applications for the iPhone OS. With the release of the 3.0 version of the iPhone OS, it had seemed that there was finally enough functionality to allow the iPod touch to serve as my primary PDA.

First reactions: Even though I had seen it before and played with it in stores, the actual experience of owning a device with such an amazing and cool-looking interface has exceeded my expectations.

Something about many of the apps available from the iTunes Store sort of immediately turned me off though. Lots of them have a desperate commercial aura that I associate with Windows shareware programs. Many of the free programs are prominently labelled as "FREE" or "LITE" and some have ads in them. It's usually apparent after running a program whether it feels right. Since I prefer using open source programs, the iPhone apps scene, where everyone has to pay $100 annually to have their programs available to be bought or downloaded, feels kind of constrained.

Fortunately there are exceptions. The programs I list below (in particular, Instapaper and Simplenote) have great interfaces and are a joy to use.

A comparison of Palm and iPhone programs for a few functions.

Function: General purpose maps
Palm program: Mapopolis
iPhone equivalent: offmaps
Comments: offmaps is much cheaper (though this is thanks to the OpenStreetMap initiative for free map data), but it can't search for streets without an Internet connection. Also it frequently doesn't have the amount of cached detail that I had hoped it would have. I'll bet there's a better maps program out there, but I haven't found it yet.
Advantage: Palm*

Function: GPS turn-by-turn navigation
Palm program: TomTom
iPhone equivalent: TomTom? Navigon?
Comments: Prices are comparable, but for the iPod touch user, it's still not clear whether any GPS software will interface with an external Bluetooth GPS (which was the norm for the Palm). It seems like the only official way of doing it is to pay TomTom $200 ($100 for the GPS which interfaces to the iPod, and $100 for the software).
Advantage: Palm*

Function: Offline web page reading
Palm program: Plucker
iPhone equivalent: Instapaper
Comments: Plucker is free. Equivalent Instapaper functionality costs a few bucks. Also, since I had built my own plucking script, it was convenient to pluck both web pages and HTML and text pages on my computer. Instapaper only works for things that are already online. But Instapaper allows me to mark things for offline reading from any computer. (Actually Instapaper does give you an e-mail address that you can send an e-mail to, and the contents will show up in your list of things to read. However, the Instapaper interface shifted a little after I wrote this review. The new version of the app wasn't quite as fast, and the main new features seemed to be sharing/"liking"-oriented. Ultimately, in mid-2012, the app just unexpectedly stopped synching new content. I tried to redownload everything, but in the process I lost all articles that I had already downloaded (never to be recovered). The reason? The Instapaper servers had been modified to work with a still newer version of Instapaper, but this version would not run on my (admittedly older) iPod touch. That is, the developer broke my version of the app because he decided that he only wanted to support iOS 5 (which was being used by the vast majority of users). [This should perhaps not surprise me as this is also the developer who made all of his users change their usernames from short names that they had chosen to e-mail addresses.] So the alternative functioning app that I found that fulfills the same function is Readability. It loads terribly slowly and timed out on synching one huge text file, but it is otherwise fine. End of rant. )
Advantage: Palm


PDF reading is definitely superior on the iPod touch (I recommend buying GoodReader), but for some reason it's not convenient enough that I have gotten into the habit of reading PDFs on it.

Address book functionality is adequate on both.

Overall, in terms of PDA functions, it's kind of a draw. I've now owned the iPod touch for six months. I have my Address Book information on it. I use it regularly for reading web pages downloaded through Instapaper.

The killer feature of the iPod touch is the iPod features themselves. I use it for listening to podcasts and music all the time. The convenient and automatic syncing of stuff to the iPod and its intuitive interface are the key features that make it so much better than any competing products that I have played with. The Palm was never even close to a decent mp3 player. Combined with the iPod touch's better battery life (the Palm is a color model), this is the reason that I stopped carrying the Palm in favor of the iPod touch.

I sync text files over to the iPod touch using Simplenote, but it has to be done manually. (I could easily automate syncing from the computer to the Simplenote web site, but in order to sync text files with the iPod touch, one must run the Simplenote program while the iPod has an Internet connection.) With the Palm, I had it set up so that everything would sync pretty much automatically. Instapaper has the same syncing problem. One advantage to Simplenote though is that the syncing works in both directions without any contemplation on my part so I can take notes on the iPod touch and sync them back to my computer as well as going the other way (whereas on my Palm, I pretty much just synced from the Palm to my computer). In truth, I am not using the text files feature so much on the iPod.

I'm starting to experience gadgetlust again. I want to have something that I can carry around that I can modify... like something that could run Perl or Python scripts. This is forbidden on the iPod touch. A cheap Android device looks slightly appealing. Unfortunately the unlocked phones are still quite expensive.

There are a lot of cool programs for the iPod touch (the Dropbox client, Google Earth, Pandora). Basically, for Internet functionality the iPod touch blows away the Palm, but for the few things I want to do on a regular basis (other than listen to audio files) - read text files, read downloaded web pages, look at offline maps, turn-by-turn navigation - the Palm is actually still as good or better. Maybe this is because I acquired this functionality over years of fiddling with things and finding the right software. Presumably I will find (and pay for) better software for some functions on the iPod touch, but whether I will be able to tinker as much and optimize some otherwise suboptimal features, remains to be seen.


* UPDATE: The new Mapquest iPhone app may improve map capabilities for the iPod touch significantly.

ANOTHER UPDATE: It turns out that it is possible to use HTML5 and Javascript to program "web apps" which are almost as good as a regular iPhone App (mainly lacking the ability to access iPhone sensors). It is possible to save these apps to your device and use them even when offline. This may counteract some of my above criticisms of the iPhone.

UPDATE 3: A collection of iPhone web apps, with reviews, can be found at OpenAppMkt. However, it's unclear how well they work. I tried PieGuy (a Pac-Man-like game) on my second generation iPod touch, and it didn't work at all. I also recall Checklist as being wonky, behaving OK sometimes, but then losing its state at other times. Your mileage may vary.
UPDATE 4: This post is now largely of historical value as the abilities and variety of iOS apps have continued to improve and workarounds have been devised to address problems. (Though it is still as of 2012 necessary to actually open the SimpleNote app while the device has an Internet connection in order to make it sync.)

Friday, February 19, 2010

The One Book List

Back in the early days of the World Wide Web, there was a thing called "The One Book List". It was started in 1994 by Paul Phillips when he posted to the rec.arts.books newsgroup:
I would like for each of you to decide on a single book that you would most like for the world to read for inclusion in the list. The book that, for you, was the most influential, or thought-provoking, or enjoyable, or moving, or philosophically powerful, or deep in some sense you cannot properly define, or any other criteria you wish to set.
Hundreds of people responded to his request, and he eventually accumulated a set of stellar recommendations from people all over the proto-Web, including Douglas Adams, who recommended The Blind Watchmaker by Richard Dawkins, and Douglas Hofstadter, who recommended The Catcher in the Rye.

I often used to browse through the One Book List when looking for a great book. It was from this list that I found Replay by Ken Grimwood. It takes a novel concept involving replaying events, explores all the permutations, and ultimately comments on the role of human decisions in life. Replay is a thought-provoking and entertaining book that has stuck with me ever since I read it.

When I looked for the list again recently, I found that it had essentially disappeared from the Internet. The only place you can find it now is through the Wayback Machine, which has a copy of the original front page of the One Book List site (last updated in 1998) and the list itself. If you are looking for a great reading experience, the One Book List is still a great place to start your search.