Python Fun: SILC TinyURL chat bot
A couple of close friends and I use a private SILC chat-room to communicate. I use the console silc client, running under GNU screen. Very often we trade links to various web pages, and those links are frequently too long to fit in the console window, which makes them a pain to select and paste into a browser or whatever. Sometimes someone is connected via their cell-phone or other very small device, and long links are utterly impractical. We try to use tinyurl.com whenever possible to shorten links, however it’s still a bit of a pain. You have to go to tinyurl.com, paste in the link, submit, wait, copy out the result, and re-paste. Kind of an annoying extra bunch of steps. Shouldn’t computers really be able to do this repetitive work for us?
I decided to write a chat-room bot which would simply listen for HTTP URLs, submit those URLs to tinyurl.com, scrape the result and announce it in the channel. This project really has three pieces: 1) a SILC client implementation, 2) identifying URLs in the chatter, and 3) an HTTP client interface to tinyurl.com. I’ve used Python before to great effect for things like text processing and HTTP requests. I noticed there was a package called ‘py-silc’ installed on my system, so I decided to try to use it for the SILC side.
I found py-silc very straightforward to use. The example on the website is a little sparse, but it gives enough to start with. The pydoc documentation was also more than sufficient for my very trivial requirements. All I needed was some way to connect to the server, hook channel messages up to a regular expression parser, and “say” something in the channel. Py-silc allowed me to do this in a very simple fashion. You just need to sub-class the supplied silc.SilcClient class and supply a few callbacks. Very easy.
To extract HTTP URLs from the channel messages, I needed a regular expression parser. Rather than writing my own, I took the easy route and found a Python port of some Perl regexp example. It works great for my needs!
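To give a feel for this step, here is a minimal sketch of pulling URLs out of a chat message. It is not the Perl-derived expression mentioned above; the pattern, the name `URL_RE` and the helper `extract_urls` are all made up for illustration, and the pattern is deliberately much cruder than a full URL grammar.

```python
import re

# Hypothetical, deliberately simple pattern: "http://" or "https://" followed
# by anything that is not whitespace, an angle bracket or a quote character.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_urls(message):
    """Return every HTTP(S) URL found in a chat message, in order."""
    # Trailing punctuation usually belongs to the sentence, not the URL,
    # so strip it from the end of each match.
    return [match.rstrip(".,;:!?)") for match in URL_RE.findall(message)]


if __name__ == "__main__":
    print(extract_urls("look at http://example.com/a?b=1, it is great"))
```

A bot would run something like `extract_urls` on every channel message and only act when the returned list is non-empty.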
Finally, the HTTP side. Python actually has a few different APIs for fetching HTTP URLs: urllib, urllib2 and httplib. This reveals one of the mildly annoying things about Python, which is that the standard library is somewhat ad-hoc. However, it’s still nice and easy to do what I want. Looking at tinyurl.com’s HTML source, you’ll notice they HTTP POST the input URL as field ‘url’ to ‘create.php’. It’s a few lines of Python to encode the request, send it off, and parse the response. The response is read from a file-like object, which in turn means we end up treating it as a string. The only tricky thing in parsing the response is deciding which URL is the final tinyurl version of our original URL. For simplicity I went with the somewhat dirty method of counting up the preceding URLs. Currently, the URL we want is the fourth string in the page which starts with “http://tinyurl.com”. This works, but is of course prone to breaking if tinyurl.com change their HTML. A much better approach would be to properly parse the HTML document, but I couldn’t be bothered doing that at this point.
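The request and the count-the-URLs scrape can be sketched roughly as follows. This is not the code from tiny.py; it is an illustration written against Python 3, where the urllib2 functionality lives in `urllib.request`. The full address `http://tinyurl.com/create.php` is my assumption built from the form target described above, and the names `build_request`, `pick_tinyurl` and `shorten` are hypothetical. The page layout may also have changed since this was written, so the default index of four should be treated as an example value.

```python
import re
import urllib.parse
import urllib.request

# Assumed full address of the form target described in the post.
CREATE_URL = "http://tinyurl.com/create.php"

# Any string in the page that starts with "http://tinyurl.com".
TINY_RE = re.compile(r"http://tinyurl\.com[^\s\"'<>]*")


def build_request(long_url):
    """Encode the long URL as form field 'url'; supplying data makes it a POST."""
    data = urllib.parse.urlencode({"url": long_url}).encode("ascii")
    return urllib.request.Request(CREATE_URL, data=data)


def pick_tinyurl(html, index=4):
    """Return the index-th (1-based) tinyurl.com string in the page, or None."""
    matches = TINY_RE.findall(html)
    return matches[index - 1] if len(matches) >= index else None


def shorten(long_url):
    """Send the POST and scrape the reply. Needs network access."""
    with urllib.request.urlopen(build_request(long_url), timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return pick_tinyurl(html)
```

Keeping `pick_tinyurl` separate from the network call means the fragile part, the position of the wanted URL, can be checked against a saved copy of the page and adjusted in one place when the HTML changes.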
It took me about twenty minutes to hack up a working client which did the job – Python is just great for hacking up stuff like this fast – and as usual an hour or two to polish it up to the point where it forks into the background, accepts command line parameters, etc. The tarball is here for those interested. I’ve licensed my portion (tiny.py) under a BSD license.







the url is dead
Hi, unfortunately the link to your tarball isn’t working anymore. Due to the lack of actual silc bots, it might be nice to have some examples… would you mind publishing it somewhere?
thanks
I accidentally moved the tarball away. It’s back now, sorry about that.
tinyurl have an api e.g. http://tinyurl.com/api-create.php?url=http://google.com
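A rough sketch of using that endpoint from Python 3 might look like the following. The endpoint address is taken from the comment above; everything else is an assumption, in particular that the response body is just the short URL as plain text. The names `api_request_url` and `shorten_via_api` are made up for illustration, and the long URL is percent-encoded here, which the example in the comment does not do.

```python
import urllib.parse
import urllib.request

# Endpoint as given in the comment above.
API_URL = "http://tinyurl.com/api-create.php"


def api_request_url(long_url):
    """Build the GET address, percent-encoding the long URL as parameter 'url'."""
    return API_URL + "?" + urllib.parse.urlencode({"url": long_url})


def shorten_via_api(long_url):
    """Fetch the short URL. Needs network access.

    Assumes the response body is the short URL as plain text.
    """
    with urllib.request.urlopen(api_request_url(long_url), timeout=10) as resp:
        return resp.read().decode("ascii").strip()
```

If the endpoint behaves as assumed, this would remove the need to count URLs in the HTML page at all.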