Posts Tagged ‘Python’

Unworkable 0.4 released

Monday, January 7th, 2008

I have just tagged, packaged and announced version 0.4 of my BitTorrent implementation, Unworkable.

Here are the release notes:

  • Implemented sending peer keep-alives.
  • Trace log now contains timestamps.
  • Make us more tolerant of intermittent tracker failures.
  • Added support for Arch Linux.
  • Fixed an off-by-four bug which could cause segfaults on some platforms.
  • Fix zero padding in peer id generation.
  • Overall code reduction and re-factoring plus improvements to documentation.

Decoupled Python GUI Construction, or BitTorrent visualisation

Saturday, December 22nd, 2007

While in general I appreciate very simple, no-nonsense user interfaces for applications that work efficiently on the console and so can be used via SSH, there are times when increased visualisation is very useful.

Specifically with regard to my BitTorrent client, Unworkable, the default user interface is exceedingly simple. Inspired by the ubiquitous scp program on UNIX, the idea is that it should be just as simple to download a file via BitTorrent as via SSH or FTP/HTTP. Indeed, I borrowed the progressmeter.c source code from OpenSSH (a great repository of nice code). Have a look at this screenshot of Unworkable running on Windows to get an idea of what it looks like.

While I’m pretty happy with the current console user interface, it is obviously very limited in how much information it can display. To overcome this, I have for a long time had an extremely verbose trace mechanism. Pass the -t option to Unworkable, and it will write a detailed log. This is incredibly useful for debugging, but it suffers from the opposite problem - information overload. It’s quite difficult for a human to parse scrolling pages of ASCII text and spot patterns. It can take considerable analysis to determine exactly why the program is making the decisions it is. This is the main motivation for adding support for a graphical user interface - information can be distilled into graphs and other visualisations which make it much easier to understand, at a glance, what is going on. For some examples of the kind of things I have in mind, take a look at the Azureus screenshots page.

This brings me to the question of how to nicely add a GUI to my application. I strongly want to keep the existing simple, console interface, and don’t want to bloat the application with additional external dependencies for widget sets and so forth. I decided to go with an IPC mechanism, to split the GUI from Unworkable entirely. While considering IPC mechanisms, I figured why not simply use TCP/IP. While in general the GUI is going to be running on the same host as Unworkable, this at least gives the option for operation over a network.

So, I have added a simple “control server” to Unworkable, which currently has a fairly basic ASCII protocol. At the moment, communication is unidirectional - Unworkable only pushes out some event notification messages. There is no mechanism for clients to send instructions back, since this isn’t required for visualisation just yet and that is my first priority. I have started a client implementation in Python. “Why Python?”, you may ask. Python has pretty good networking support, good UI toolkit support and good multi-platform support. I’m also pretty experienced with the language, and I find it very fast to write applications with. Since the GUI itself performs practically no hard computation nor I/O, the performance penalties of a higher-level language are hardly a concern.
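
To give a flavour of what the Python side of such a client can look like, here is a minimal sketch. It assumes the control server sends one ASCII message per line, which is my guess at the simplest reading of "basic ASCII protocol"; the host, port and the function names iter_events and connect are placeholders for illustration, not Unworkable's real values.

```python
import socket


def iter_events(sock, bufsize=4096):
    """Yield one ASCII line at a time from a connected socket."""
    pending = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # the server closed the connection
            break
        pending += chunk
        # A single recv() may return part of a line or several lines at once.
        while b"\n" in pending:
            line, pending = pending.split(b"\n", 1)
            yield line.decode("ascii", "replace").rstrip("\r")
    if pending:  # trailing data without a final newline
        yield pending.decode("ascii", "replace")


def connect(host="127.0.0.1", port=6099):
    # Placeholder address: substitute whatever the control server listens on.
    return socket.create_connection((host, port))
```

A GUI would then loop over iter_events(connect()) and update its graphs as each message arrives. Since the traffic is one-way, the client never needs to write to the socket.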

In closing, the formula of having the application split into a C program which does the performance-sensitive stuff, exposing a simple ASCII protocol over TCP/IP, while implementing the user interface in Python, permits maintaining a lean and efficient core application with a slick graphical user interface.

Unworkable 0.3 released

Thursday, December 20th, 2007

I have just tagged, packaged and announced version 0.3 of my BitTorrent implementation, Unworkable. My goal with Unworkable is to make releases frequently - hopefully twice a month or so - with incremental improvements each release. The hope is that each release should be of a higher quality than the last. Therefore I try to test new features well and ensure the stability is at least as good as the previous release. I also try to run tests across a wide variety of platforms (Solaris, OpenBSD, Linux, Windows, Mac OS X, etc.).

Anyway, here’s what’s new in this version:

  • Fixed a subtle bug in download strategy.
  • Removed numerous format specifier bugs by bringing source in line with C99.
  • Major refactoring and code cleanup.
  • Added initial implementation of a TCP/IP “control server”.
  • Checked in some initial work towards a decoupled Python UI.
  • Portability improvements to build and run on Windows (Cygwin).
  • Build and runtime testing on Fedora 7 and Gentoo Linux.

cvs log output parser, find N most recent commits, in Python

Tuesday, November 20th, 2007

Over the weekend I decided to spend some time trying to improve the web page for my BitTorrent project. I felt that there is a good bit going on with the project, but that the web page was not reflecting this to people who might be visiting it. I already write CVS commit log messages, and I don’t feel like also writing those things into the HTML page. Why not simply display the latest N commit logs on the web page? This shows the visitor at a glance (a) how active the project is and (b) what I’ve been working on.

Probably the traditional approach to doing this kind of thing is using CVS ‘commit hooks’. These are scripts run on every check-in that can do things with the log messages and so on. For example, the OpenBSD ‘source-changes’ mailing list is run using this feature. I didn’t really want to bother with this though - not least because the web server is not the same machine as the CVS server. Also, the commit hook approach only works from the point it has been set up onwards. I wanted prior changes to be visible.

The output of ‘cvs log’ contains all the relevant information. The cvs log command doesn’t require access to the CVSROOT, indeed I can do it quite easily read-only over SSH, since cvs.unworkable.org uses anoncvs - thus it is highly secure, and distributed. All I needed to do was to write a program to parse the output of ‘cvs log’, order it by timestamp (latest first ordering), and display the first N entries. It turns out to be pretty easy to do this in Python.

My first question was, what should the data structures be? I reasoned that each commit could be represented by a dictionary with a few keys - timestamp, file, and commit log message. So all we needed was a list of these dictionaries - easy! After this, I saw two challenges - the parser and the sorter. The parser is a fairly simple finite state machine. String handling is pretty nice in Python - at least compared to C. It did not take too much work to get it pulling out the data I needed - great.

Now to order the list. It is not hard to grasp what we need to do - we want something to compare each dict in the list based on its timestamp. But how exactly do we get Python to do this for us? We could write our own sort routine, but that shouldn’t really be necessary. Python already has a perfectly good list.sort() method. The question thus becomes, how do we make the sort() API do what we want? Python’s internal API docs reveal:

sort(...)
    L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
        cmp(x, y) -> -1, 0, 1

This tells us very little beyond a few named parameters. I can guess the meanings of ‘reverse’ and ‘cmp’ roughly, but there is no info at all on ‘key’. Sparse documentation like this is one of my pet peeves about Python. Fortunately I was able to figure it out. ‘key’ expects a function to be passed in, which will be called on each list entry and should return a value to sort by. Aha! All we need is to write a function to return the timestamp from each list entry. Oh wait - it turns out we don’t. Python’s standard library has a module called ‘operator’ which includes a pre-written function, ‘itemgetter’, that builds exactly such a function for us! So the entire ‘sorting’ part of the problem can be achieved in a single line: entries.sort(key=operator.itemgetter('dt'), reverse=True)
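
To make the one-liner concrete, here is a small self-contained illustration. Only the 'dt' key and the network.c timestamp come from the real program; the other file names, timestamps and messages are made up for the example.

```python
import operator
from datetime import datetime

# Each commit is a dict; 'dt' holds the timestamp we want to sort on.
entries = [
    {"dt": datetime(2007, 11, 18, 21, 5, 0), "file": "torrent.c", "msg": "older change"},
    {"dt": datetime(2007, 11, 20, 4, 44, 7), "file": "network.c", "msg": "newest change"},
    {"dt": datetime(2007, 11, 19, 9, 30, 0), "file": "bencode.c", "msg": "middle change"},
]

# itemgetter("dt") returns a function equivalent to: lambda entry: entry["dt"].
# sort() calls it once per entry and orders the list by the returned values.
entries.sort(key=operator.itemgetter("dt"), reverse=True)

N = 2
latest = entries[:N]  # the N most recent commits
print([entry["file"] for entry in latest])
```

Because the sort happens in place, slicing the first N items afterwards is all that is needed to get the most recent commits.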

The finished program simply reads from stdin and writes to stdout. It defaults to printing the most recent three entries, but this is configurable through a single command-line argument. For example, cvs log unworkable | python changes.py -n 1 currently prints out:

network.c
revision 1.180
date: 2007/11/20 04:44:07;  author: niallo;  state: Exp;  lines: +3 -2
guard against a NULL piece_dl in the PIECE message handler.  this needs to be re-examined,
but it should at least stop us segfaulting.

which is exactly what I want!

Anyway, you can download the Python program (which I’ve BSD-licensed) here. Feel free to post any comments or feedback.

Python Fun: SILC TinyURL chat bot

Thursday, August 16th, 2007

A couple of close friends and I use a private SILC chat-room to communicate. I use the console silc client, running under GNU screen. Very often we trade links to various web pages. Very often these are too long to fit in the console window and become a pain to select in order to paste into a browser or whatever. Sometimes someone is connected via their cell-phone or other very small device, and long links are utterly impractical. We try to use tinyurl.com whenever possible to shorten links; however, it’s still a bit of a pain. You have to go to tinyurl.com, paste in the link, submit, wait, copy out the result, and re-paste. Kind of an annoying extra bunch of steps. Shouldn’t computers really be able to do this repetitive work for us?

I decided to write a chat-room bot which would simply listen for HTTP URLs, submit those URLs to tinyurl.com, scrape the result and announce it in the channel. This project really has three pieces: 1) SILC client implementation 2) identifying URLs from chatter 3) HTTP client interface to tinyurl.com. I’ve used Python before to great effect for things like text processing and HTTP requests. I noticed there was a package called ‘py-silc’ installed on my system, so I decided to try to use it for the SILC side.

I found py-silc very straightforward to use. The example on the website is a little sparse, but gives enough to start with. The pydoc documentation was also more than sufficient for my very trivial requirements. All I needed was some way to connect to the server, hook channel messages up to a regular expression parser, and to “say” something in the channel. Py-silc allowed me to do this in a very simple fashion. You just need to sub-class the supplied silc.SilcClient class, and supply a few callbacks. Very easy.

To extract HTTP URLs from the channel messages, I needed a regular expression parser. Rather than writing my own, I took the easy route and found a Python port of some Perl regexp example. It works great for my needs!
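
The ported expression itself is not reproduced here, but a much cruder pattern shows the idea. The following is a simplified sketch of my own, not the regexp the bot uses: it grabs everything from the scheme up to the next whitespace or quote-like character, then trims punctuation that usually belongs to the sentence rather than the link. The names URL_RE and find_urls are made up for the example.

```python
import re

# Deliberately simple: a scheme, then everything up to whitespace or a
# character that rarely appears inside a URL in chat.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def find_urls(message):
    """Return the HTTP(S) URLs in a chat message, minus trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(message)]


print(find_urls("see http://www.example.com/a/very/long/path?x=1, it is great"))
```

A pattern this loose will occasionally trim a character that really was part of the URL, which is one reason to prefer a better-tested expression in practice.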

Finally, the HTTP side. Python actually has a few different APIs for fetching HTTP URLs. There is urllib, urllib2 and httplib. This reveals one of the mildly annoying things about Python, which is that the standard library is somewhat ad-hoc. However, it’s still nice and easy to do what I want. Looking at tinyurl.com’s HTML source, you’ll notice they HTTP POST the input URL as field ‘url’ to ‘create.php’. It’s a few lines of Python to encode the request, send it off, and parse the response. The response is read from a file-like object, which in turn means we end up treating it as a string. The only tricky thing in parsing the response is how to decide which URL is the final tinyurl version of our original URL. For simplicity I went with the somewhat dirty method of counting up the preceding URLs. Currently, the URL we want is the fourth in the page which starts with the string “http://tinyurl.com”. This works, but is of course prone to breaking if tinyurl.com change their HTML. A much better approach would be to properly parse the HTML document, but I couldn’t be bothered doing that at this point.
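
Roughly, the request and the "count the URLs" trick can be sketched like this. This is an illustration rather than the code from tiny.py: it uses urllib.request from current Python (the successor to urllib2), it assumes the form lives at http://tinyurl.com/create.php, and the function names are invented. The position of four is simply what worked at the time of writing and may well have changed.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

CREATE_URL = "http://tinyurl.com/create.php"  # assumed full address of the form
TINY_RE = re.compile(r"http://tinyurl\.com[^\s\"'<>]*")


def build_request(long_url):
    # Supplying data= makes urllib send a POST; 'url' is the form field name.
    data = urlencode({"url": long_url}).encode("ascii")
    return Request(CREATE_URL, data=data)


def pick_short_url(html, position=4):
    """Return the position-th tinyurl.com URL in the page, or None."""
    matches = TINY_RE.findall(html)
    if len(matches) < position:
        return None
    return matches[position - 1]


def shorten(long_url):
    # The response is a file-like object; read it all and treat it as a string.
    with urlopen(build_request(long_url)) as response:
        html = response.read().decode("utf-8", "replace")
    return pick_short_url(html)
```

Keeping the fragile part in pick_short_url means that when the page layout changes, only the position argument (or that one function) needs adjusting.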

It took me about twenty minutes to hack up a working client which did the job - Python is just great for hacking up stuff like this fast - and as usual an hour or two to polish it up to the point where it forks into the background, accepts command line parameters, etc. The tarball is here for those interested. I’ve licensed my portion (tiny.py) under a BSD license.

Monte Carlo simulation in Python #1

Thursday, July 5th, 2007

I became interested in Monte Carlo simulation after reading Fooled By Randomness, the author of which makes numerous references to the power of these simulators. One of the first things I learned was that “Monte Carlo methods” is a term covering pretty much any use of pseudo-randomness to help solve any kind of problem. Apparently, the name comes from the Monte Carlo casino in Monaco, famous for its roulette wheels, hence the relation to randomness.

One of the fascinating examples of a Monte Carlo simulator described in Fooled by Randomness is the use of pseudo-random numbers to calculate an approximation of the value of Pi. Imagine a dart board inside of a square. If you throw darts, in a random fashion, at the square, counting the number of darts which land on the dart board versus the number which land on the square, you can approximate Pi with some simple arithmetic. The fraction of darts that hit the board approaches the ratio of the two areas, which is Pi / 4, so multiplying that fraction by four gives an estimate of Pi.

After reading about this, I simply had to write a program to work it out. I figured Python would be a good language in which to hack it out. Indeed, it turns out that Python comes with a good pseudo-random number module in its standard library. Here is the code for my simple Pi approximator, which throws 1,000,000 virtual darts:


from random import random
from math import sqrt

DARTS = 1000000
hits = 0
for i in range(DARTS):
	# Pick a random point in the unit square.
	x = random()
	y = random()
	# It is a hit if it lies within distance 1 of the origin,
	# i.e. inside the quarter circle of radius 1.
	dist = sqrt(x * x + y * y)
	if dist <= 1.0:
		hits += 1

# hits / DARTS approximates 1/4 Pi
pi = 4.0 * hits / DARTS

print("pi = %s" % pi)

And a sample run (timed on a 2.0 GHz Pentium M):


$ time python monte-carlo-pi.py
pi = 3.1422991423
    0m3.89s real 0m3.78s user 0m0.03s system

I have done some other hacking using Monte Carlo methods, specifically exploring methods of stock price prediction, which I hope to write about in the future.

Word Press post from shell

Thursday, June 28th, 2007

One of the most annoying things about this whole weblog business is having to write your posts in the browser. I can’t stand entering any more than a few sentences in a web browser text area. This is not only because web browsers are so unstable and prone to crashing, or because it’s so easy to press the wrong key and kill the window or go back in the history and lose what you’ve written, but also because you are forced to use the crummy browser text field component. I use Vi for most of my editing these days, and I have grown very fond of the movement keys (hjkl). Using these means I don’t have to move my hands from the keyboard to move the cursor, which reduces strain and increases speed. Of course I also miss many other features, such as on-the-fly spell-checking (although Firefox 2+ has this), word completion, auto-save, auto-format, syntax highlighting, and bracket matching.

I initially assumed that WordPress is too poorly thought-out to have something clever like an XML-RPC interface. Well, thankfully, I was wrong. WordPress in fact supports the Blogger, Movable Type and a bunch of other XML-RPC interfaces. This meant I wasn’t going to have to screen-scrape and hack together some horrible and brittle HTTP client.

I do a fair bit of work in Python for my job, and I’ve come to rather like the language for whipping together smallish programs. Python particularly excels at tasks involving processing files or URLs. While it would not be a big deal to use the Python XML-RPC library directly myself (it’s been in the Python standard library since version 2.2), I figured I would use someone else’s code if at all possible. After a bit of searching, I found wordpresslib. This little .py file is available under the LGPL license and makes it trivial to perform a variety of WordPress operations from Python.
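
For comparison, going direct with the standard library might look something like the sketch below. It is not what wordpresslib does internally (I have not checked), and it rests on assumptions: it uses the metaWeblog.newPost call, one of the XML-RPC interfaces WordPress answers to, and xmlrpc.client, the name the library goes by in current Python. The function names and the endpoint shown afterwards are placeholders.

```python
import xmlrpc.client


def build_new_post(username, password, title, body, publish=False, blog_id=1):
    """Return (method_name, params) for a metaWeblog.newPost call."""
    content = {"title": title, "description": body}
    # publish=False asks the server to store the post as a draft.
    return "metaWeblog.newPost", (blog_id, username, password, content, publish)


def submit(endpoint, method, params):
    """Send the call to the blog's XML-RPC endpoint and return the reply."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return getattr(server, method)(*params)
```

Usage would be along the lines of submit("http://blog.example.com/xmlrpc.php", *build_new_post("user", "secret", "a test post", "this is a post")), where the URL and credentials are stand-ins for your own.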

My rather trivial program simply allows you to submit a post in draft or published form from the shell. It accepts input either on STDIN or from a specified file. You can also set a post title. For the hell of it, I’ve put it under a BSD license and packaged it up in a little tarball with wordpresslib.py, available for download here. Before using it, you need to set three variables in main.py - the URL of your WordPress install, your user and your password. Sample run:


$ echo this is a post | ./main.py -t 'a test post'

If you are seeing this post, that is proof that it works. There are many more features which could be implemented of course. Maybe I’ll bother, maybe I won’t. I’ll see if I get the itch! Feel free to send me patches of course ;-)