Archive for the ‘BitTorrent’ Category

Top 10 Torrented Films of 2007

Tuesday, March 18th, 2008

Over at the Peer to Peer Research Institute, we have performed analysis on over 128,959 film torrents. From this data, we were able to extract the most torrented films of 2007. Without futher ado, here is the list:

  1. Harry Potter and the Order of the Phoenix
  2. 300
  3. Transformers
  4. Ghost Rider
  5. Ratatouille
  6. Shrek The Third
  7. The Simpsons Movie
  8. Sunshine
  9. I am Legend
  10. I now pronounce you Chuck and Larry

You can expect more interesting data extracted from the dynamic world of P2P file sharing on this blog in the future.

Unworkable 0.4 released

Monday, January 7th, 2008

I have just tagged, packaged and announced version 0.4 of my BitTorrent implementation, Unworkable.

Here are the release notes:

  • Implemented sending peer keep-alives.
  • Trace log now contains timestamps.
  • Make us more tolerant of intermittent tracker failures.
  • Added support for Arch Linux.
  • Fixed an off-by-four bug which could cause segfaults on some platforms.
  • Fix zero padding in peer id generation.
  • Overall code reduction and re-factoring plus improvements to documentation.

BitTorrent Strategies: The Beginning

Thursday, December 27th, 2007

To follow up on my last post on the bittorrent end-game, I’m going to write about a strategy to bootstrap a torrent download. I am talking here about the case where you start a download with no existing data, in other words, from scratch. As I described in one of my earlier articles about the basics of the BitTorrent protocol, torrents are divided into units called pieces. The torrent meta data contains the SHA-1 checksums for each piece, so that we can hash each piece once we have downloaded it, for verification purposes. Pieces are downloaded on a block-by-block basis - typically the block size is sixteen kilobytes (16KB).

Another important thing to understand is that BitTorrent peers use a “tit-for-tat” transfer scheduling algorithm. That is, peers will be rewarded by other peers for uploading data to them. It is therefore important that a bootstrapping peer have at least a small number of complete pieces as soon as possible, in order to share them, and hence be rewarded by the “tit-for-tat” algorithm. The bootstrapping peer wants to get at least a few pieces to share as soon as possible.

Additionally, pieces are usually downloaded in rarest-first order. This ensures that rare pieces are replicated sufficiently so that holes do not appear in the torrent as peers leave the swarm. One of the design considerations of BitTorrent - vs other distributed data mechanisms - is to have as reliable replication as possible. It is highly undesirable in BitTorrent for even a single piece to disappear, as this could render the final file(s) completely unusable. This contrasts heavily with a file distribution system such as Amazon’s Dynamo, where it may be quite acceptable for data to disappear - is it really a huge deal if a user loses the contents of their shopping cart in an outage?

In any case, the desire to ensure replication of the rarest pieces in a torrent is overridden by the need to get a small number of pieces as a peer is bootstrapping. To this end, the initial pieces are downloaded in random order as opposed to rarest-first order. In practice this means the bootstrapping peer should get its initial data quite a bit faster. The number of pieces to download before switching to rarest-first order is suggested to be four.

As a general point, the peer will aim to complete a piece fully before requesting blocks belonging to another piece. Alternately put, if the peer has a few blocks in one piece, it will concentrate on downloading the remaining blocks belonging to that piece before starting on another piece.

BitTorrent Strategies: The End Game

Wednesday, December 26th, 2007

Downloads in BitTorrent take place according to a number of strategies, which map to stages. Initiating a torrent download has one strategy, normal operation has another strategy, and finally pulling down the last remaining pieces has yet another strategy.

The End Game is the name for the final download strategy - there is a tendency for the last few pieces of a torrent to download quite slowly. To avoid this, many BitTorrent implementations issue requests for the same remaining blocks to all its peers. When a block comes in from one peer, you send CANCEL messages to all the other peers requested from, in order to save bandwidth. Its cheaper to send a CANCEL message than to receive the full block and just discard it.

However, there is no formal definition of when to enter End Game Mode. In my BitTorrent implementation, Unworkable, I quite stupidly used a percentage-based threshold. When only 5% of the torrent download was incomplete, Unworkable would enter End Game Mode. I didn’t notice how stupid this was until I got around to testing with a large torrent download, a Fedora Core 8 DVD image which is 4G in size or so. It turns out that 5% of 4G is quite a bit of data, and requesting each bit of it from every single peer is extremely inefficient. So I did some more research to find a more effective way to detect when to enter End Game.

I found two popular definitions:

  1. All blocks have been requested
  2. Number of blocks in transit is greater than number of blocks left, and no more than 20

I didn’t understand the choice of the number 20 in method 2, so I decided to go with option 1), which I did understand. In Unworkable CVS HEAD (and in the 0.4 release, coming in a week or two), the ill-informed percentage-based threshold is gone and End Game Mode is entered when all blocks are requested.

BitTorrent Distributed Hash Table (DHT) or Trackerless BitTorrent I

Monday, December 24th, 2007

One of the more interesting extensions to the BitTorrent protocol has been the introduction of a distributed hash table implementation. As mentioned in my previous article on the basics of the BitTorrent protocol, traditionally BitTorrent relies upon a centralised “tracker” application - which runs over standard HTTP - in order to facilitate contacting peers and so on. The requirement for a centralised tracker is obviously a major barrier to fully decentralised operation, and a problem in terms of BitTorrent’s resistance to tracker outage (perhaps even caused by legal actions). In part one of this article I’m going to look a bit at the network side of BitTorrent’s DHT.

The official BitTorrent DHT specification states that the protocol is based on Kademilia. In BitTorrent, DHT is mostly separated from the original protocol. Peers listen on an additional port, using a UDP protocol, to issue network searches and so forth. The DHT protocol is known as KRPC and consists of three message types - query (”q”), response (”r”) and error (”e”). There are four queries:

  • PING
  • FIND_NODE
  • GET_PEERS
  • ANNOUNCE_PEER

Each KRPC message is formatted in B-Encode, with various key-value pairs encoded in a dictionary structure. Each node participating in the DHT has its own routing table containing contact information for nodes “near” to it (according to a mathematical ‘distance function’). This routing table is used to produce responses to queries. We will go more into details of how this works in the next article.

The TCP BitTorrent peer protocol itself has only been very slightly extended in order to take advantage of this new DHT functionality. During the handshake, peers may indicate that they support DHT. If the receiver also supports DHT, it should send a new message called PORT (id is 0×09) with the port number of its UDP DHT service, encoded as a 16-bit value. The receiver of this message will then try to send a PING message to the peer on this port, and if it gets a reply, routing table adjustments etc should take place.

Those are the very basics of BitTorrent DHT from a network perspective. I will write more about the details of the algorithms used for routing and how the consistent hashing is employed in part two.

Decoupled Python GUI Construction, or BitTorrent visualisation

Saturday, December 22nd, 2007

While in general I appreciate very simple, no-nonsense user interfaces for applications that work efficiently on the console and so can be used via SSH, there are times when increased visualisation is very useful.

Specifically with regard to my BitTorrent client, Unworkable, the default user interface is exceedingly simple. Inspired by the ubiquitous scp program on UNIX, the idea is that it should be just as simple to download a file via BitTorrent as via SSH or FTP/HTTP. Indeed, I borrowed the progressmeter.c source code from OpenSSH (a great repository of nice code). Have a look at this screenshot of Unworkable running on Windows to get an idea of what it looks like.

While I’m pretty happy with the current console user interface, it is obviously very limited in how much information it can display. To overcome this, I have for a long time had an extremely verbose trace mechanism. Pass the -t option to Unworkable, and it will write a detailed log. This is incredibly useful for debugging, but it suffers from the opposite problem - information overload. Its quite difficult for a human to parse scrolling pages of ASCII text and spot patterns. It can take considerable analysis to determine exactly why the program is making the decisions it is. This is the main motivation for adding support for a graphical user interface - information can be much more easily distilled into graphs and other visualisations which make it much easier to understand, at a glance, what is going on. For some examples of the kind of things I have in mind, take a look at the Azureus screenshots page.

This brings me to the question of how to nicely add a GUI to my application. I strongly want to keep the existing simple, console interface, and don’t want to bloat the application with additional external dependencies for widget sets and so forth. I decided to go with an IPC mechanism, to split the GUI from Unworkable entirely. While considering IPC mechanisms, I figured why not simply use TCP/IP. While in general the GUI is going to be running on the same host as Unworkable, this at least gives the option for operation over a network.

So, I have added a simple “control server” to Unworkable, which currently has a fairly basic ASCII protocol. At the moment, communication is unidirectional - Unworkable only pushes out some event notification messages. There is no mechanism for clients to send instructions back, since this isn’t required for visualisation just yet and that is my first priority. I have started a client implementation in Python. “Why Python?”, you may ask. Python has pretty good networking support, good UI toolkit support and good multi-platform support. I’m also pretty experienced with the language, and I find it very fast to write applications with. Since the GUI itself performs practically no hard computation nor I/O, the performance penalties of a higher-level language are hardly a concern.

In closing, the formula of having the application split into a C program which does the performance-sensitive stuff, exposing a simple ASCII protocol over TCP/IP, while implementing the user interface in Python, permits maintaining a lean and efficient core application with a slick graphical user interface.

Unworkable 0.3 released

Thursday, December 20th, 2007

I have just tagged, packaged and announced version 0.3 of my BitTorrent implementation, Unworkable. My goal with Unworkable is to make releases frequently - hopefully twice a month or so - with incremental improvements each release. The hope is that each release should be of a higher quality than the last. Therefore I try to test new features well and ensure the stability is at least as good as the previous release. I also try to run tests across a wide variety of platforms (Solaris, OpenBSD, Linux, Windows, Mac OS X, etc).

Anyway, here’s whats new in this version:

  • Fixed a subtle bug in download strategy
  • Removed numerous format specifier bugs by bringing source in line with C99.
  • Major refactoring and code cleanup.
  • Added initial implementation of a TCP/IP “control server”
  • Checked in some initial work towards a decoupled Python UI.
  • Portability improvements to build and run on Windows (Cygwin).
  • Build and runtime testing on Fedora 7 and Gentoo Linux.

Faster BitTorrent, or, SHA1 is slow

Sunday, October 28th, 2007

This weekend I have been concentrating on improving the performance of my BitTorrent implementation. I somewhat believe in the mantra “premature optimisation is the root of all evil”, or at least, I’m not too worried about making everything super fast the first time around. Asynchronous network programs are complicated enough to write that I expected off the bat to have to rewrite much of the code after having sketched it out. This process of hacking up an initial approach, pushing it as far as I could, figuring out a better way, then re-doing it the better way, has brought my project reasonable success. In any case, I was happy enough with the download process that I decided to spend some time making it faster.

In order to make something fast, it helps to understand what is making it slow. Efficient BitTorrent protocol handling requires keeping lots of “in-flight” data block requests around. Both queued to individual peers, or spread out across many peers. At any one time, there are many block requests which the implementation must track, especially in order to make decisions about what it should try to download next. While working on my initial sketch which used a single linear tail queue for all “in-flight” requests, I found there are two common ways I need to access these requests. Firstly, I often ask “give me the in-flight request for offset X of piece index Y”. This maps nicely onto a binary tree structure, searched by request index and offset. Currently, these properties are guaranteed to be unique - i.e. only one request for a given index/offset pair can exist at a time - but in the future this will not be guaranteed. Think especially of “the endgame”, where we may want to request the same block from many peers. To future proof against this, I have the tree actually point to a tail queue of requests. Currently I am using a red-black tree. The second way I often need to access “in-flight” requests is by peer. I need to know “what are all the in-flight requests for peer X”. This is quite a bit easier to manage. Each peer simply has its own tail queue of its associated requests, which is at most one hundred elements long, and so cheap to search linearly. Furthermore, when inspecting a peer’s requests, it is almost always to perform an operation - e.g. deallocation - on all of them, so the the linear data structure is perfectly appropriate.

I didn’t measure exactly how much of an improvement this data structure change made - perhaps very little, since the linear lists were never more than a thousand entries long - however I’m happier to use the ‘right’ data structures in general, and not have to worry about it. After some profiling, I found another pretty obvious consumer of resources - SHA1 hashing. I had not especially optimised unworkable’s handling of the PIECE message, and it was performing a SHA1 hash of a piece as each block for it came in - regardless of whether we had received all the other blocks or not. I added a simple conditional which only does the checksum when we at least believe we have received all the blocks in the piece. This drastically improved performance, since we are now doing on the order of 16x (but depending on piece/block sizes, of course) fewer SHA1 hash operations.

I have not looked closely at other BitTorrent implementations, but I believe I can answer with relative confidence the question “why is BitTorrent so resource hungry?”. The answer: SHA1 hashing is expensive. As an aside, I do not think that OpenBSD yet supports hardware-accelerated SHA1 features in the newer VIA CPUs. Such hardware crypto features would be nice to see in more chips, maybe. Also, I wonder if other BitTorrent implementation authors have come up with better data structures or storage schemes to keep track of the stuff I am keeping track of. I must take a look sometime.

BitTorrent and protocol analysis, instrumentation

Sunday, October 14th, 2007

I have been working, time permitting, on my BitTorrent implementation over the past couple of months. I’d like to note first of all that a fellow
BitTorrent hacker, the maintainer of LibBt, Ryan Walklin, contacted me about some of the issues I was writing posts about. Unfortunately I don’t think I was of much help to him with his MSE/OpenSSL problems, but it was a very useful contact for me since Ryan took the time to explain some of the downloading algorithms used in LibBT.

It was great to get some answers to some of my questions, thanks for that Ryan! The discussion made me all the more aware of the limitations in my design, and I have since been re-structuring much of the network code. This correspondence alone proves to me that its worth taking the time to write posts on some of the issues I’m working through. Today I finally got where I want to be in terms of keeping track of in-progress block transfers and peer connections and so forth, so I feel I’m making good progress. However for a long time now I’ve been wishing for some sort of analysis tool to see exactly what messages existing clients (which work!) are sending, and in what order. Just now I discovered an “instrumented” version of the mainline BitTorrent client, from the folks at INRIA. I’d been aware of their work in BitTorrent for some time, having read their papers - which are responsible for me using the rarest-first piece download strategy. Anyway, this patched client they have published is truly eye-opening. It shows exactly the messages the client is sending and receiving, along with timing and internal state information. Absolutely invaluable for seeing how a real world transfer should go! This information should greatly help me in making my implementation much more robust.

BitTorrent encryption, MSE basics

Sunday, July 29th, 2007

I have been working quite a bit recently on my BitTorrent implementation. My project is faring reasonably well so far - it can successfully download large real-world torrents at decent speed. In the course of development, I have been focusing solely on the core specification, as documented here. Having gotten support for peer connections - outgoing and incoming - working, I started to notice something. Some peers would send ostensibly random data to my client. Investigation of this led me to conclude these were peers using the Message Stream Encryption protocol.

I am not hugely interested in avoiding traffic shaping, but the encryption protocol looks like fun to implement and I’d like my software to be able to communicate with these funky peers (instead of just killing them, as it does now). I know very little about cryptography, but I know a little more now that I have begun implementing MSE. MSE really has two components - a Diffie-Hellman (D-H) key exchange to negotiate a secret, and RC4 stream cipher to encrypt the data. Both these functions are used in Secure Sockets Layer. However it should be pointed out that MSE is more about obfuscating the data to confuse traffic shapers than hiding its contents from prying eyes.

Fortunately for me, the OpenSSL library contains nice implementations of both D-H key exchange and RC4 . Before I can get to the nitty gritty of actually negotiating obfuscated streams, I had to make considerable changes to my software’s peer handling logic. I had written the peer response handler with quite hard assumptions about the nature of the incoming peer handshakes - for example, assuming the first byte would be the length of the protocol string. Furthermore I needed to add some timekeeping logic for each peer, because MSE handshake depends on timing for synchronisation. The initial peer handshake message is variably padded and so time must be considered to know when the message is complete. For example a MSE handshake from peer A may be considered invalid by peer B a) if A sent less than 96 Bytes within 30 seconds or b) if A sent more than 608 bytes.

My first step has been to re-work my peer handshake handling so that MSE handshakes can be distinguished and separately handled by the software, while traditional handshakes continue to work. Code to do this has been committed. The next step is to actually support crypto peers. For anyone curious about where the source code for my software is, I am not yet ready to release it. I have distributed it to a small group of testers to help iron out some of the bugs, but I don’t feel its ready for public release. I am being somewhat selfish and sitting on my personal project until I have polished it to where I want it to be. If you are really interested though, shoot me an email (niallo (*) niallo dot net).