
Python and poor documentation - urllib2.urlopen() exception layering problems

Saturday, April 5th, 2008

UPDATE: An informative email thread that I started on the Python BayPiggies mailing list can be found here.

One of the things I like most about Python is its “batteries included” philosophy: the standard library is comprehensive. One of the things I dislike most is the documentation. While superficially comprehensive, it leaves out many important details.

A case in point is the urlopen() function from the urllib2 module. It is very useful for fetching HTTP resources and makes processing the results trivial. Unfortunately, it has some strange behaviours. The documentation claims “Raises URLError on errors”. One would imagine that this means one simply needs to catch URLError to handle errors successfully. This is incorrect. I have written a number of long-lived Python programs (they run indefinitely, for months at a time) that use this function over the Internet. I want these programs to be robust and to recover from individual urlopen() failures. While running them, I have found that the following exceptions are in fact raised (note that I operate only on HTTP URLs):

  • urllib2.HTTPError
  • urllib2.URLError
  • httplib.BadStatusLine
  • httplib.InvalidURL
  • ValueError
  • IOError

Of course, none of these are mentioned in the documentation. Of the types above, only HTTPError (a subclass of URLError) is actually covered by catching URLError. It is also unclear whether the API designers intend these exceptions to propagate up to the urlopen() caller. Some httplib errors are caught by urllib2 and re-raised as URLError, but evidently many are not. In my opinion, either the documentation should state clearly that callers must handle various exceptions from the underlying libraries, or urllib2 should be modified to convert all of these errors into URLError.
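Until the documentation or the library changes, a long-lived program has to catch each of these types itself. What follows is a minimal sketch of such a caller-side handler, not a definitive recipe. `fetch_or_none` and its `fetch` parameter are hypothetical names (the parameter exists only so the function can be exercised without network access). The fallback imports from Python 3's `urllib.request`, `urllib.error` and `http.client` are an addition for readers on newer versions.

```python
import logging

try:  # Python 2 module names, as used in this post
    from urllib2 import urlopen, URLError, HTTPError
    import httplib
except ImportError:  # assumption: the same classes under their Python 3 names
    from urllib.request import urlopen
    from urllib.error import URLError, HTTPError
    import http.client as httplib

log = logging.getLogger(__name__)


def fetch_or_none(url, fetch=urlopen):
    """Return the body of url, or None if one of the observed errors occurs.

    fetch_or_none and the fetch parameter are hypothetical, for illustration.
    """
    try:
        response = fetch(url)
        try:
            return response.read()
        finally:
            response.close()
    except HTTPError as exc:
        # The server replied with an error status (404, 500, ...).
        # HTTPError subclasses URLError, so it must be caught first
        # if it is to be handled separately.
        log.warning("HTTP %d while fetching %s", exc.code, url)
    except (URLError, httplib.BadStatusLine, httplib.InvalidURL,
            ValueError, IOError) as exc:
        # Connection failures, malformed status lines, bad URLs, and the
        # other types observed escaping urlopen().
        log.warning("fetching %s failed: %r", url, exc)
    return None
```

A polling loop can then skip an iteration whenever the result is None instead of dying on an unexpected exception. Any exception type outside the list still propagates, so genuine programming errors are not silently swallowed.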

To that end, I intend to raise this issue for discussion on some Python mailing lists and, if consensus is reached, to propose a patch that handles at least the httplib exceptions I have encountered.
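The kind of conversion such a patch would perform can be approximated on the caller's side today. The following is a sketch under stated assumptions, not the proposed patch. `urlopen_urlerror_only` and its `fetch` parameter are hypothetical names, and the sketch relies on `httplib.HTTPException` being the common base class of `BadStatusLine` and `InvalidURL`. The Python 3 fallback imports are again an addition.

```python
try:  # Python 2 module names, as used in this post
    from urllib2 import urlopen, URLError
    import httplib
except ImportError:  # assumption: the same classes under their Python 3 names
    from urllib.request import urlopen
    from urllib.error import URLError
    import http.client as httplib


def urlopen_urlerror_only(url, fetch=urlopen, **kwargs):
    """Call fetch(url, **kwargs), re-raising the stray exception types as
    URLError so that callers need only a single except clause.

    Hypothetical helper; the original exception is kept as URLError.reason.
    """
    try:
        return fetch(url, **kwargs)
    except URLError:
        # Already a URLError (or an HTTPError, which subclasses it): leave it
        # untouched so callers can still inspect HTTPError.code.
        raise
    except (httplib.HTTPException, ValueError, IOError) as exc:
        # httplib.HTTPException covers BadStatusLine and InvalidURL.
        # IOError also covers socket errors on Python 2.6 and later,
        # where socket.error became an IOError subclass.
        raise URLError(exc)
```

Keeping the original exception as the `reason` attribute means nothing is lost in the conversion: a caller that catches URLError can still log or inspect the underlying httplib error.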