Note (4/2/11): Please see my recent post detailing asynchronous HTTP requests using Twisted.
Note (3/13/11): I originally wrote this post while looking for callback-style HTTP request functionality in python. I made the mistake of thinking that "callback-style" is the same as "asynchronous". The following details my efforts to achieve a callback-style HTTP request using urllib2. The final (updated) code example illustrates how to use threads to achieve asynchronicity. I'd recommend using a thread pool if you plan more than just a handful of requests. And, as others have noted, Twisted is really the best python framework for asynchronous programming. Also, I'd like to thank the commenters for pointing out my mistakes; I'm sorry for not realizing my errors sooner.
You might think it would be easy to write Python code to achieve a callback-style web request. It ought to be as simple as providing a URL and a callback function to some Python library routine, no? Well, technically, it is that simple. But somehow, the documentation makes the task surprisingly difficult.
One option, of course, is Twisted. But, reading through the (sparse, fractured) documentation made me think there had to be something easier. This led me to urllib2. The short answer is that, yes, urllib2 does what I want. But, the documentation is sufficiently backwards that it took me over an hour to figure out how to accomplish the task.
Making a simple HTTP request with urllib2 is easy, and the documentation reflects that: use urlopen. The return value of urlopen provides the response and additional information in a file-like object. The problem is how to achieve the same result in a callback-style manner. One would think urlopen could simply take an additional handler object which is called with the response as its only argument when the request completes. Ha!
build_opener looked vaguely promising as it accepted handler(s). This led me to create a class which inherited from BaseHandler and defined protocol_response. No dice. As I later realized, protocol_response takes three arguments (self, req, response), not two, and its name changes depending on the protocol (http_response for HTTP, for example). Of course, at that point, I was at a loss as to how the protocol name was determined (the BaseHandler documentation ignored this issue). And the examples were useless since they all used standard handlers.
Next, I tried inheriting from HTTPHandler, overriding http_response with a method that simply prints the URL, info and response text. This almost worked. It successfully retrieved the web page and printed it. But then it raised the following exception:
    Traceback (most recent call last):
      File "./webtest.py", line 14, in <module>
        o.open('http://www.google.com/')
      File "/usr/lib/python2.6/urllib2.py", line 389, in open
        response = meth(req, response)
      File "/usr/lib/python2.6/urllib2.py", line 496, in http_response
        code, msg, hdrs = response.code, response.msg, response.info()
    AttributeError: 'NoneType' object has no attribute 'code'

After much searching, I finally realized that I had failed to return a response-like object from my http_response method. This seems like an odd requirement for a callback method. And it could have been easily clarified in the documentation with an example.
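For readers on Python 3, where urllib2 became urllib.request, the handler rules described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this post: CallbackHandler and its callback argument are hypothetical names, and a data: URL is used so the sketch runs without network access.

```python
import urllib.request


class CallbackHandler(urllib.request.BaseHandler):
    """Hypothetical handler that passes every response to a callback."""

    def __init__(self, callback):
        self.callback = callback

    def _notify(self, req, response):
        # Response processors take (self, req, response).
        self.callback(req.full_url, response)
        # The opener chain needs the response back. Returning None instead
        # is what led to the AttributeError in the traceback above.
        return response

    # The opener looks methods up by name as "<scheme>_response",
    # so each scheme of interest needs its own entry.
    http_response = _notify
    https_response = _notify
    data_response = _notify  # data: URLs let this sketch run offline


seen = []
opener = urllib.request.build_opener(
    CallbackHandler(lambda url, resp: seen.append(url))
)
response = opener.open("data:text/plain,hello")
body = response.read()
print(seen, body)
```

The callback here deliberately does not read the body. If it did, the caller of open() would get back a response whose content had already been consumed.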
Still, after all that, I was able to use urllib2 to successfully make an asynchronous HTTP request, so I can't complain too much. Here's the code for anyone who's interested:
    #!/usr/bin/env python
    import urllib2
    import threading

    class MyHandler(urllib2.HTTPHandler):
        def http_response(self, req, response):
            print "url: %s" % (response.geturl(),)
            print "info: %s" % (response.info(),)
            for l in response:
                print l
            # A response-like object must be returned for the rest of the chain.
            return response

    o = urllib2.build_opener(MyHandler())
    # Run the blocking open() call on its own thread so the main thread continues.
    t = threading.Thread(target=o.open, args=('http://www.google.com/',))
    t.start()
    print "I'm asynchronous!"
Update (3/12/11): My comment before the sample code indicated that the sample code was asynchronous. But, it wasn't. I've updated it to be asynchronous. When originally writing this post, I intended the example code to show the urllib2 handler approach.
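The note at the top recommends a thread pool for more than a handful of requests. Here is a minimal sketch of that idea, assuming Python 3 with concurrent.futures and urllib.request rather than urllib2. The names fetch, fetch_all and max_workers are hypothetical, and the data: URLs are stand-ins so the example runs without a network connection.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Blocking fetch; returns (url, body bytes)."""
    with urllib.request.urlopen(url) as response:
        return url, response.read()


def fetch_all(urls, callback, max_workers=4):
    """Fetch urls on a small thread pool, calling callback(url, body) for each."""

    def on_done(future):
        # result() raises here if fetch() failed; concurrent.futures logs
        # and ignores exceptions raised inside done-callbacks.
        url, body = future.result()
        callback(url, body)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for url in urls:
            pool.submit(fetch, url).add_done_callback(on_done)
    # Leaving the "with" block waits for all submitted fetches to finish.


results = {}
fetch_all(
    ["data:text/plain,first", "data:text/plain,second"],
    lambda url, body: results.update({url: body}),
)
print(results)
```

The callbacks usually run on the worker threads, so anything they touch should be safe to use from more than one thread.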