Tuesday, June 8, 2010

Returning an exit status with Twisted

When I needed to return an exit status from a Twisted process, my first instinct was to look for an argument to reactor.stop. In fact, there have been multiple requests for one, e.g. tickets #718 and #2182. But then I realized that reactor.stop doesn't stop the reactor; it merely initiates the shutdown process. The reactor is not shut down until reactor.run returns. That made it clear how to return a specific exit code: simply add

    sys.exit(code)
immediately after reactor.run.
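
A minimal sketch of that pattern, assuming Twisted is installed. The names main and finish and the exit code 3 are made up for illustration; the only point is that the code is recorded before reactor.stop is called and is handed to sys.exit once reactor.run has returned.

```python
import sys


def main():
    # Imported inside the function so that merely loading this module
    # does not require (or start) a reactor.
    from twisted.internet import reactor

    result = {"code": 0}

    def finish(code):
        # Hypothetical helper: remember the exit code, then begin shutdown.
        result["code"] = code
        reactor.stop()

    # Illustrative only: ask for exit code 3 as soon as the reactor is up.
    reactor.callWhenRunning(finish, 3)

    # Blocks until the shutdown started by reactor.stop() has completed.
    reactor.run()
    return result["code"]
```

A script would then end with sys.exit(main()), which is the "immediately after reactor.run" step spelled out.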

Monday, March 22, 2010

More ElementTree Annoyances

  • Cannot serialize int. I can see the value in not automatically serializing every possible object with a __str__ method. But, not converting an int? C'mon!
  • Cannot serialize None. Wouldn't None be the perfect value to indicate "don't serialize this attribute"?
I'm generally a fail-fast-and-loudly kind of guy, but I also don't like having to write more code when it's obvious what I mean. These seem like two cases where I think the tradeoff is in favor of writing less code...

Examples:

>>> import xml.etree.ElementTree as et
>>> et.tostring(et.Element('Foo', attrib={ 'a': 1}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 1009, in tostring
    ElementTree(element).write(file, encoding)
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 663, in write
    self._write(file, self._root, encoding, {})
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 698, in _write
    _escape_attrib(v, encoding)))
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 830, in _escape_attrib
    _raise_serialization_error(text)
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 777, in _raise_serialization_error
    "cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize 1 (type int)
>>> et.tostring(et.Element('Foo', attrib={ 'a': None}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 1009, in tostring
    ElementTree(element).write(file, encoding)
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 663, in write
    self._write(file, self._root, encoding, {})
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 698, in _write
    _escape_attrib(v, encoding)))
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 830, in _escape_attrib
    _raise_serialization_error(text)
  File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 777, in _raise_serialization_error
    "cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize None (type NoneType)
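
One possible workaround is a small wrapper that does the conversions I wish ElementTree did itself. This is only a sketch: make_element is a hypothetical helper, not part of ElementTree, and it uses a dict comprehension, so it needs Python 2.7 or later.

```python
import xml.etree.ElementTree as et


def make_element(tag, attrib=None, **extra):
    """Build an Element, stringifying values and dropping None attributes."""
    merged = dict(attrib or {}, **extra)
    # None means "don't serialize this attribute"; everything else goes
    # through str() so that ints and the like are accepted.
    clean = {key: str(value) for key, value in merged.items()
             if value is not None}
    return et.Element(tag, clean)


elem = make_element('Foo', {'a': 1, 'b': None})
serialized = et.tostring(elem)
```

Here tostring succeeds: the a attribute is written as "1" and b is left out entirely.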

Monday, February 8, 2010

__missing__

According to the Python documentation:

"If a subclass of dict defines a method __missing__(), if the key key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call if the key is not present. No other operations or methods invoke __missing__(). If __missing__() is not defined, KeyError is raised. __missing__() must be a method; it cannot be an instance variable. For an example, see collections.defaultdict."

This is, at least, incomplete: for the default to persist, as it does with collections.defaultdict, __missing__ must not only return the default value but also store it in the dictionary itself, because d[key] only passes the return value along. The documentation for collections.defaultdict makes this clear:

"If default_factory is not None, it is called without arguments to provide a default value for the given key, this value is inserted in the dictionary for the key, and returned."

Surprisingly, the __missing__ method is not mentioned in the special method names section of the Python documentation.
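
The difference is easiest to see with a mutable default. In this sketch the two class names are made up for illustration; the first __missing__ only returns a value, the second stores it as well.

```python
class ForgetfulDict(dict):
    def __missing__(self, key):
        # Returns a fresh list but never stores it in the dict.
        return []


class RememberingDict(dict):
    def __missing__(self, key):
        # Stores the new list under the key, then returns that same list.
        value = self[key] = []
        return value


forgetful = ForgetfulDict()
forgetful['a'].append(1)    # appended to a list that is thrown away

remembering = RememberingDict()
remembering['a'].append(1)  # appended to the list now stored under 'a'
```

After this runs, forgetful is still empty while remembering holds {'a': [1]}. Neither class changes what get() does, since only d[key] invokes __missing__.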

Thursday, February 4, 2010

collections.defaultdict

collections.defaultdict is nice, especially when counting things. But, defaultdict only lets you use zero-argument constructors. Pffft! Fortunately, it's easy to write a defaultdict which passes arguments to the constructor:

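class defaultdict2(dict):
    """A dict that builds missing values with factory(*factArgs)."""

    def __init__(self, factory, factArgs=(), dictArgs=()):
        # dictArgs are passed through to the dict constructor.
        dict.__init__(self, *dictArgs)
        self.factory = factory
        self.factArgs = factArgs

    def __missing__(self, key):
        # Store the new value so later lookups return the same object.
        value = self[key] = self.factory(*self.factArgs)
        return value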

Update 2/8/10: added "return" line to __missing__ per discussion in this post on __missing__.
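
For comparison, the standard defaultdict can get a similar effect if the arguments are bound up front, for example with functools.partial. A sketch, where make_slots is a made-up factory used only for illustration:

```python
import collections
import functools


def make_slots(n, fill):
    # Hypothetical factory that needs arguments: a list of n copies of fill.
    return [fill] * n


# partial() turns the two-argument factory into the zero-argument
# callable that defaultdict expects.
counts = collections.defaultdict(functools.partial(make_slots, 3, 0))
counts['a'][1] += 1
```

Each missing key gets its own three-slot list, so incrementing a slot under 'a' does not affect any other key.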

Wednesday, February 3, 2010

Kid Template Recompilation

I'm involved in a project which uses the TurboGears framework for serving web pages. The templating language we use is Kid. Recently, we ran into a problem where web pages did not correspond to the installed templates. After a bit of detective work, we suspected that TurboGears/Kid was not using the templates, but rather stale, compiled versions of old templates (.pyc files). Some Kid mailing list discussion confirmed our suspicions. The problem is that Kid only recompiles if the mtime of the source (.kid) file is after the mtime of the corresponding compiled (.pyc) file. In contrast, Python recompiles unless the mtime stored in the .pyc file exactly matches the mtime of the source (.py) file.

My understanding is that, ideally, Python would use a one-way hash of the source and only use the compiled file if there is an exact match. The exact mtime comparison is nearly as good in practice and much, much faster. The mtime inequality comparison, however, is a poor approximation of the ideal and only works when you can guarantee that (1) the system clock is perfect and never changes timezone (e.g. no switch between EDT and EST), and (2) mtimes are always updated to "now" whenever contents or locations are changed (i.e. even "mv" must affect mtime, and rsync -a is right out). I don't know of any OS that provides these guarantees. The good news is that there is no disagreement about the existence of the problem, so this is likely to be fixed in a future version of Kid.
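
To make the difference concrete, here is a small model of the two checks. It is not the actual code of Kid or Python; the function names and timestamps are invented, and the scenario is a rollback where an older template is copied into place with its original mtime preserved (as rsync -a would do).

```python
def inequality_check_is_fresh(source_mtime, compiled_mtime):
    # Kid-style rule as described above: recompile only if the source
    # is newer than the compiled file.
    return source_mtime <= compiled_mtime


def exact_check_is_fresh(source_mtime, recorded_mtime):
    # Python-style rule: reuse the compiled file only if the source mtime
    # equals the one recorded when it was compiled.
    return source_mtime == recorded_mtime


# Version 2 of a template was last modified at t=200 and compiled at t=210.
recorded_source_mtime = 200
compiled_file_mtime = 210

# Version 1 (mtime 100) is then restored with its mtime preserved.
restored_source_mtime = 100

kid_reuses_stale = inequality_check_is_fresh(restored_source_mtime,
                                             compiled_file_mtime)
python_reuses_stale = exact_check_is_fresh(restored_source_mtime,
                                           recorded_source_mtime)
```

In this scenario the inequality check keeps serving the compiled version 2 (kid_reuses_stale is True), while the exact check notices the mismatch and recompiles (python_reuses_stale is False).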

Tuesday, January 19, 2010

numpy.dot

I should have known. numpy.dot doesn't work with sparse matrices. What's worse is that it happily accepts a sparse matrix as an argument and yields some convoluted array of sparse matrices. What I should be doing is x.dot(y) where x is a scipy.sparse.sparse.spmatrix and y is a numpy.ndarray.

Note that I'm using the Debian stable versions of these packages: numpy 1.1.0 and scipy 0.6.0.

Friday, January 8, 2010

urllib2.HTTPErrorProcessor

With code similar to what I posted in Asynchronous HTTP Request, I was occasionally getting empty responses to my requests. When I added urllib2.HTTPErrorProcessor to the inheritance list for MyHandler, the problem went away. My guess is that the server was generating 503 Service Unavailable responses and my client code wasn't handling them. How one was supposed to know to do this from the documentation, I am unsure. I'm guessing that if the server might provide a redirect for your URL, you would also want to inherit from urllib2.HTTPRedirectHandler.
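
One way to see which handlers a stock opener carries is to list them. This sketch uses the Python 3 module name (urllib2 became urllib.request there) and only inspects the opener, so it makes no network requests.

```python
import urllib.request

# build_opener() with no arguments installs the default handler chain.
opener = urllib.request.build_opener()

# Collect the class names of the installed handlers for inspection.
handler_names = [type(handler).__name__ for handler in opener.handlers]

has_error_processor = 'HTTPErrorProcessor' in handler_names
has_redirect_handler = 'HTTPRedirectHandler' in handler_names
```

Both HTTPErrorProcessor and HTTPRedirectHandler are in the default chain, which suggests that code assembling its own handlers has to bring them along explicitly.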