Monday, February 8, 2010

__missing__

According to the Python documentation:

If a subclass of dict defines a method __missing__(), if the key key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call if the key is not present. No other operations or methods invoke __missing__(). If __missing__() is not defined, KeyError is raised. __missing__() must be a method; it cannot be an instance variable. For an example, see collections.defaultdict.
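This is, at least, incomplete if you want defaultdict-like behavior: d[key] hands back whatever __missing__ returns but does not store it, so __missing__ must not only return the default value but also assign it internally if later lookups and membership tests are to see it. This is made clear in the documentation for collections.defaultdict: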
If default_factory is not None, it is called without arguments to provide a default value for the given key, this value is inserted in the dictionary for the key, and returned.
Surprisingly, the __missing__ method is not mentioned in the special method names section of the Python documentation.
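To make the difference concrete, here is a minimal sketch of the two behaviors. The class names ReturnOnly and StoreAndReturn are made up for illustration; only dict and __missing__ come from Python itself.

```python
class ReturnOnly(dict):
    """__missing__ returns a default but never stores it."""
    def __missing__(self, key):
        return 0


class StoreAndReturn(dict):
    """__missing__ stores the default, then returns it (what defaultdict does)."""
    def __missing__(self, key):
        self[key] = 0
        return 0


a = ReturnOnly()
a_value = a["x"]      # __missing__ supplies 0, but "x" is still not a key of a

b = StoreAndReturn()
b_value = b["x"]      # __missing__ supplies 0 and "x" is now a key of b
```

Both lookups return 0, but only the second leaves the key behind. Note that get() and the "in" operator never call __missing__ in either class.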

Thursday, February 4, 2010

collections.defaultdict

collections.defaultdict is nice, especially when counting things. But defaultdict calls its factory with no arguments, so you can only use zero-argument constructors. Pffft! Fortunately, it's easy to write a defaultdict which passes arguments to the constructor:

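class defaultdict2(dict):
    """A dict that fills in missing keys by calling factory(*factArgs)."""
    def __init__(self, factory, factArgs=(), dictArgs=()):
        # dictArgs are passed on to dict, e.g. an initial mapping
        dict.__init__(self, *dictArgs)
        self.factory = factory
        self.factArgs = factArgs
    def __missing__(self, key):
        # Store the new value so later lookups find it, then return it
        value = self.factory(*self.factArgs)
        self[key] = value
        return value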

Update 2/8/10: added "return" line to __missing__ per discussion in this post on __missing__.
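Here is a short usage sketch. The class is repeated so the snippet runs on its own; the variable names (counts, letters, rows) are just for illustration. The last two lines show an alternative I believe is equivalent for this purpose: wrapping the constructor with functools.partial so the standard defaultdict can call it with no arguments.

```python
from collections import defaultdict
from functools import partial


class defaultdict2(dict):
    def __init__(self, factory, factArgs=(), dictArgs=()):
        dict.__init__(self, *dictArgs)
        self.factory = factory
        self.factArgs = factArgs

    def __missing__(self, key):
        self[key] = self.factory(*self.factArgs)
        return self[key]


# Counting, exactly as with the standard defaultdict(int)
counts = defaultdict2(int)
for word in "a b a".split():
    counts[word] += 1

# A factory that takes an argument: every missing key gets a fresh list("ab")
letters = defaultdict2(list, factArgs=("ab",))
letters["x"].append("c")

# Alternative: bind the argument up front and use the standard defaultdict
rows = defaultdict(partial(list, "ab"))
first_row = rows["x"]
```

Each missing key gets its own list, so appending to letters["x"] does not affect letters["y"].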

Wednesday, February 3, 2010

Kid Template Recompilation

I'm involved in a project which uses the TurboGears framework for serving web pages. The templating language we use is Kid. Recently, we ran into a problem where web pages did not correspond to the installed templates. After a bit of detective work, we suspected that TurboGears/Kid was not using the templates, but rather stale, compiled versions of old templates (.pyc files). Some Kid mailing list discussion confirmed our suspicions. The problem is that Kid only recompiles if the mtime of the source (.kid) file is after the mtime of the corresponding compiled (.pyc) file. In contrast, Python recompiles unless the mtime stored in the .pyc file exactly matches the mtime of the source (.py) file.
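The two policies can be sketched as plain comparisons. This is only an illustration of the checks as described above, not the actual code of Kid or Python; the function names and the timestamps are made up.

```python
def kid_style_is_stale(source_mtime, compiled_mtime):
    """Recompile only if the source is newer than the compiled file."""
    return source_mtime > compiled_mtime


def python_style_is_stale(source_mtime, recorded_source_mtime):
    """Recompile unless the source mtime equals the one recorded at compile time."""
    return source_mtime != recorded_source_mtime


# Hypothetical timeline (seconds): the template had mtime 1000 when it was
# compiled at time 2000. Later a different version with a preserved mtime of
# 1500 is copied over it.
recorded_source_mtime = 1000
compiled_mtime = 2000
replacement_source_mtime = 1500

kid_result = kid_style_is_stale(replacement_source_mtime, compiled_mtime)
python_result = python_style_is_stale(replacement_source_mtime, recorded_source_mtime)
```

With these numbers the inequality check reports the compiled file as fresh even though the source changed, while the exact comparison asks for a recompile.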

My understanding is that, ideally, Python would use a one-way hash of the source and only use the compiled file if there is an exact match. The exact mtime comparison is, in practice, nearly as good and much, much faster. But the mtime inequality comparison is a poor approximation of the ideal and only works when you can guarantee that (1) the system clock is perfect and never changes timezone (e.g. no switch between EDT and EST), and (2) mtimes are always updated to "now" whenever contents or locations are changed (i.e. even "mv" must affect mtime and rsync -a is right out). I don't know of any OS which provides these guarantees. The good news is that there is no disagreement on the existence of the problem, so this is likely to be fixed in a future version of Kid.
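For comparison, the "ideal" hash-based check might look something like the sketch below. It assumes the digest of the source is recorded somewhere alongside the compiled file at compile time; the function names and the choice of SHA-256 are mine, not anything Kid or Python is known to use here.

```python
import hashlib


def source_digest(source_bytes):
    """One-way hash of the template source."""
    return hashlib.sha256(source_bytes).hexdigest()


def compiled_is_fresh(source_bytes, recorded_digest):
    """Use the compiled file only if the source hashes to the recorded digest."""
    return source_digest(source_bytes) == recorded_digest


# Digest recorded when the (hypothetical) template was compiled
old_source = b"<p>old template</p>"
recorded = source_digest(old_source)

# The template is later replaced; timestamps play no part in the decision
new_source = b"<p>new template</p>"
fresh_before = compiled_is_fresh(old_source, recorded)
fresh_after = compiled_is_fresh(new_source, recorded)
```

The cost is that the whole source has to be read and hashed on every check, which is why the exact mtime comparison is the much faster approximation.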