Tuesday, July 13, 2010

An absolutely relative import

Part of the "What's New" documentation for Python 2.5 describes how to make use of absolute imports. After reading it, you might find the following example confusing. I sure was confused after trying it.

Create string.py:

```python
import string
a = 1
```

Create main.py:

```python
from __future__ import absolute_import
import string
print string.a
```

Both scripts should be placed in the same directory. Run main.py:

```
$ python main.py
```

You'll see main.py print "1", the value set by string.py. A reading of the Python documentation might lead you to believe that this behavior is incorrect: with absolute imports enabled, main.py should import the standard library string module and raise an AttributeError. That reading is correct, except that by default Python puts the script's directory on sys.path, and that directory is searched for "absolute" imports too. So the easy fix is to delete this entry, which is conveniently always the first element of sys.path. The revised main.py is:

```python
from __future__ import absolute_import
import sys
del sys.path[0]  # drop the script directory that Python put first on sys.path
import string
print string.a
```
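To reproduce this outside Python 2.5, here is a minimal sketch for Python 3, where absolute imports are always on and the __future__ import is accepted as a no-op. It writes a shadowing module and two versions of main.py into a temporary directory and runs each one in a fresh interpreter. One deliberate change from the example above is that it shadows colorsys rather than string. A module already in sys.modules is returned without searching sys.path, so this keeps the result independent of whether string happened to be imported at startup. The helper run_script is hypothetical, and printing a default with getattr stands in for the AttributeError.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for string.py: a local module that shadows a standard library module.
SHADOW = "a = 1\n"

# The original main.py, in Python 3 syntax, importing colorsys instead of string.
MAIN = """\
from __future__ import absolute_import
import colorsys
print(getattr(colorsys, "a", "stdlib module"))
"""

# The revised main.py: drop sys.path[0] (the script directory) before importing.
MAIN_FIXED = """\
from __future__ import absolute_import
import sys
del sys.path[0]
import colorsys
print(getattr(colorsys, "a", "stdlib module"))
"""


def run_script(source):
    """Write colorsys.py and main.py to a temp dir, run main.py, return stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "colorsys.py"), "w") as f:
            f.write(SHADOW)
        script = os.path.join(tmp, "main.py")
        with open(script, "w") as f:
            f.write(source)
        # -E ignores PYTHON* environment variables, so the default sys.path
        # setup (script directory first) applies.
        result = subprocess.run([sys.executable, "-E", script],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_script(MAIN))        # 1: the local colorsys.py wins
    print(run_script(MAIN_FIXED))  # stdlib module: the real colorsys is found
```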

I appreciate that Python has moved to a cleaner import system, but leaving the script (or current) directory in the list of "absolute" import paths seems like a huge oversight.

What's especially ridiculous about the default behavior shows up with unit tests. Suppose you have a module with the same name as a standard library module, the module imports that standard library module, and you include unit tests at the bottom. The tests won't work, because the import behaves differently depending on whether the module is imported or run as a script: when it runs as a script, its own directory comes first on sys.path, so the import finds the module itself. This is the problem that initially brought me down this path...
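The unit-test trap can be sketched the same way, again in Python 3 and again with colorsys standing in for string. The package name pkg, the function hue, and the helpers run and demo are all hypothetical. The module lives at pkg/colorsys.py, imports the standard library colorsys, and has self-tests at the bottom. Imported as pkg.colorsys, it works. Run directly as a script, its own directory becomes sys.path[0], so "import colorsys" loads the file itself and the tests fail with an AttributeError.

```python
import os
import subprocess
import sys
import tempfile

# A module that shares its name with a standard library module, imports that
# standard library module, and keeps its tests at the bottom.
MODULE = """\
import colorsys  # intended: the standard library module

def hue(r, g, b):
    return colorsys.rgb_to_hsv(r, g, b)[0]

if __name__ == "__main__":
    assert hue(1.0, 0.0, 0.0) == 0.0
    print("tests passed")
"""


def run(args, cwd):
    # -E ignores PYTHON* environment variables so the default sys.path rules apply.
    return subprocess.run([sys.executable, "-E"] + args, cwd=cwd,
                          capture_output=True, text=True)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        pkg = os.path.join(tmp, "pkg")
        os.mkdir(pkg)
        open(os.path.join(pkg, "__init__.py"), "w").close()
        with open(os.path.join(pkg, "colorsys.py"), "w") as f:
            f.write(MODULE)
        # Imported from a package: "import colorsys" gets the stdlib module.
        imported = run(["-c", "import pkg.colorsys as m; print(m.hue(1.0, 0.0, 0.0))"], tmp)
        # Run as a script: sys.path[0] is pkg/, so "import colorsys" finds this file.
        as_script = run([os.path.join(pkg, "colorsys.py")], tmp)
        return imported, as_script


if __name__ == "__main__":
    imported, as_script = demo()
    print(imported.stdout.strip())               # 0.0, computed by the stdlib module
    print(as_script.returncode != 0)             # True: the self-tests crash
    print("AttributeError" in as_script.stderr)  # True
```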

Update 9/23: After talking with several people about this issue, I've learned that it's easy to think sys.path.remove('.') is the right thing to do here. It's not. The local path inserted by Python may be a full path or an empty string, and in either case sys.path.remove('.') won't fix the problem (if '.' isn't in the list at all, it raises ValueError). Trying to remove every local-directory entry is also incorrect, since the user may genuinely want the local directory in the search path.
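The difference between the two approaches is easy to see with plain lists standing in for sys.path. This is only a sketch: the path entries are made up, and drop_script_dir and remove_dot are hypothetical helpers. drop_script_dir removes only the first entry, which is the one Python inserted. It leaves alone any local-directory entry the user added on purpose. Because it mutates the list in place, the same call would work on sys.path itself.

```python
def drop_script_dir(path):
    """Remove and return path[0], the entry Python inserts for the script."""
    return path.pop(0)


def remove_dot(path):
    """The tempting but wrong approach: it only matches a literal '.' entry."""
    try:
        path.remove(".")
        return True
    except ValueError:  # list.remove raises ValueError when '.' is absent
        return False


if __name__ == "__main__":
    # Interactive or "python -c": Python inserted '' rather than '.'.
    interactive = ["", "/usr/lib/python2.5"]
    print(remove_dot(interactive))  # False
    print(interactive)              # ['', '/usr/lib/python2.5'] (unchanged)

    # Script run: Python inserted the script's directory; the user added '.'.
    wrong = ["/home/me/project", ".", "/usr/lib/python2.5"]
    remove_dot(wrong)
    print(wrong)  # ['/home/me/project', '/usr/lib/python2.5']: wrong entry gone

    right = ["/home/me/project", ".", "/usr/lib/python2.5"]
    print(drop_script_dir(right))  # /home/me/project
    print(right)                   # ['.', '/usr/lib/python2.5']
```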

Wednesday, July 7, 2010

jsonlib

For a project I worked on at ITA, we decided to use pickle for internal object serialization and communication. Pickle certainly makes coding simple, but I've occasionally wondered whether we made the best choice. I found this article comparing deserialization libraries interesting. It sounds like the two main competing camps are JSON and Google's Protocol Buffers, and that Protocol Buffers is slow in Python because its Python implementation is pure Python and not optimized for speed. One JSON library, jsonlib, sounds like the right way to go, as it is faster than pickle and produces more compact output.
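One way to test speed and size claims like these against your own data is to serialize a representative object both ways and time the decoding. This sketch uses the standard library json module as a stand-in for jsonlib, whose API isn't covered here. The sample record, the CODECS table, and the compare helper are all made up for illustration. The sketch compares pickle protocol 0 (the default in Python 2) with the highest protocol available. Sizes and timings depend on the data and the interpreter, so no particular result is claimed.

```python
import json
import pickle
import timeit

# Made-up sample record standing in for an internal message.
SAMPLE = {"id": 12345, "name": "example", "tags": ["a", "b", "c"],
          "scores": [0.5, 1.25, 2.0] * 10}

# name -> (encode to bytes, decode from bytes)
CODECS = {
    "json": (lambda o: json.dumps(o).encode("utf-8"),
             lambda b: json.loads(b.decode("utf-8"))),
    "pickle protocol 0": (lambda o: pickle.dumps(o, protocol=0), pickle.loads),
    "pickle highest": (lambda o: pickle.dumps(o, protocol=pickle.HIGHEST_PROTOCOL),
                       pickle.loads),
}


def compare(obj, number=10000):
    """Return {codec name: (encoded size in bytes, seconds for `number` decodes)}."""
    results = {}
    for name, (dumps, loads) in CODECS.items():
        data = dumps(obj)
        assert loads(data) == obj  # every codec must round-trip the sample
        seconds = timeit.timeit(lambda: loads(data), number=number)
        results[name] = (len(data), seconds)
    return results


if __name__ == "__main__":
    for name, (size, seconds) in compare(SAMPLE).items():
        print("%-18s %5d bytes  %.3f s" % (name, size, seconds))
```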