A couple of weeks ago, after an upgrade of libffi, we experienced odd python build errors, but only on systems where python had previously been installed with an older libffi version:
error: [Errno 2] No such file or directory: '/lib/libffi-3.0.13/include/ffi.h'
There was no reference to libffi-3.0.13 anywhere in the python source; it turned out that the path was contained in old python .pyc/.pyo bytecode files that had survived a rebuild due to a packaging bug, and that apparently were treated as authoritative during the python build:
/lib/python2.7/_sysconfigdata.pyc:/lib/libffi-3.0.13/include
/lib/python2.7/_sysconfigdata.pyo:/lib/libffi-3.0.13/include
The packaging bug was that we didn't pre-generate the .pyc/.pyo files right after the python build, so that they would become part of the package directory in /opt/python; instead they were created on first access directly in /lib/python2.7, resulting in the following layout:
~ $ la /lib/python2.7/ | grep sysconfigdata
lrwxrwxrwx 1 root root 48 Mar 4 03:11 _sysconfigdata.py -> ../../opt/python/lib/python2.7/_sysconfigdata.py
-rw-r--r-- 1 root root 19250 Mar 4 03:20 _sysconfigdata.pyc
-rw-r--r-- 1 root root 19214 Jun 30 2018 _sysconfigdata.pyo
So on a rebuild of python, only the symlinks pointing to /opt/python were removed, while the generated-on-first-use .pyc/.pyo files survived.
Annoyed by this occurrence, I started researching how the generation of these bytecode files could be suppressed, and it turned out that it can be controlled via the sys.dont_write_bytecode variable, which in turn is set from the python C code. Here's a patch doing that.
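For completeness: even without patching anything, the same behaviour can be triggered from the shell, since both the -B switch and the PYTHONDONTWRITEBYTECODE environment variable end up setting sys.dont_write_bytecode (myscript.py is just a placeholder):
# run a script without leaving .pyc/.pyo files behind
python -B myscript.py
# or for everything started from this shell, e.g. during a build
PYTHONDONTWRITEBYTECODE=1 python myscript.py
# the setting is visible from within python itself
python -B -c 'import sys; print sys.dont_write_bytecode'   # prints True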
However, before turning off a feature that can potentially be a huge performance boost, a responsible distro maintainer needs to do a proper benchmarking study so he can make an educated decision.
So I developed a benchmark that runs a couple of tasks using the bazaar VCS, which is written in python and uses a large number of small files, so the startup overhead should be significant. The task is executed 50 times, so that small differences in the host's CPU load due to other tasks are evened out. The task is to create a new bazaar repo, check 2 files and a directory into bazaar in 3 commits, and print a log at the end.
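In shell terms it boils down to a loop like this (a sketch: file names and commit messages are made up, and bzr whoami is assumed to be configured); the whole script was run under time(1):
# bzr-bench.sh: create a repo, make 3 commits, print the log; repeated 50 times
for i in $(seq 50); do
  rm -rf repo && mkdir repo && cd repo
  bzr init
  echo foo > file1
  bzr add file1 && bzr commit -m "add file1"
  echo bar > file2
  bzr add file2 && bzr commit -m "add file2"
  mkdir dir
  bzr add dir && bzr commit -m "add dir"
  bzr log
  cd ..
done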
With bytecode generation disabled, the benchmark produced the following results:
real 3m 15.75s
user 2m 15.40s
sys 0m 4.12s
With pregenerated bytecode, the following results were measured:
real 1m 24.25s
user 0m 20.26s
sys 0m 2.55s
We can see that in the case of a fairly big application like bazaar, with hundreds of python files, the precompilation does indeed make quite a noticeable difference: it is more than twice as fast.
What's also becoming apparent is that bazaar is slow as hell.
For the lulz, I replaced the bzr command in the above benchmark with git and exported PAGER=cat so git log wouldn't interrupt the benchmark.
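The git version is the same loop with the commands swapped (again a sketch; git doesn't track empty directories, hence the extra file, and user.name/user.email are assumed to be configured):
export PAGER=cat   # keep "git log" from spawning a pager and stalling the loop
for i in $(seq 50); do
  rm -rf repo && mkdir repo && cd repo
  git init
  echo foo > file1 && git add file1 && git commit -m "add file1"
  echo bar > file2 && git add file2 && git commit -m "add file2"
  mkdir dir && touch dir/file3 && git add dir && git commit -m "add dir"
  git log
  cd ..
done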
As expected, git is orders of magnitude faster:
real 0m 0.48s
user 0m 0.02s
sys 0m 0.05s
Out of curiosity, I fiddled some more with python and added a patch that builds python so that its optimization switch -O is always active, and rebuilt both python and bazaar to produce only .pyo files instead of .pyc.
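For those who don't want to patch the interpreter, roughly the same setup can be approximated from the environment: PYTHONOPTIMIZE=1 is equivalent to passing -O on every invocation, and under -O the interpreter writes and uses .pyo files instead of .pyc (a quick check, assuming the module directory is writable):
# PYTHONOPTIMIZE=1 has the same effect as passing -O to every python invocation
export PYTHONOPTIMIZE=1
# under -O, importing a module writes a .pyo file instead of a .pyc
python -c 'import json'
ls /lib/python2.7/json/__init__.py*   # __init__.pyo now exists alongside the .py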
Here are the results:
real 1m 23.88s
user 0m 20.18s
sys 0m 2.54s
We can see that the optimization flag is next to useless: the difference is so small it's barely measurable.
Now, this benchmark was tailored to measure startup compilation cost for a big project; what about a mostly CPU-bound task that uses only a few python modules?
For this purpose I modified a password bruteforcer to exit after a couple thousand rounds, and ran it 30 times each: without bytecode, with .pyc, and with .pyo bytecode.
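The harness around it looked roughly like this (a sketch: bruteforce.py stands in for the modified bruteforcer, the -B run emulates the no-bytecode build, and the .pyc/.pyo files were pregenerated with compileall before the respective runs):
# no bytecode: -B prevents writing, and no precompiled files are present
time sh -c 'for i in $(seq 30); do python -B bruteforce.py; done'
# .pyc bytecode: files pregenerated with compileall
time sh -c 'for i in $(seq 30); do python bruteforce.py; done'
# .pyo bytecode: pregenerated under -O, and run with -O so they are actually used
time sh -c 'for i in $(seq 30); do python -O bruteforce.py; done'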
Here are the results:
No bytecode:
real 3m 50.42s
user 3m 50.25s
sys 0m 0.03s
.pyc bytecode:
real 3m 48.68s
user 3m 48.60s
sys 0m 0.01s
.pyo bytecode:
real 3m 49.14s
user 3m 49.06s
sys 0m 0.01s
As expected, there's almost no difference between the three. Funnily enough, the optimized bytecode is even slower than the non-optimized bytecode in this case.
From my reading of this stackoverflow question, it appears that .pyo bytecode differs from regular bytecode only in that it lacks the instructions for the omitted assert statements, and possibly some debug facilities.
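That part is easy to verify: under -O the assert statements simply aren't compiled in, so they can never fire (and __debug__ becomes False):
~ $ python -c 'assert False, "boom"'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AssertionError: boom
~ $ python -O -c 'assert False, "boom"'
~ $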
Which brings us back to the original problem: in order to have the .pyc files contained in the package directory, they need to be generated manually during the build, because apparently they're not installed as part of make install.
This can be achieved by calling
./python -E Lib/compileall.py "$dest"/lib/python2.7
after make install has finished. With that in place, I compared the size of the previous /opt/python directory without .pyc files to that of the new one.
It's 22.2 MB vs 31.1 MB, so the .pyc files add roughly 9 MB and make the package about 40% bigger.
Now it happens that some python packages, build scripts and the like call python with the optimization flag -O. This causes our previous problem to re-appear: now we will have stray .pyo files in /lib/python2.7.
So we need to pregenerate not only the .pyc, but also the .pyo files for all python modules. This will add another 9 MB to the python package directory.
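Doing both would mean running the compileall step twice after make install, once normally and once in optimized mode, since compileall emits .pyo instead of .pyc when the interpreter runs under -O:
# generate .pyc files
./python -E Lib/compileall.py "$dest"/lib/python2.7
# generate .pyo files as well, by running the same step in optimized mode
./python -O -E Lib/compileall.py "$dest"/lib/python2.7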
OR... we could simply turn off the ability to activate the optimized mode, which, as we saw, is 99.99% useless. This seems to be the most reasonable thing to do, and therefore this is precisely what I have now implemented in sabotage linux.