Devs on Acid

When "progress" is backwards

20 Oct 2020 15:58 UTC

Lately I see many developments in the linux FOSS world that sell themselves as progress, but are actually hugely annoying and counter-productive.

Counter-productive to the point where they actually cause major regressions and costs, and, as in the case of GTK+3, ruin the user experience and the possibility that we'll ever enjoy "The year of the Linux desktop".

Showcase 1: GTK+3

GTK+2 used to be the GUI toolkit for Linux desktop applications. It is highly customizable, reasonably lightweight and programmable from C, which means almost any scripting language can interface to it too.

Rather than improving the existing toolkit code in a backwards-compatible manner, its developers decided to introduce many breaking API changes that require a major porting effort to make an existing codebase compatible with the successor GTK+3. Keeping support for GTK+2 while supporting GTK+3 at the same time typically involves a lot of #ifdef clutter in the source base, which not many developers are willing to maintain.
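To illustrate the kind of clutter this means in practice, here is a minimal, hypothetical sketch around just one of the renamed constructors (GTK_CHECK_VERSION, gtk_box_new and gtk_hbox_new are real GTK API; the helper function is made up):

/* sketch of the #ifdef dance needed to support GTK+2 and GTK+3 at once */
#include <gtk/gtk.h>

static GtkWidget *make_hbox(void)
{
#if GTK_CHECK_VERSION(3, 0, 0)
    /* gtk_hbox_new() is deprecated in GTK+3 and gone in GTK+4 */
    return gtk_box_new(GTK_ORIENTATION_HORIZONTAL, 0);
#else
    return gtk_hbox_new(FALSE, 0);
#endif
}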

Additionally, GTK+3 did away with a lot of user-customizable theming options, effectively rendering useless most of the existing themes that took considerable developer effort to create. Here's a list of issues users are complaining about.

Due to the effort required to port a GTK+2 application to GTK+3, many finished GUI application projects will never be ported, whether for lack of manpower, lost interest of the main developer, or his untimely demise. An example of such a program is the excellent audio editor sweep, which saw its last release in 2008. With Linux distros removing support for GTK+2, these apps are basically lost in the void of time.

The other option for distros is to keep both the (unmaintained) GTK+2 and GTK+3 in their repositories so GTK+2-only apps can still be used; however, that roughly doubles the amount of disk and RAM space required by users of these apps, as both toolkits need to live next to each other. Also, this will only work as long as there are no breaking changes in the GLib library which both toolkits are built upon.

Even worse, due to the irritation the GTK+3 move caused among developers, many switched to Qt 4 or Qt 5, which requires the use of C++, so a typical Linux distro now has a mix of GTK+2, GTK+3, GTK+4, Qt 4 and Qt 5 applications, where each toolkit consumes considerable resources.

Microsoft (TM) knows better and sees backwards compatibility as the holy grail and the underlying root cause of its success and market position. Any 25-year-old Win32 GUI application from the Win95 era still works without issues on the latest Windows (TM) release. They even still support 16-bit MS-DOS apps using a built-in emulator.

From MS' perspective, the freedesktop.org decision makers played into their hands when they decided to make GTK+3 a completely different beast. Of course, we are taught to never believe in malice but in stupidity, so it is unthinkable that there was actually a real conspiracy and monetary compensation behind this move. Otherwise we would be conspiracy theorist nuts, right?

Showcase 2: python3

Python is a hugely successful programming/scripting language used by probably millions of programmers.

Whereas python2 development has been very stable for many years, python3 changes in the blink of an eye. It's not uncommon to find that after an update of python3 to the next release, existing code no longer works as expected.

Many developers such as myself prefer to use a stable development environment over one that is as volatile as python3.

With the decision to EOL python2, thousands of py2-based applications will experience the same fate as GTK+2 applications without a maintainer: they will be rendered obsolete and disappear from the distro repositories. This may happen quicker than one would expect, as python by default provides bindings to the system's OpenSSL library, which has a history of making backwards-incompatible changes. At the very least, once the web agrees on a new TLS standard, python2 will be rendered completely useless.

Porting python2 code to python3 usually isn't as involved as GTK+2 to GTK+3, but due to the dynamic nature of python the syntax checker can't catch all code issues automatically, so many issues will only surface at runtime in corner cases, causing the ported application to throw a backtrace and stop execution, which can have grave consequences.

Many companies have millions of lines of code still in python2 and will have to invest quite some sweat and expense to make it compatible with python3.

Showcase 3: ip vs ifconfig

Once one had learned his handful of ifconfig and route commands to configure a Linux box's network connections, one could comfortably manage this aspect across all distros. Not any longer: someone had the glorious idea to declare ifconfig and friends obsolete and provide a new, more "powerful" tool to do their job: ip.

The command for bringing up a network device is now ip link set dev eth1 up vs the older ifconfig eth1 up. Does this really look like progress? Worse, the documentation of the tool is non-intuitive, so one basically has to google for examples that show the translation from one command to the other.

The same criticism applies to iw vs iwconfig.

Showcase 4: ethernet adapter renaming by systemd/udev

The latest systemd-based distros come up with network interface names such as enx78e7d1ea46da or vethb817d6a, instead of the traditional eth0. The interface names assigned by default on Ubuntu 20 are so long that a regular human can't even remember them; any configuration attempt requires copy/pasting the name from the output of ip a. Yet almost every distro goes along with this Poettering/freedesktop.org-dictated nonsense.

Showcase 5: CMake, meson, and $BUILDSYSTEMOFTHEDAY

While the traditional build system used on UNIX, autoconf, has its warts, it was designed in such a way that only the application developer requires the full set of tools, whereas the consumer needs only a POSIX-compatible shell environment and a make program.

More "modern" build systems like cmake and meson don't give a damn about the dependencies a user has to install, in fact according to this, meson authors claimed it to be one of their goals to force users to have a bleeding edge version of python3 installed so it can be universally assumed as a given.

CMake is written in C++, consists of 70+ MB of extracted sources, and requires an impressive amount of time to build from source. Built with debug information, it takes up 434 MB of my harddisk space as of version 3.9.3. Its primary raison d'être is its support for Microsoft (TM) Visual Studio (R) (TM) solution files, so Windows (TM) people can compile stuff from source with a few clicks.

The two of them have in common that they threw overboard the well-known configure and make user interface and invented their own NIH solution, which requires the user to learn yet another way to build his applications.

Both of these build systems seem to have acquired a cult following just like systemd, or else someone is paying trolls to show up on github with pull requests to replace GNU autoconf with either of them, for example 1 2. Interestingly, GNOME, which is tightly connected to freedesktop.org, has made it one of its goals to switch all components to meson. Their porting effort involves almost every key component in the Linux desktop stack, including cairo, pango, fontconfig, freetype, and dozens of others. What might be the agenda behind this effort?

Conclusion

We live in an era where in the FOSS world one constantly has to relearn things, switch to new, supposedly "better", but more bloated solutions, and is generally left with the impression that someone is pulling the rug from under one's feet. Many of the key changes in this area have been rammed through by a small set of decision makers, often closely related to Red Hat/GNOME/freedesktop.org. We're buying this "progress" at a high cost, and one can't avoid asking oneself whether there's more to the story than meets the eye. Never forget, Red Hat and Microsoft (TM) are partners and might even have the same shareholders.


Speeding up static regexes in C using re2r and ragel

16 Oct 2020 00:16 UTC

While working on tinyproxy I noticed that its config file parser got notoriously slow when processing big config files with several thousand lines (for example Allow/Deny directives).

The config parser uses a set of static POSIX ERE regexes which are compiled once using regcomp(3p) and then executed on every single line via regexec(3p).

For example, the regex for the "Allow" directive is

(((([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)(/[0-9]+)?)|(((([0-9a-fA-F:]{2,39}))|(([0-9a-fA-F:]{0,29}:([0-9]+[.][0-9]+[.][0-9]+[.][0-9]+))))(/[0-9]+)?))|([-A-Za-z0-9._]+))

which consists of the more readable parts

"(" "(" IPMASK "|" IPV6MASK ")" "|" ALNUM ")"

as defined using some CPP macros in the source code.

So basically the regex matches either an ipv4 address with a netmask like 10.0.0.0/8, an ipv6 with a netmask, or an alphanumeric domain name.

Parsing 32K lines with Allow statements using the libc's regexec function took about 2.5 seconds, which made me wonder whether we could get this a little bit faster.

POSIX regexec() has the following signature:

int regexec(const regex_t *restrict preg, const char *restrict string,
    size_t nmatch, regmatch_t pmatch[restrict], int eflags);

preg is the compiled regex, string the string to match, nmatch the maximum number of matching groups, and pmatch an array of start/end indices into the string, corresponding to the matching groups. Matching groups are the parts enclosed in parens in the regex. This is a very practical feature, as it makes it easy to extract submatches.
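For illustration, a minimal stand-alone example (not tinyproxy code) of extracting submatches this way:

/* compile with: cc regexdemo.c -o regexdemo */
#include <regex.h>
#include <stdio.h>

int main(void)
{
    regex_t re;
    regmatch_t m[3];  /* group 0 = whole match, 1 and 2 = the paren groups */
    const char *s = "allow 10.0.0.0/8";

    if (regcomp(&re, "allow ([0-9.]+)/([0-9]+)", REG_EXTENDED)) return 1;
    if (regexec(&re, s, 3, m, 0) == 0) {
        /* rm_so/rm_eo are start/end offsets; -1 means the group didn't match */
        printf("net: %.*s mask: %.*s\n",
               (int)(m[1].rm_eo - m[1].rm_so), s + m[1].rm_so,
               (int)(m[2].rm_eo - m[2].rm_so), s + m[2].rm_so);
    }
    regfree(&re);
    return 0;
}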

My idea was to write a wrapper around re2c or ragel (both of which compile a fast finite state automaton), which automatically turns a POSIX-compatible ERE expression into the expected format and generates a regexec()-like wrapper function that provides the same convenient submatch array.

For evaluation, I first created a manual re2c conversion of (a predecessor of) the above "Allow" regex, however that resulted in almost 10K (!) lines of C code emitted. Re2c input

Next I tried the same thing with ragel, and to my pleasant surprise the resulting C code was only a little over 900 lines, i.e. 10% of re2c. Ragel input

This made it quite clear that ragel was the winner of the competition.

After spending some more effort, the product was named re2r (regex to ragel) and is available here.

re2r accepts input on stdin: a machine name, followed by a space and a regex, one per line. For example (from tinyproxy):

logfile "([^"]+)"
pidfile "([^"]+)"
port ([0-9]+)
maxclients ([0-9]+)

which generates the following code:

re2r helpfully prints the message:

 diagnostics: maximum number of match groups: 2

more about that in a minute.

As a size optimization, for multiple identical regexes, the wrapper for that machine simply calls the wrapper for the machine with the identical regex, e.g. re2r_match_pidfile() calls re2r_match_logfile().

The prototype for our regexec()-like match functions looks like:

RE2R_EXPORT int re2r_match_logfile(const char *p, const char* pe, size_t nmatch, regmatch_t matches[]);

RE2R_EXPORT needs to be defined by the user to either "static" or "extern", depending on the desired visibility of the function. re2r_match_logfile is the function name generated for the named regex "logfile".

p is a pointer to the start of the string to be matched, and pe to its end (usually it can be defined as p+strlen(p)). nmatch is, just like in the POSIX regexec() signature, the maximum number of items that can be stored in the matches array, which is optimally of the size that our diagnostic line earlier notified us about (here: 2). The matches array is of type regmatch_t[] (thus we need to include the header regex.h to get the definition) and it must consist of nmatch items.
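A hypothetical call site could then look like the following sketch; it assumes the regexec()-style convention that 0 is returned on a successful match:

#include <regex.h>
#include <string.h>

extern int re2r_match_logfile(const char *p, const char *pe,
                              size_t nmatch, regmatch_t matches[]);

/* extract the quoted filename of a 'logfile "..."' directive */
int parse_logfile_directive(const char *line, char *out, size_t outsz)
{
    regmatch_t m[2];
    const char *p = line, *pe = line + strlen(line);

    if (re2r_match_logfile(p, pe, 2, m) != 0) return -1;
    size_t len = m[1].rm_eo - m[1].rm_so;   /* group 1: the part in quotes */
    if (len >= outsz) return -1;
    memcpy(out, line + m[1].rm_so, len);
    out[len] = 0;
    return 0;
}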

Now we only need to run ragel on the re2r output to get a heavily optimized matcher function that returns almost identical results to using the same regex/string with POSIX regcomp()/regexec(), while having an almost identical function signature, so it's straightforward to replace existing code.

As a trick, the plain output of re2r can be directly compiled using gcc -include regex.h -DRE2R_EXPORT=extern -c foo.c after running ragel on it, without having to embed/include it in other source files.

In the case of tinyproxy, parsing the 32K allow statements using re2r/ragel reduced the runtime from 2.5 seconds to a mere 236 milliseconds.

re2r also ships a testing tool called re2r_test which can be used as follows:

re2r_test -r "((foo)|bar(baz))"

which then waits for test input on stdin. Upon entering "foo", we get the following output:

---------- RE2R  ----------
0: foo
1: foo
2: foo
((foo)|bar(baz))
12   2    3   31
12   2         1
---------- POSIX ----------
0: foo
1: foo
2: foo
((foo)|bar(baz))
12   2    3   31
12   2         1

The first block is the output from the re2r matcher function, the other from POSIX regexec(). The 0, 1, 2 positions show the extracted match groups, then the regex is displayed followed by 2 lines that show

1) the offsets of all possible matching groups, and 2) the matching groups that actually matched.

In this case only matching groups 1 (outer parens pair) and 2 (foo) matched.

Note that POSIX always makes a matching group 0 available, which has start and end offsets of the entire string if it was successfully matched.

If we now enter "barbaz", we get:

---------- RE2R  ----------
0: barbaz
1: barbaz
3: baz
((foo)|bar(baz))
12   2    3   31
1         3   31
---------- POSIX ----------
0: barbaz
1: barbaz
3: baz
((foo)|bar(baz))
12   2    3   31
1         3   31

In this case, we don't have a match for matching group 2, but one for 3. Group 1 matches again, as it surrounds the entire expression.

Note that while re2r itself is GPL licensed, the code it emits is public domain.

I hope that re2r will be helpful in the adoption of fast ragel parsers into C projects, and believe that re2r_test can be a generally useful tool to visualize regexes and matching groups on the terminal.

The result of the re2r/ragel work on tinyproxy can be evaluated in the ragel branch.


Restoring accidentally deleted files on Linux

02 May 2019 22:27 UTC

Doh. Through a tiny bug in a Makefile auto-generated by my build system rcb2, I accidentally deleted the C source file I had been working on for almost an hour, and which wasn't checked into git yet.

Fortunately, I know the basic steps to restore a file* in a filesystem-agnostic way.

These are:

- make sure nothing else writes to the filesystem in the meantime,
- search the raw blockdevice for a string that only occurs in the deleted file,
- determine the start and end offsets of the file around that hit,
- extract that byte range with dd.

First of all though, I sent a SIGSTOP signal to firefox, the most volatile process on my desktop, to prevent it from writing any files onto my harddisk while the restoration was in progress, potentially overwriting the blocks occupied by the deleted file. I did this via an extension I wrote for my window manager openbox, which adds a menu item "Pause and iconify" to the popup menu on the titlebar of all windows. I usually use this to prevent Firefox from consuming CPU and draining my laptop's battery while I'm traveling. Other than that, there's almost nothing running on a typical sabotage linux box which could interfere via constant disk writes, unlike GNU/Linux systems with systemd and a gazillion background daemons installed.

Then I opened /dev/mapper/crypt_home, the blockdevice containing my /home filesystem, in my favorite hexeditor, went to the ASCII tab on the right side, and started a search for a string I knew occurred only in that new C file, which was <openDOW/vec2f.h>, since I used that file in a hackish way via an include statement.

After hitting ENTER in hexedit's search dialog, CPU usage went to 100%, and it slowly crunched its way through the encrypted harddisk's blockdevice mapper. I left my computer to brew a coffee, and came back after about 5 minutes. From the current offset displayed, I figured that the search was only about 40GB into the blockdevice. Many more GBs to go, since the file could be at the very end of the SSD. After another break of about 10 mins, I got lucky and the string was found at offset 0x13c6ffa0ab, about 84 GB into the blockdevice.

Using pageup/pagedown in hexedit, the beginning and end offsets of the source file were quickly found. They were 0x13C6FF1FFC and 0x13C6FFB472, respectively.

dd if=/dev/mapper/crypt_home of=/dev/shm/dump bs=1 skip=$((0x13C6FF1FFC)) count=$((0x13C6FFB472 - 0x13C6FF1FFC))

did the rest to restore the file onto /dev/shm, the ramdrive.

Since my SSD is usually a lot faster than this, I decided to write a program to speed up future searches. The plan is simple: read from the blockdevice in large chunks, so the time spent in syscalls is negligible, and then search over the memory chunks using an optimized algorithm that compares word-at-a-time, just like musl's memmem() function does. Plus some more logic to find the search term even across chunk boundaries. The result can be found here in a small C program.
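The core idea, minus the word-at-a-time comparison, boils down to something like this sketch (not the actual fastfind code):

/* chunked search over a block device: big reads keep syscall overhead
   negligible, memmem() scans each chunk, and an overlap of needle-1 bytes
   is kept so matches straddling a chunk boundary aren't missed */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK (16*1024*1024)

int main(int argc, char **argv)
{
    if (argc != 3) return 1;
    FILE *f = fopen(argv[1], "rb");
    const char *needle = argv[2];
    size_t nlen = strlen(needle), keep = nlen - 1, fill = 0;
    unsigned char *buf = malloc(CHUNK + keep);
    unsigned long long off = 0;   /* device offset of buf[0] */

    if (!f || !buf || !nlen) return 1;
    for (;;) {
        size_t got = fread(buf + fill, 1, CHUNK, f);
        if (!got) break;
        unsigned char *hit = memmem(buf, fill + got, needle, nlen);
        if (hit) {
            printf("bingo: 0x%llx\n", off + (unsigned long long)(hit - buf));
            break;
        }
        if (got < CHUNK) break;   /* end of device reached */
        /* keep the tail so a match crossing the boundary is still found */
        memmove(buf, buf + fill + got - keep, keep);
        off += fill + got - keep;
        fill = keep;
    }
    return 0;
}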

And indeed, it is a lot faster than hexedit.

# time ./fastfind /dev/mapper/crypt_home '<openDOW/vec2f.h>'
curr: 0x13498f0000
bingo: 0x13c6ffa0ab
^CCommand terminated by signal 2
real    1m 4.26s
user    0m 20.35s
sys     0m 19.38s

At 64 seconds total, it crunched through the blockdevice at a rate of 1.2GB/sec, at least 10x faster than hexedit.

So for future undelete tasks, my fastfind utility will be the first stop to find an offset, followed by my good old friend hexedit to find the beginning and end positions in the neighbourhood of that offset, and finished off with dd.

*: This approach works well for smaller files, whereas bigger ones are usually spread over several non-adjacent blocks.


Mastering and designing C/C++ build systems

19 Apr 2019 10:36 UTC

A Primer for build system developers and users

As the maintainer of sabotage linux, a distro compiled from source with >1000 packages, and being involved in the development of musl libc, I've seen a wide variety of odd build systems, or regular build systems used in an odd way, which resulted in lots of issues trying to get other people's packages to build.

The vast majority of build system coders and developers using these build systems for their packages do not understand in detail how their toolchains are supposed to be used, and especially cross-compilation is a topic the majority of people know nothing about. The intent of this blog post is to explain the basic mechanisms in order to change this situation.

But first, let's establish the meaning of some terms. From here on, the term user will be used to mean the person trying to compile your software package from source. We're not concerned here about people using the compilation result via a binary package.

Now we will first take a quick look at the basic concepts involved in compilation, followed by the typical 3 stages of a build process, which are: Configuration, Compilation, Installation.

Basic Compilation Concepts

So in order to get your program compiled on a variety of different hosts, you typically need to interface with the following components:

The compiler.

For the C programming language, the convention is that on the user's system there's a C compiler installed in the default search PATH with the name cc. It can be overridden with the environment variable CC.

So if CC is set to clang, the build system should use clang instead of cc.

A sanely designed build system does something along the lines of:

CC=${CC:-cc}

For C++, the default binary name is c++ and the environment variable CXX.

Note that the user may choose to set CC or CXX to something that includes multiple items, for example CC=powerpc-gcc -I/tmp/powerpc/include.

Therefore, in a shell script, when you want to use the CC command to compile something, the $CC variable needs to be used unquoted, i.e. $CC and not "$CC", since the latter would force the shell to look for a binary with the spaces inside the filename.

(For the record, the compiler is the program that turns your sourcecode into an object file, e.g. cc foo.c -c -o foo.o)

The linker.

Fortunately with C and C++, unless you do highly unusual things, you will never have to invoke the linker directly. Instead you can simply use CC or CXX and they will know from the context that a linker is needed, and call the linker themselves. (For the record, the linker is what takes a couple of .o files and turns them into an executable or a shared library, e.g.: cc foo.o bar.o -o mybinary.elf)

Compiler and linker options.

There will be a couple options you will have to use so the compilation works in a certain way. For example, your code may require the flag -std=c99 if you use C99 features.

Additionally, the user will want or need to use certain flags. For this purpose, the environment variable CFLAGS is used.

If the user didn't specify any CFLAGS himself, you may decide to set some sane default optimization flags (the default for GNU autoconf packages is -O2 -g -Wall). The CFLAGS used for the compilation should always put the user-set CFLAGS last in the command line, so the user has the ability to override some defaults he doesn't like. The following logic describes this:

REQUIRED_CFLAGS=-std=c99
CFLAGS_FOR_COMPILE=$(REQUIRED_CFLAGS) $(CFLAGS)

For C++, these flags are called CXXFLAGS, and the logic is precisely the same.

There's also CPPFLAGS, which is used for preprocessor directives such as -DUSE_THIS_FEATURE -DHAVE_OPENGL and include directories for header lookup. More about headers soon. Again, user-supplied CPPFLAGS need to be respected and used after the CPPFLAGS the build system requires.

Last but not least we have LDFLAGS, these are flags used at link time. It contains things such as -L linker library search path directives, -lxxx directives that specify which libraries to link against, and other linker options such as -s (which means "strip the resulting binary"). Here, again, the rule is to respect user-provided LDFLAGS and put them after your own in the linker command.

From here on, whenever we talk about cc or CC or CFLAGS, the exact same applies to c++, CXX and CXXFLAGS for C++.

Libraries and their headers

When writing code in C or C++, you necessarily need to use libraries installed on the end user's machine. At the very least, you need the C or C++ standard library implementation. The former is known as libc, the latter as libstdc++ or libc++. Optionally, some other libraries, such as libpng, may be needed.

In compiled form, these libraries consist of header files, and the library itself, as either a static (.a archive) or dynamic library (.so, .dylib, .dll). These headers and libs are stored in a location on your user's machine, which is typically /usr/include for headers and /usr/lib for libraries, but this is none of your concern. It's the job of the user to configure his compiler in such a way that when you e.g. #include <stdio.h> it works (usually the user uses his distro-provided toolchain, which is properly set up).

Cross-compilation

Cross-compilation means that you compile for a different platform than the one you're using, for example if you want to compile ARM binaries for your raspberry pi from your x86_64 desktop.

It's not really much different from regular compilation: you pass your cross compiler name as CC, e.g. CC=armv7l-linux-musl-gcc, and set your C and CPP flags such that they point into the lib/ and include/ dirs with your other ARM stuff in them. For example, if you prepare a rootfs for your raspberry pi in /tmp/piroot, you'd probably set up your compiler-related environment vars as follows:

CC=armv7l-linux-musl-gcc
CPPFLAGS=-isystem /tmp/piroot/include
LDFLAGS=-L/tmp/piroot/lib

In compiler jargon, the armv7l-linux-musl prefix to your toolchain name is the so-called triplet. All components of your toolchain are prefixed with it, for example the ar archiver is called armv7l-linux-musl-ar, the same applies for as, ld, ranlib, strip, objdump, etc.

In Autoconf-based build systems, you pass the triplet as --host=armv7l-linux-musl to ./configure, whereas Makefile-only based systems usually use a CROSS_COMPILE environment variable, which is set to triplet plus a trailing dash, e.g. CROSS_COMPILE=armv7l-linux-musl-. In your own build system, you should follow the GNU autoconf convention though.

What makes cross-compilation a bit tricky is that you cannot run the binaries you produce on the build machine, so any configure check that needs to execute a test program won't work, and that you have to take care that no headers or libraries of the build machine accidentally leak into the target build.

The Build Process

If you design a build system from scratch, keep in mind that your users probably don't want to spend a lot of time learning about your system. They simply want to get the process done as painlessly and quickly as possible (which implies that the build system itself should have as few external dependencies as possible).

Please do respect existing conventions, and try to model your build system's user interface after the well-established GNU autoconf standards, because it's what's been around for 20+ years and what the majority of packages use, so it's very likely that the user of your package is familiar with its usage. Also, unlike more hip build tools of the day, their user interface is the result of a long evolutionary process. Autoconf does have a lot of ugly sides to it, but from a user perspective it is pretty decent and has a streamlined way to configure the build.

Step 1: Configuration

Before we can start building, we need to figure out a few things. If the package has optional functionality, the user needs to be able to specify whether he wants it or not. Some functionality might require additional libraries, etc. This stage in the build process is traditionally done via a script called configure.

Enabling optional functionality

Your package may have some non-essential code or feature, that might pull in a big external library, or may be undesirable for some people for other reasons.

Traditionally, this is achieved by passing a flag such as --disable-libxy or --without-feature, or conversely --with-feature or --enable-libxy.

If such a flag is passed, the script can then write for example a configuration header that has some preprocessor directive to disable the code at compile time. Or such a directive is added to the CPPFLAGS used during the build.
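For illustration (config.h, HAVE_LIBXY and xy_frobnicate() are made-up names here), the consuming code would then look something like this:

/* config.h is assumed to be written by the configure script; HAVE_LIBXY
   reflects whether --enable-libxy was given */
#include "config.h"

#ifdef HAVE_LIBXY
#include <xy.h>
#endif

void maybe_use_xy(void)
{
#ifdef HAVE_LIBXY
    xy_frobnicate();   /* optional feature, only compiled in when enabled */
#endif
}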

These flags should be documented when the configure script is being run with the --help switch.

System- or Version-specific behaviour

Sometimes one needs to use functionality that differs from system to system, so we need to figure out in which way the user's system provides it.

The wrong way to go about this is to hardcode assumptions about specific platforms (OS/compiler/C standard library/library combinations) with ifdefs like this:

#if OPENSSL_VERSION_NUMBER >= 0x10100000
/* OpenSSL >= 1.1 added DSA_get0_pqg() */
    DSA_get0_pqg(dsa, &p, &q, &g);
#else
    ...
#endif

This is wrong for several reasons:

- the check is tied to one particular implementation's version number, so forks like LibreSSL, which report a completely different OPENSSL_VERSION_NUMBER, break it,
- distros routinely backport functions and fixes without bumping the version number,
- it tests what a changelog promises instead of what the library you're actually building against provides.

The proper way to figure out whether DSA_get0_pqg() exists, is... to actually check whether it exists, by compiling a small testcase using it (more below), and pass a preprocessor flag such as HAVE_DSA_GET0_PQG to the code in question.
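Such a testcase could look roughly like this; it only gets compiled and linked (e.g. with $CC $CPPFLAGS $CFLAGS temp.c -lcrypto $LDFLAGS), never executed:

/* temp.c: does the installed libcrypto provide DSA_get0_pqg()? */
#include <openssl/dsa.h>
int main() {
    const BIGNUM *p, *q, *g;
    DSA_get0_pqg((DSA *)0, &p, &q, &g);
    return 0;
}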

Even worse than the above hardcoded version number check is when people assume that a certain C library implementation, for example musl, has a certain bug or behaviour or lacks a certain function, because at the time they tested it that was the case. If a __MUSL__ macro existed, they would just hardcode their assumption into the code, even though the very next version of musl might have fixed the bug or added the function in question, which would then result in compile errors or, even worse, bogus behaviour at runtime.

Checking for headers

You should NEVER hardcode any absolute paths for headers or libraries into your build system, nor should you start searching the user's filesystem for them. This would make it impossible to use your package on systems with a non-standard directory layout, or for people who need to crosscompile it (more on cross-compilation just a little further down).

The majority of third-party libraries install their headers either into a separate sub-directory in the compiler's default include path (for example /usr/include/SDL/*.h), or, if there are only one or two headers, directly into the include dir (for example /usr/include/png.h). Now when you want to test whether the user's system has the libpng headers installed, you simply create a temporary .c file with the following contents:

#include <png.h>
typedef int foo;

and then use $CC $CPPFLAGS $CFLAGS -c temp.c and check whether the command succeeded. If it did, then the png.h is available through either the compiler's default include directory search paths, or via a user-supplied -I incdir statement which he can provide if his libpng is installed in a non-standard location such as $HOME/include.

Note that this approach is cross-compile safe, because we didn't need to execute any binary.

If you want to use headers of a library such as SDL that installs a number of headers into a subdir, you should reference them in your code via #include <SDL/SDL.h> and not #include <SDL.h>, because the latter will require the addition of -I path include search path directives.

Checking for functions in libraries

After you've established that the user has libpng's headers installed, you might want to check whether it links correctly and whether it provides a certain function you're using (though testing for this only makes sense if the function is a recent addition).

Again, you check this by writing a temporary .c file, that looks roughly like:

#include <png.h>
int main() {png_set_compression_buffer_size(0, 0);}

the command to test it is: $CC $CPPFLAGS $CFLAGS temp.c -lpng $LDFLAGS.

If the command succeeds, it means that one of libpng.a/.so is available in the compiler's default library search path, (or in some -L directive the user added to his LDFLAGS) and that it contains the function png_set_compression_buffer_size. The latter is established by using a main() function, which forces the linker to fail on missing symbols (also note the omission of -c).

If your aim is only to test whether the libpng library is installed, the test can be written as:

#include <png.h>
int main() {return 0;}

and compiled exactly as the previous. Note that this test actually checks that both the header exists AND the library, so by using this kind of test you don't actually need to test for header and library separately. Again, we merely compiled the testcase and didn't need to execute it.

Pkg-config and derivatives

For simple libraries such as zlib, you should always first try whether you can simply link to e.g. -lz. If that doesn't work, you can fall back to a tool called pkg-config or one of its clones such as pkgconf, which is widely used. The path to the tool is user-provided via the environment variable PKG_CONFIG. If not set, the fallback is to use pkg-config instead. It can be used like this:

$PKG_CONFIG --cflags gtk+-2.0

This will print a couple of -I include directives that are required to find the headers of gtk+2.

Likewise

$PKG_CONFIG --libs gtk+-2.0

can be used to query the LDFLAGS required for linking gtk+2. Note that by default, pkg-config looks into $(prefix)/lib/pkgconfig, which is not compatible with crosscompilation.

2 solutions exist to make pkg-config compatible with cross-compilation:

- point it at the target's rootfs by setting the environment variables PKG_CONFIG_LIBDIR (and, if needed, PKG_CONFIG_SYSROOT_DIR) to the pkgconfig directory inside that rootfs, or
- use a triplet-prefixed wrapper (e.g. armv7l-linux-musl-pkg-config) that does exactly that, and hand it to the build system via the PKG_CONFIG variable.

Now comes the bummer:

The authors of some packages wrote their own package-specific pkg-config replacements, reasoning unknown. For example, on my machine the following proprietary -config programs exist: allegro-config, croco-config, curl-config, freetype-config, gpg-error-config, icu-config, libpng-config, pcap-config, pcre-config, python-config, sdl-config, xml2-config ...

What they all have in common is that they do things differently and they are not cross-compile compatible. Usually, whenever one of them is being used by a build system, cross-compilation breakage follows. Because these tools simply return the include and library directories of the host.

Unfortunately, the authors of some of these programs refuse to write portable pkg-config files instead. OTOH, most of them require no special include dirs, and their --libs invocation simply returns -lfoo. For the few that don't (the worst offenders are the apr-1-config tools from the Apache Foundation), as a build system author, I suppose the only correct way to deal with them is to not use them at all, but instead force the user to specify the include and library paths for these libraries with some configuration parameters. Example: --apr-1-cflags=-I/include/apr-1

Checking for sizes of things

In some rare cases, one needs to know e.g. the size of long on the toolchain target at compile time. Since we cannot execute any test binaries that would run e.g.

printf("%zu\n", sizeof(long));

and then parse their output because we need to stay compatible with cross-compilers, the proper way to do it is by using a "static assertion" trick like here:

/* gives compile error if sizeof(long) is not 8 */
int arr[sizeof(long) == 8 ? 1 : -1];

Compile the testcase with $CC $CPPFLAGS $CFLAGS -c temp.c.

Another way is to run e.g.

$CC $CPPFLAGS -dM -E - </dev/null | grep __SIZEOF_LONG__

This command (without the piped grep) makes GCC and derivatives spit out a list of built-in macros. Only GCC- and Clang-based toolchains that came out during the last couple of years support this though, so the static assert method should be preferred.

Checking for endianness

Unfortunately, varying platforms have provided endianness test macros in different headers. Because of that, many build system authors resorted to compiling and running a binary that does some bit tricks to determine the endianness and print a result.

However, since we cannot run a binary, as we want to stay cross-compile compatible, we need to find another way to get the definition. I've actually spent a lot of effort trying dozens of compiler versions and target architectures and came up with a public domain single-header solution that has portable fallback functions which can do endian conversions even if the detection failed, although at a slight runtime cost.

I would advise its usage, rather than trying to hack together a custom thing.
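To illustrate the fallback idea: accessing values bytewise gives correct results no matter what the host byte order is, so a conversion helper can always fall back to something like this minimal sketch (not the actual header):

#include <stdint.h>

/* endian-agnostic fallback: assemble/store the value bytewise */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
}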

Checking for bugs and similar things

I've also come across a number of checks that required running a testcase and therefore prevented crosscompilation from working. Mostly, these are tests for a certain bug or odd behaviour. However, it is wrong to assume that because the system the test binary currently runs on has a certain bug, the end user's system will have the same bug. The binary might for example be distributed as a package, and might suddenly start misbehaving once another component is updated and fixes the bug. Therefore the only safe and correct way to deal with this situation is to write a check that's executed when the binary is used at runtime, which then sets a flag like bug=1, and to have two different codepaths, one for a system with the bug and one for a system without it.
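A minimal sketch of that pattern; the snprintf() truncation return value (which some old C libraries got wrong by returning -1 instead of the would-be length) merely serves as a stand-in for whatever bug is being checked:

#include <stdio.h>

/* runtime check instead of a configure-time test binary */
static int snprintf_is_broken(void)
{
    static int broken = -1;
    if (broken == -1) {
        char buf[4];
        /* C99 says this must return 5, the untruncated length */
        broken = snprintf(buf, sizeof buf, "hello") != 5;
    }
    return broken;
}

void do_something(void)
{
    if (snprintf_is_broken()) {
        /* workaround codepath */
    } else {
        /* normal codepath */
    }
}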

Cross-compile specific configuration

In GNU Autoconf, the way to tell it that you're cross-compiling is by setting a --host=triplet parameter with the triplet of the target toolchain, additional to putting the crosscompiler name into the CC environment variable. The triplet is then used to prefix all parts of the toolchain, like

RANLIB=$(triplet)-ranlib
STRIP=$(triplet)-strip

etc. For the build host, there's also a parameter called --build=triplet. If not set, the configure process will check whether gcc or cc is available, and then use that. If set, all toolchain components targeting the host you're on will be prefixed with this triplet. It can be queried by running $CC -dumpmachine. Usually, it is not necessary to set it.

Checking for the target OS

As mentioned, it's hugely preferable to test for functionality rather than platform. But if you really think it's necessary to figure out the target OS, do not use uname, which is totally bogus: it simply returns the OS of the compiler user, who might use an Apple computer but cross-compile for NetBSD.

You can instead derive the target OS via $CC -dumpmachine, which returns the toolchain target triplet, or by parsing the output of

$CC $CPPFLAGS -dM -E - </dev/null

Configuring paths

Knowledge about system paths is required for 2 reasons. One is that during the installation stage we need to know where files like the compiled program binary need to be installed. The other is that our program or library might require some external data files. For example, the program might require a database at runtime.

For this reason, a --prefix variable is passed to the configure step. On most typical linux installations --prefix=/usr would be used for a system install, whereas --prefix=/usr/local is typically used for an alternate installation from source of a package the distribution provides but which for some reason is not sufficient for the user. Sabotage Linux and others use an empty prefix, i.e. --prefix=, which means that for example binaries go straight into /bin and not /usr/bin, etc. Many hand-written configure scripts get this wrong and treat --prefix= as if the user hadn't passed --prefix at all, and fall back to the default. The default, btw, is traditionally /usr/local.

So in case your program needs a database, let's say leetpackage.sqlite, you would probably hardcode the following db path into your binary:

#define DB_PATH PREFIX "/share/leetpackage/leetpackage.sqlite"

where PREFIX would be set as part of CPPFLAGS or similar, according to the user's selection. For more fine-grained control, traditional configure scripts also add options like --bindir, --libdir, --includedir, --mandir, --sysconfdir, etc. in addition to --prefix, which, if not set, default to ${prefix}/bin, ${prefix}/lib, ${prefix}/include etc.

More on paths in the Installation chapter.

Step 2: The build

After the configuration step has finished, it should have written the configuration data in some form, either a header or a Makefile include file, which is then included by the actual Makefile (or equivalent). This should include any previously mentioned environment variables, so it is possible to log in in a different shell session without any of them set, yet get the same result when running make. Some users of GNU autotools create the Makefile from a template (usually called Makefile.in) at the end of the configure run, but I personally found this to be really impractical, because when making changes to the Makefile template, configure has to be re-run every single time. Therefore I recommend writing the settings into a file called config.mak, which is included by the Makefile.

The actual compilation is typically run by executing make, which on most systems defaults to GNU make, which is a lot more powerful than the traditional BSD makes. Its code is small and written in portable C, so it's easy to get it bootstrapped quickly on systems that don't have it yet, unlike competitors such as CMake, which is 1) written in C++ which takes a lot longer to parse than C, and 2) consists of > 1 million lines of code and 3) occupies a considerable amount of HDD space once installed. Anyway, GNU make can even be found pre-installed on the BSDs, it's called gmake there.

Here, the following conventions apply:

- running make without arguments should build everything,
- variables like CC, CFLAGS etc. should still be overridable on the make command line,
- V=1 should switch to verbose output that shows the full compiler command lines,
- parallel builds via -j N must work.

If a Makefile is used for building, the build process should be tested using several parallel processes, because failure to document dependencies of files properly often results in broken parallel builds, even though they seem to work perfectly with -j1.

Do note that you should not strip binaries, ever. If the user wants his binaries stripped, he will pass -s as part of his LDFLAGS.

Step 3: Installation

The Installation is typically done using the make install command. Additionally there's an important variable that distro maintainers use for packaging: DESTDIR.

If for example, at configure time, --prefix=/usr was set, then make install DESTDIR=/tmp/foo should cause stuff to be installed into /tmp/foo/usr, so if your package compiles a binary called myprog, it should end up in /tmp/foo/usr/bin/myprog. A typical install rule would look like this:

bindir ?= $(prefix)/bin

...

install: myprog
    install -Dm 755 myprog $(DESTDIR)$(bindir)/myprog

here we use the install program to install the binary myprog to its destination with mode 755 (-m 755) and create all path components along the way (-D). Unfortunately, the install program shipped with some BSDs and Mac OS X refuses to implement these practical options, therefore this portable replacement implementation can be used instead.

It is a good idea and the common practice to explicitly set the permissions during the install step, because the user doing the installation might unwittingly have some restrictive umask set, which can lead to odd issues later on.

Even if the build system you intend to write does not use Makefiles, you should respect the existing conventions (unlike CMake & co which NIH'd everything) like V=1, -j8, DESTDIR, --prefix, etc.

Closing thoughts

One of the big advantages of GNU's autotools system is that, from a user's perspective, they require nothing more than a POSIX-compatible shell to execute configure scripts, and GNU make, which as already mentioned is really slim, written in portable C, and widely available while requiring less than one MB of HDD space (my GNU make 3.82 install takes 750KB total including docs).

So in my opinion, the build system of the future, in whatever language it's written and however many millions of lines of code it consists of, should do precisely the same: it should at least have the option to generate a configure script and a stand-alone GNU Makefile, which are shipped in release tarballs. That way only the developers of the package need the build toolkit and its dependencies installed on their machine, while the user can use the tools he already has installed, and can interface with the build system in a way he's already familiar with.

Update

19 Apr 2019 19:34 UTC - Added paragraph "Checking for the target OS"


benchmarking python bytecode vs interpreter speed and bazaar vs git

07 Apr 2019 00:39 UTC

A couple of weeks ago, after an upgrade of libffi, we experienced odd build errors of python, but only on systems where python had previously been installed with an older libffi version:

error: [Errno 2] No such file or directory: '/lib/libffi-3.0.13/include/ffi.h'

There was no reference to libffi-3.0.13 anywhere in the python source, and it turned out that it was contained in old python .pyc/.pyo bytecode files that survived a rebuild due to a packaging bug, and apparently were queried as authoritative during the python build.

/lib/python2.7/_sysconfigdata.pyc:/lib/libffi-3.0.13/include
/lib/python2.7/_sysconfigdata.pyo:/lib/libffi-3.0.13/include

The packaging bug was that we didn't pre-generate .pyc/.pyo files just after the build of python, so they would become part of the package directory in /opt/python, but instead they were created on first access directly in /lib/python2.7, resulting in the following layout:

~ $ la /lib/python2.7/ | grep sysconfigdata
lrwxrwxrwx    1 root     root            48 Mar  4 03:11 _sysconfigdata.py -> ../../opt/python/lib/python2.7/_sysconfigdata.py
-rw-r--r--    1 root     root         19250 Mar  4 03:20 _sysconfigdata.pyc
-rw-r--r--    1 root     root         19214 Jun 30  2018 _sysconfigdata.pyo

So on a rebuild of python, only the symlinks pointing to /opt/python were removed, while the generated-on-first-use .pyc/.pyo files survived.

Annoyed by this occurrence, I started researching how generation of these bytecode files could be suppressed, and it turned out that it can be controlled using a sys.dont_write_bytecode variable, which in turn is set from the python C code. Here's a patch doing that.

However, before turning off a feature that can potentially be a huge performance boost, a responsible distro maintainer needs to do a proper benchmarking study so he can make an educated decision.

So I developed a benchmark that runs a couple of tasks using the bazaar VCS system, which is written in python and uses a large number of small files, so the startup overhead should be significant. The task is executed 50 times, so small differences in the host's CPU load due to other tasks should be evened out.

The task is to generate a new bazaar repo, check 2 files and a directory into bazaar in 3 commits, and print a log at the end.

With bytecode generation disabled, the benchmark produced the following results:

real    3m 15.75s
user    2m 15.40s
sys     0m 4.12s

With pregenerated bytecode, the following results were measured:

real    1m 24.25s
user    0m 20.26s
sys     0m 2.55s

We can see that in the case of a fairly big application like bazaar, with hundreds of python files, the precompilation does indeed make a quite noticeable difference. It is more than twice as fast.

What's also becoming apparent is that bazaar is slow as hell. For the lulz, I replaced the bzr command in the above benchmark with git and exported PAGER=cat so git log wouldn't interrupt the benchmark. As expected, git is orders of magnitude faster:

real    0m 0.48s
user    0m 0.02s
sys     0m 0.05s

Out of curiosity, I fiddled some more with python and added a patch that builds python so its optimization switch -O is always active, and rebuilt both python and bazaar to produce only .pyo files instead of .pyc. Here are the results:

real    1m 23.88s
user    0m 20.18s
sys     0m 2.54s

We can see that the optimization flag is next to useless. The difference is so small it's almost not measurable.

Now this benchmark was tailored to measure startup compilation cost for a big project, what about a mostly CPU-bound task using only a few python modules?

I modified a password bruteforcer to exit after a couple thousand rounds for this purpose, and ran it 30x each: without bytecode, with .pyc, and with .pyo.

Here are the results:

No bytecode:

real    3m 50.42s
user    3m 50.25s
sys     0m 0.03s

.pyc bytecode:

real    3m 48.68s
user    3m 48.60s
sys     0m 0.01s

.pyo bytecode:

real    3m 49.14s
user    3m 49.06s
sys     0m 0.01s

As expected, there's almost no difference between the 3. Funnily enough, the optimized bytecode is even slower than the non-optimized bytecode in this case.

From my reading of this stackoverflow question, it appears to me as if the .pyo bytecode differs from regular bytecode only in that it lacks instructions for the omitted assert() calls, and possibly debug facilities.

Which brings us back to the original problem: In order to have the .pyc files contained in the package directory, they need to be generated manually during the build, because apparently they're not installed as part of make install. This can be achieved by calling

./python -E Lib/compileall.py "$dest"/lib/python2.7

after make install finished. With that achieved, I compared the size of the previous /opt/python directory without .pyc files with the new one.

It's 22.2 MB vs 31.1 MB, so the .pyc files add roughly 9 MB and make the package about 40% bigger.

Now it happens that some python packages, build scripts and the like call python with the optimization flag -O. This causes our previous problem to re-appear: now we will have stray .pyo files in /lib/python2.7.

So we need to pregenerate not only .pyc, but also .pyo for all python modules. This will add another 9MB to the python package directory.

OR... we could simply turn off the ability to activate the optimised mode, which, as we saw, is 99.99% useless. This seems to be the most reasonable thing to do, and therefore this is precisely what I have now implemented in sabotage linux.


the rusty browser trap

06 Apr 2019 11:55 UTC

If you're following sabotage linux development, you may have noticed that we're stuck on Firefox 52esr, which was released over a year ago. This is because non-optional parts of Firefox were rewritten in the "Rust" programming language, and all newer versions now require a Rust compiler to be installed.

And that is a real problem.

The Rust compiler is written in Rust itself, exposing the typical chicken-and-egg problem. Its developers have used previous releases in binary form along the path of evolution of the language and its compiler. This means in practice that one can only build a rust compiler by using a binary build supplied by a third party, which in turn basically means that one has to trust this third party. Assuming that the binary actually works on one's own system at all.

As sabotage linux is based on musl, the latter is not self-evident.

Traditionally, the only binary thing required to bootstrap sabotage linux was a C compiler. It was used to build the stage0 C compiler, which was then used to build the entire system. A sabotage user can have high confidence that his OS does not contain any backdoors in the userland stack. Of course, it's impossible to read all the millions of lines of code of the linux kernel, nor is it possible to know the backdoors inside the CPU silicon or in the software stack that runs on the BIOS level or below. Still, it is a pretty good feeling to have at least a trustworthy userland.

So Rust developers want you to slap a binary containing megabytes of machine instructions on your PC and execute it.

If we assume for one moment that we are OK with that, the next problem is that we now need a different binary for every architecture we support. There's no mechanism in sabotage that allows downloading a different thing per architecture. All existing packages are recipes on how to build a piece of software from source, and that's done with identical sources for all platforms.

Additionally, Rust doesn't actually support all architectures we support. It's a hipster thing, and not a professional product. And the hipsters decided to support only a very small number of popular architectures, such as AMD64 and x86. Others are either not supported at all, or supported without any guarantee that they'll work.

So even if we embrace Rust, there will be some architectures that can't have a working Firefox - ever?

Now somebody who probably likes Rust decided he wants to write a compiler for it in C++, so people can use it to bootstrap from source. However, he targets a pretty old version of the language, so in order to get a version compiled that's recent enough to build Firefox's sources, one needs to build a chain of 12+ Rust versions. A member of our team actually embarked on this voyage, but the result was pretty disillusioning.

After our team member spent about 3 nights on this endeavour, he gave up, even though we had support from somebody from "adelie linux" who went through the entire process already. Unfortunately, that person didn't take any step-by-step notes; there's only a repository of mostly unsorted patches and other files and a patched version of rust 1.19.0 to start with. (Here's a blog post from the adelie linux authors about rust, btw).

So could it be done? Most likely yes, but it would require me to spend an estimated 2 weeks of work, digging in the C++ turd of LLVM and Rust. Certainly not anything I would like to spend my time on. Unlike the people from adelie linux, my goal is not to create a single set of bootstrap binaries to be used in the future, but package recipes, so a user can build the entire set of rust versions from source. Building them all will probably require almost two full days of CPU time on a very fast box, so this is something not everybody can even afford to do.

So from my point of view, it looks pretty much as if Firefox is dead. By choosing to make it exclusive to owners of a Rust compiler, mozilla chose to make it hard-to-impossible for hobbyists and source code enthusiasts like myself to compile their browser themselves.

Not that it was easy in the past either: every version bump required about half a day of effort to fix new issues introduced in this giant pile of C++ copy-pasted from dozens of different projects, and held together by a fragile build system mix of python, shell, perl, ancient autoconf etc etc...

None of those upstream sources were ever tested on musl-based linux systems by their developers, and sabotage's unconventional filesystem layout adds yet another layer of possible breakage especially regarding the python virtualenv based build system.

So, Firefox is dead. What's the alternative?

Chromium? Possibly, but it's a clusterfuck itself. The source tarball is about 0.5 GB compressed and requires 2+ GB of hdd space just to unpack the sources, and probably another 5 GB for temporary object files during the build. And it will take hours and hours to build, if you even have enough RAM. That's not really compatible with a hobbyist project, besides the numerous privacy issues with this browser.

The only viable option left might be a webkit based browser or palemoon, a fork of firefox without rust.

I even considered for a while to run a QEMU VM with ReactOS with a binary windows-based precompiled browser, but funnily enough, around the same time mozilla started giving the boot to open-source enthusiasts by requiring Rust, they also removed support for Windows XP. And subsequently for ReactOS, since it is based on the Win2K3 API.

So the future looks pretty grim. We need to invest a lot of work trying to get Palemoon to compile, and hopefully it will stay rust-free and usable for a couple more years. If not, we will be forced to run a VM with a bloated GLIBC-based linux distro and the full X11 stack, just to run a browser.

Because unfortunately, without an up-to-date browser, a desktop system is almost worthless.


how compatible is libreSSL ?

12 Jul 2014

portability

yesterday the "portable" version of libressl was released. http://ftp.openbsd.org/pub/OpenBSD/LibreSSL/libressl-2.0.0.tar.gz

i set up a package in sabotage linux, and went on a voyage to investigate whether the full set of packages can be used with libressl instead of openssl.

first of all, i had to fight some obstacles to get libressl compiling though...

obstacle 1 - -Werror

../include/openssl/bio.h:622:3: error: '__bounded__' attribute directive ignored [-Werror=attributes]

-Werror is hardcoded in the configure script, which is a very bad idea, and the opposite of portable. using -Werror is a guaranteed build break whenever the build is tried on a system the original developer had no access to. it's sufficient to use a different compiler version, different libc version, etc to make new warnings pop up.

fixed with

sed -i 's/-Werror//' configure

obstacle 2 - unconditional inclusion of internal glibc header

compat/issetugid_linux.c:7:30: fatal error: gnu/libc-version.h: No such file or directory

many people assume linux == glibc, but that is not the reality. sabotage linux uses musl libc, and there are at least 4 other libcs that could be used instead (uclibc, dietlibc, klibc, bionic).

looking at issetugid_linux.c uncovers a dubious hack: if glibc 2.19 is detected, getauxval(AT_SECURE) is not used, because there was once a bug (see comment in source code).

however it's common practice in distros to backport bugfixes without updating the version number. so this hack prevents proper usage of getauxval even if your libc has long been fixed. the mentioned bug is very likely already fixed in any distro using glibc 2.19.

to get the thing out of my way and compilation going on, the quick fix was to cover everything with #ifdef __GLIBC__. what the code really should do though is to just use the getauxval call unconditionally without the glibc version check.

obstacle 3 - unnecessary unconditional inclusion of sys/sysctl.h

compat/getentropy_linux.c:27:24: fatal error: sys/sysctl.h: No such file or directory

musl does not have sys/sysctl.h, because: (citing musl's author Rich Felker)

sysctl does not work, and NEVER worked. using it is bogus. it was a bogus experimental syscall that was deprecated before it was ever used (basically, a broken binary version of /proc/sys, without any stability between kernel versions for what the binary constants meant).

since the code in question does not use the sysctl function (declared in sys/sysctl.h) and does the syscall() directly, it was safe and sufficient to just remove the include statement.

still it leaves a bad taste in my mouth that it was used at all...

having fixed these 3 issues, libressl built successfully. commit 4f2da253

on the plus side: using 8 cores, libressl builds in about 1 minute, while openssl requires 1:45. also openssl depends on perl, which takes an additional 2 minutes buildtime. so if nothing else depends on perl, it's about 3x faster.

compatibility

with libressl in place, a "world" metapackage (contains almost all packages) build was started. the results:

wget failed to build due to lack of RAND_egd() function. fixed by using a patch from openbsd. commit 234185c0

stunnel failed to build due to lack of RAND_egd() function. fixed by using a custom patch conceptually equivalent to the wget one. commit 9b47cbb

cryptsetup and others failed to detect openssl due to lack of pkgconfig files. i modified my package build script to create these .pc files (copies from openssl). commit 156a362
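for reference, these .pc files are tiny. something along these lines does the job (paths and version are whatever the distro uses; this is an illustration, not the exact file from the commit):

cat > /usr/lib/pkgconfig/libcrypto.pc <<"EOF"
prefix=/usr
libdir=${prefix}/lib
includedir=${prefix}/include

Name: libcrypto
Description: crypto library (libressl standing in for openssl)
Version: 2.0.0
Libs: -L${libdir} -lcrypto
Cflags: -I${includedir}
EOF

the same template with -lssl and a "Requires: libcrypto" line covers libssl.pc, and an openssl.pc pulling in both keeps most configure scripts happy.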

php, xorg-server and others failed to build in subtle ways due to an ugly hack in libressl's libcompat.a, which gets linked into libcrypto.so:

$ gcc test.c -lcrypto -fvisibility=hidden
/bin/ld: a.out: hidden symbol `main' in /tmp/ccobhDjc.o is referenced by DSO
/bin/ld: final link failed: Bad value

$ readelf -a /lib/libcrypto.so | grep main
000000345708 000a00000006 R_X86_64_GLOB_DAT 0000000000000000 main + 0
10: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND main
2146: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND main

in getentropy_linux.c:

extern int main(int, char *argv[]);
#define HD(x) (SHA512_Update(&ctx, (char *)&(x), sizeof (x)))
HD(main); /* an addr in program */

the address of main() is used to gather entropy… very smart… NOT.

most of the methods used in this file to gather entropy are very dubious. the crypto experts from OpenBSD should know better and just use /dev/urandom and/or getauxval(AT_RANDOM) instead of all these hacks.

commit 1a81136
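for illustration, the boring /dev/urandom-plus-getauxval(AT_RANDOM) approach suggested above could look roughly like this (a sketch, not a patch; AT_RANDOM only hands out 16 bytes, so it is strictly a last-resort fallback):

#include <fcntl.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval(), AT_RANDOM */
#include <unistd.h>

/* sketch: fill buf with len random bytes, preferring /dev/urandom */
static int get_random(void *buf, size_t len)
{
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd >= 0) {
        ssize_t r = read(fd, buf, len);
        close(fd);
        if (r == (ssize_t)len)
            return 0;
    }
    /* last resort: the 16 random bytes the kernel put into the aux vector */
    void *p = (void *)getauxval(AT_RANDOM);
    if (p && len <= 16) {
        memcpy(buf, p, len);
        return 0;
    }
    return -1;
}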

with that fixed, most programs expecting openssl seem to compile and work correctly.

mysql's CMake build system fails to detect the SSL libraries:

-- OPENSSL_INCLUDE_DIR = /usr/include
-- OPENSSL_LIBRARY = /usr/lib/libssl.so
-- CRYPTO_LIBRARY = /usr/lib/libcrypto.so
-- OPENSSL_MAJOR_VERSION = 2
-- Looking for SHA512_DIGEST_LENGTH
-- Looking for SHA512_DIGEST_LENGTH - found
CMake Error at cmake/ssl.cmake:231 (MESSAGE):
Cannot find appropriate system libraries for SSL. Use WITH_SSL=bundled to
enable SSL support

not patched yet.

the last build error was in apache:

ssl_engine_init.c:445:28: error: `ENGINE_CTRL_CHIL_SET_FORKCHECK' undeclared

this is a macro which is available in openssl's engine.h, and was removed from libressl for unknown reasons. not patched yet.

apart from these two, everything seems to be usable without much effort. so if the libressl developers rip out all their dubious entropy-generation methods in favor of /dev/urandom on linux, it might well be worth switching to it.

the whole adventure is documented in the libressl_replaces_openssl branch.

Update 07/13

OpenBSD released an updated version 2.0.1 earlier today. the new release fixes the following problems:

- reference to main() which breaks packages using -fvisibility=hidden
- usage of -Werror
- generation of pkg-config files
- unconditional inclusion of sys/sysctl.h

so the portability concerns have largely been addressed. the only portability issue not fixed is the glibc-specific stuff in issetugid_linux.c. instead, a patch containing an issetugid implementation for inclusion in musl was sent to the musl mailing list.

on the application compatibility side nothing seems to have changed. RAND_egd() is still missing, as well as the macros used by apache.

the dubious fallbacks for getentropy (obsolete sysctl syscall, function addresses) are still present.

this blog about similar testing done on gentoo (recommended read) has a link to a patch for the apache build. there is also a patch for a segfault in openssh.

this blog post originally appeared on my currently defunct wordpress blog

sqlite's anal gamation

25 Sep 2013

sqlite's slogan: "Small. Fast. Reliable. Choose any three."

i always wondered though, how such a small or "lite" package can take such a considerable amount of time to build.

as the main author of the sabotage linux distribution, building software is my daily bread, so i own a pretty fast build box: an 8-core machine at 3.1 GHz, which builds a complete linux 3.11 kernel in less than 5 minutes, making use of all 8 cores via the nice parallel build feature of GNU make.

make -j8

when make is invoked like this, it walks the dependency graph described in the Makefile and runs up to 8 build jobs in parallel, ideally one per cpu core, each one compiling a different .c file.
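for comparison, with a conventional Makefile like the following (a made-up example, not sqlite's actual build system), only the final link is serialized and every object file can be compiled on its own core:

# made-up example; recipe lines must be indented with a tab
OBJS = btree.o pager.o vdbe.o shell.o

sqlite3: $(OBJS)
	$(CC) $(LDFLAGS) -o $@ $(OBJS)

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<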

GCC 3.4.6, a C compiler with full C99 support, builds in 43 seconds:

$ time butch rebuild gcc3
2013.09.25 12:13:50 building gcc3 (/src/build/build_gcc3.sh) -> /src/logs/build_gcc3.log
2013.09.25 12:14:33 done.
real 0m 43.97s
user 1m 36.66s
sys 0m 13.74s

however, for sqlite, a supposedly small package, build times are comparatively huge:

$ time butch rebuild sqlite
2013.09.25 12:18:27 building sqlite (/src/build/build_sqlite.sh) -> /src/logs/build_sqlite.log
2013.09.25 12:19:21 done.
real 0m 54.03s
user 0m 52.02s
sys 0m 1.51s

nearly one minute, a fifth of the time used to build the linux kernel and 10 seconds more than the gcc compiler.

the full-blown postgresql database server package takes less time to build as well:

$ time butch rebuild postgresql
2013.09.25 12:19:21 building postgresql (/src/build/build_postgresql.sh) -> /src/logs/build_postgresql.log
2013.09.25 12:19:57 done.
real 0m 36.63s
user 1m 53.34s
sys 0m 12.03s

how is it possible that postgresql, shipping 16 MB of compressed sources as opposed to sqlite's 1.8 MB, needs a third less time to build ?

if you look at the user times above, you start getting an idea. the user time (i.e. the entire cpu time burnt in userspace) for postgresql is 1m53, while the total time that actually passed was only 36s.

that means that a total of 113 seconds of work was distributed among multiple cpu cores. dividing the user time by the real time (113s / 36.6s) gives a concurrency factor of roughly 3.1. not perfect, given that make was invoked with -j8, but much better than sqlite, which apparently only used a single core.

let's take a look at sqlite's builddir

$ find . -name '*.c'
./sqlite3.c
./shell.c
./tea/generic/tclsqlite3.c
./tea/win/nmakehlp.c

ah, funny. there are only 4 C files total. that partially explains why 8 cores didn't help. the 2 files in tea/ are not even used, which leaves us with

$ ls -la *.c
-rw-r--r-- 1 root root 91925 Jan 16 2012 shell.c
-rw-r--r-- 1 root root 4711082 Jan 16 2012 sqlite3.c

so in the top-level builddir there are just 2 C files, one being 90 KB and the other roughly 5 MB. the 90 KB file compiles in less than a second, so after that the entire time is spent waiting for the single cpu core that is grinding through the huge sqlite3.c.

so why on earth would somebody stuff all the source code into a single translation unit and thereby defeat makefile parallelism ?

after all, the IT industry's mantra of the last 10 years has been "parallelism, parallelism, and even more parallelism".

here's the explanation: https://www.sqlite.org/amalgamation.html

it's a "feature", which they call amalgamation.

i call it anal gamation.

In addition to making SQLite easier to incorporate into other projects, the amalgamation also makes it run faster. Many compilers are able to do additional optimizations on code when it is contained within a single translation unit such as it is in the amalgamation.

so they have 2 reasons for wasting our time: easier incorporation into other projects, and supposedly better optimized code.

let's look at reason 1: what they mean by incorporation is embedding the sqlite source code into another project's source tree.

it is usually considered bad practice to embed third-party source code into your own source tree, for multiple reasons: copies go stale and miss security fixes, the same code gets compiled and shipped over and over, and distro maintainers can no longer simply swap in a patched or updated system library.

instead, the installed version of libraries should be used.

pkg-config can be used to query existence, as well as CFLAGS and LDFLAGS needed to build against the installed version of the library. if the required library is not installed or too old, just throw an error at configure time and tell the user to install it via apt-get or whatever.
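as an illustration (not a drop-in for any particular configure script), assuming the library ships a .pc file - sqlite3.pc in this case - the check boils down to a few lines:

# abort early if the library is missing or too old
pkg-config --atleast-version=3.8 sqlite3 || {
    echo "error: sqlite3 >= 3.8 is required, please install it" >&2
    exit 1
}
# pick up the flags needed to build against the installed copy
CFLAGS="$CFLAGS $(pkg-config --cflags sqlite3)"
LIBS="$LIBS $(pkg-config --libs sqlite3)"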

conclusion: "incorporation" of source code is a bad idea to begin with.

now let's look at reason 2 (better optimized code): it may have made sense to help the compiler do its job back in the seventies, when all of this started. however, it's 2013 now. compilers do a great job optimizing, and they get better at it every day.

since GCC 4.5, released in 2010, the compiler ships with a feature called LTO (link-time optimization): it writes object files together with metadata that allows it to strip off unneeded functions and variables, inline functions that are only called once or twice, and so on - all at link time. that is pretty much everything the sqlite devs want to achieve, and probably more.

conclusion: pseudo-optimizing C code by stuffing everything into a big file is obsolete since LTO is widely available.

LTO does a better job anyway - not that it matters much, as sqlite spends most of its time waiting for I/O. every user who wants to make sqlite run faster can simply add -flto to his CFLAGS (see the sketch below); there's no need to dictate to him which optimizations to apply. following this logic, they could just as well ship generated assembly code…
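concretely, "adding -flto to the CFLAGS" amounts to nothing more than this (a sketch for building the amalgamation shell by hand; note that -flto has to be passed at link time as well):

gcc -O2 -flto -c sqlite3.c
gcc -O2 -flto -c shell.c
gcc -O2 -flto -o sqlite3 sqlite3.o shell.o -lpthread -ldl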

but hey - we have the choice ! here's actually a tarball containing the ORIGINAL, UN-ANAL-GAMATED SOURCE CODE...

... just that it's not a tarball.

it's a fscking ZIP file. yes, you heard right. they distribute their source as ZIP files, treating UNIX users as second-class citizens.

additionally they say that you should not use it:

sqlite-src-3080002.zip (5.12 MiB) A ZIP archive of the complete source tree for SQLite version 3.8.0.2 as extracted from the version control system. The Makefile and configure script in this tarball are not supported. Their use is not recommended. The SQLite developers do not use them. You should not use them either. If you want a configure script and an automated build, use either the amalgamation tarball or TEA tarball instead of this one. To build from this tarball, hand-edit one of the template Makefiles in the root directory of the tarball and build using your own customized Makefile.

Note how the text talks about "this tarball" despite it being a ZIP file.

Fun. there's only a single TARball on the entire site, so that's what you naturally pick for your build system. and that one contains the ANAL version. Note that my distro's build system does not even support zip files, as i don't have a single package in my repo that's not building from a tarball. should i change it and write special case code for one single package which doesn't play by the rules ? i really don't think so.

funny fact: they even distribute LINUX BINARY downloads as .zip. i wonder what world they live in.

why do i care so much about build time ? it's just a minute, after all. because the distribution gets built over and over again. and it's not just me building it, but a lot of other people as well - so the cumulative time spent waiting for sqlite to finish building its 5 MB file grows bigger every day. in the past i built sqlite more than 200 times, so my personal cumulative wasted time already exceeds the time i needed to write this blog post.

so what i hope to see is sqlite shipping its real, un-amalgamated source tree as a proper tarball, with a build system that is actually supported.

Update:

I just upgraded sqlite from 3071000 to 3080002

2013.09.27 02:23:24 building sqlite (/src/build/build_sqlite.sh) -> /src/logs/build_sqlite.log
2013.09.27 02:25:31 done.

it now takes more than 2 minutes.

this blog post originally appeared on my currently defunct wordpress blog
