In this post we look at using buildah to generate container images that contain only what we want, with no extra fluff. We show how this lets us build truly small images that load faster and are more secure, all without needing the Docker daemon to be running.
Nearly all container images are Docker images built from a Dockerfile using docker build. In fact, you may think this is the only way to build them. It is not. There are several ways of building images (see here for a guide). This post focuses on a relatively new tool called buildah that has two key benefits over the Docker tools: it does not need the Docker daemon to be running, and it can install packages into an image from the outside, so the image does not have to contain the package manager.
We’ll explain both of these as we go. First, let’s establish some baselines. Many images use Centos 7 or Debian as their base image, so let’s look at these standard images as found on Docker Hub.
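If you want to reproduce this baseline, the Python sketch below assembles the `docker pull` and `docker images --format` command lines. It also includes a small parser for the human-readable sizes Docker prints, which use decimal units (kB, MB, GB). The `centos:7` and `debian:latest` tags are our assumptions. By default the commands are only printed (set `dry_run=False` to run them).

```python
import re
import shlex
import subprocess

# Assumed tags: substitute whichever Centos/Debian images you want to compare.
IMAGES = ["centos:7", "debian:latest"]

# Docker's human-readable sizes use decimal units.
_BYTES_PER_UNIT = {"B": 1, "kB": 1000, "MB": 1000**2, "GB": 1000**3}


def baseline_commands(images=IMAGES):
    """Pull each image, then list name:tag and size for all local images."""
    cmds = [["docker", "pull", image] for image in images]
    cmds.append(["docker", "images", "--format", "{{.Repository}}:{{.Tag}}\t{{.Size}}"])
    return cmds


def parse_size_mb(text):
    """Convert a Docker size string such as '204MB' or '1.5GB' to megabytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*(B|kB|MB|GB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return float(value) * _BYTES_PER_UNIT[unit] / 1_000_000


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(baseline_commands())
```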
You’ll notice that the Centos image (204 MB) is almost twice the size of the Debian image (106 MB). This is why we have generally used Debian-based images. But why is this, and can anything be done about it?
Part of the reason must certainly be that Centos uses yum as its package manager. Yum is written in Python, so the Centos image comes with a full Python installation, and Python is hardly lightweight! In contrast, Debian’s apt package manager does not need Python.
This points to another problem shared by both images: they contain a package manager, which is needed only because of the way docker build works with a Dockerfile. Typically the first thing you do in a Dockerfile is install the packages you need using yum or apt. Once this is done the package manager serves no purpose, yet it remains part of the resulting image. This makes the final image larger and gives it a bigger attack surface. For instance, if you want an image that runs nginx, you really want that image to contain just nginx, not everything that was needed to put nginx there in the first place.
This is where buildah is different. It installs packages from the outside, so the resulting image includes neither the package manager nor anything else that was needed only to build or install your tools. We’ll see this in action shortly.
Let’s get started. We’ll create a clean environment to work in. For Centos-based images, let’s do this in a Centos Docker container. We’ll fire up the container, update the packages and install buildah:
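The same setup can also be run non-interactively from the host, using `docker exec` instead of a shell inside the container. This is a minimal Python sketch under our own assumptions:

- The container name `buildah-work` is hypothetical.
- `sleep infinity` just keeps the container alive.
- `--privileged` is there because buildah generally needs extra privileges to mount filesystems when run inside a container. Depending on your Docker storage setup, further configuration may be needed.

By default the commands are printed rather than executed.

```python
import shlex
import subprocess

WORK_CONTAINER = "buildah-work"  # hypothetical name for the work container


def setup_commands(image="centos:7", name=WORK_CONTAINER):
    """Host-side commands: start a long-running Centos container, update it and install buildah."""
    return [
        # --privileged is our assumption: buildah needs to mount filesystems inside the container.
        ["docker", "run", "-d", "--privileged", "--name", name, image, "sleep", "infinity"],
        ["docker", "exec", name, "yum", "-y", "update"],
        ["docker", "exec", name, "yum", "-y", "install", "buildah"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(setup_commands())
```

Once these have run, `docker exec -it buildah-work bash` gives you a shell in which to follow the rest of this post.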
Now we’re ready to go. At first you might be worried: what about all my existing Dockerfiles? I don’t want to switch to some other process for building my Docker images. Don’t worry! Your Dockerfiles still work with buildah. As an example, let’s use this simple Dockerfile for building an nginx image:
Now let’s build this with buildah:
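Both steps can be sketched in Python as follows. The Dockerfile here is an illustrative one of our own, assuming a Centos 7 base with nginx installed from the EPEL repository. The `nginx` tag is also our choice. `buildah bud` (build-using-dockerfile) reads the Dockerfile from the given context directory and needs no Docker daemon. As before, the command is only printed unless `dry_run=False`.

```python
import shlex
import subprocess
from pathlib import Path

# Illustrative Dockerfile (our assumption): Centos 7 base, nginx from EPEL.
DOCKERFILE = """\
FROM centos:7
RUN yum -y install epel-release && \\
    yum -y install nginx && \\
    yum clean all
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
"""


def bud_command(tag="nginx", context="."):
    """'buildah bud' builds from the Dockerfile in the context directory; no Docker daemon needed."""
    return ["buildah", "bud", "-t", tag, context]


def build(workdir=".", tag="nginx", dry_run=True):
    """Write the Dockerfile into workdir, then print (and optionally run) the build."""
    Path(workdir, "Dockerfile").write_text(DOCKERFILE)
    cmd = bud_command(tag, workdir)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

`buildah images` should then list the new image. Newer buildah releases also accept `buildah build` as the name of this command.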
So we’ve built an image from the Dockerfile. What’s notable here is that there is no Docker daemon running inside this Centos container where we are working. We’ve built a Docker image without Docker.
So now let’s move on to the more interesting aspect of buildah: the ability to install packages into images from the outside. What do we mean by this? You are probably familiar with using a package manager to install packages in a Dockerfile. A line like this installs nginx:
The problem with this, as stated above, is that the package manager (and in this case also Python) becomes part of the image that is built, but neither yum nor Python is needed to run nginx. With buildah you still use a package manager, but it runs on the host machine and installs the packages into the filesystem that will become the image. The package manager is not part of that image. More information on using buildah can be found here. It’s well worth a read, so we won’t repeat it here.
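To make the contrast concrete, this small sketch produces both forms for the same packages. The helper names are hypothetical. `mountpoint` stands for the directory that `buildah mount` reports for a working container (the `scratchmnt` used later). `--installroot` and `--releasever` are standard yum options: the first points yum at that directory, and the second tells it which Centos release to install from.

```python
def dockerfile_install(packages):
    """Dockerfile style: yum runs inside the image being built, so yum and Python stay in it."""
    return "RUN yum -y install " + " ".join(packages)


def buildah_install(packages, mountpoint, releasever="7"):
    """buildah style (hypothetical helper): the host's yum installs into the mounted container filesystem."""
    return ["yum", "-y", "install", "--installroot", mountpoint,
            "--releasever", releasever, *packages]
```

For example, `dockerfile_install(["nginx"])` returns `RUN yum -y install nginx`. By contrast, `buildah_install(["nginx"], mnt)` returns a command that runs on the host and leaves yum out of the image.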
So, armed with buildah, I thought it would be simple to solve the problem with the Centos images mentioned earlier: create a base Centos image that doesn’t include yum and Python, and get a nice small image. Well, my first attempts didn’t go well and did not produce very small images. The story is interesting. I raised this on the Project Atomic issue tracker, where it stimulated a lot of discussion that Tom Sweeney summarised in this blog post. Interestingly, some of the ‘fixes’ that were needed then no longer seem to be necessary (maybe yum has got smarter?), but I keep them in the script as they are good practice anyway.
The build script looks like this:
In this case we’re using buildah the way it was meant to be used: not from a Dockerfile, but as a set of commands executed as a bash script. We create a new container from scratch, which starts out completely empty (containers share the host machine’s kernel, so no kernel goes into the image). We then install bash and coreutils using the yum package manager. But the yum being used is the one on the host machine, and it installs those packages into the container’s filesystem, whose mount point is held in scratchmnt. The coreutils package contains a set of standard Linux tools. Strictly speaking, many of these are not needed, but without them you’d have a hard time debugging your container if you needed to run a shell inside it. If you were obsessive about the number of installed packages, you could install a subset of these.
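A minimal Python sketch of such a script follows. It is our rendering of the usual buildah pattern (`buildah from scratch`, `buildah mount`, `yum --installroot`, `buildah commit`), not the exact script used for this post. The image name `centos-base`, the `tsflags=nodocs` setting and the default `/bin/bash` command are our assumptions. They are not necessarily the ‘fixes’ mentioned above. In dry-run mode it prints each command and uses placeholders for the container name and mount point that buildah would otherwise print.

```python
import shlex
import subprocess


def run(cmd, dry_run):
    """Print a command; unless dry_run, execute it and return its trimmed stdout."""
    print(shlex.join(cmd))
    if dry_run:
        return ""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()


def build_base_image(name="centos-base", packages=("bash", "coreutils"), dry_run=True):
    # 'buildah from scratch' creates an empty working container and prints its name.
    ctr = run(["buildah", "from", "scratch"], dry_run) or "<container>"
    # 'buildah mount' prints the host directory holding the container's root filesystem.
    scratchmnt = run(["buildah", "mount", ctr], dry_run) or "<scratchmnt>"
    # The host's yum installs into that directory; yum itself is not copied into the image.
    yum_root = ["--installroot", scratchmnt, "--releasever", "7"]
    # tsflags=nodocs (our assumption) skips documentation files to save space.
    run(["yum", "-y", "install", *yum_root, "--setopt=tsflags=nodocs", *packages], dry_run)
    # Remove yum's cached metadata and packages from the image filesystem.
    run(["yum", "clean", "all", *yum_root], dry_run)
    run(["buildah", "unmount", ctr], dry_run)
    # Our assumption: make bash the image's default command.
    run(["buildah", "config", "--cmd", "/bin/bash", ctr], dry_run)
    run(["buildah", "commit", ctr, name], dry_run)
    return ctr, scratchmnt


if __name__ == "__main__":
    build_base_image()
```

Passing `packages=("bash", "coreutils", "yum")` produces the yum-included variant discussed in the aside that follows.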
So what is the result?
A final image size of about 57 MB. Pretty impressive, as the equivalent image from Docker Hub was 204 MB and the Debian image was 106 MB.
As an aside, if you also include yum in the installed packages, the image size increases to 120 MB. This supports the idea that much of the extra size of the Centos image from Docker Hub, compared with the Debian image, is due to yum and Python being present.
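The size comparisons in this post are simple ratios of the figures quoted above, and can be checked quickly. The sizes are the ones reported here, not new measurements.

```python
# Image sizes in MB, as quoted in this post.
SIZES_MB = {
    "centos_docker_hub": 204,
    "debian_docker_hub": 106,
    "centos_buildah": 57,
    "centos_buildah_with_yum": 120,
}


def ratio(a, b):
    """Size of image a relative to image b, rounded to two decimal places."""
    return round(SIZES_MB[a] / SIZES_MB[b], 2)


if __name__ == "__main__":
    print(ratio("centos_docker_hub", "debian_docker_hub"))  # 1.92: almost twice
    print(ratio("centos_buildah", "centos_docker_hub"))     # 0.28: just over a quarter
    print(ratio("centos_buildah", "debian_docker_hub"))     # 0.54: roughly half
    # 63 MB added by yum and its dependencies (including Python)
    print(SIZES_MB["centos_buildah_with_yum"] - SIZES_MB["centos_buildah"])
```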
So we now have a nice small buildah image, but can we use it with Docker? Yes, but we need to do a little extra work. The image needs to be copied from /var/lib/containers to where Docker expects it. For this we need the Docker daemon, so we move back outside the Docker container where we were working and do this:
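One way to do this, sketched below, takes three steps:

- Export the image from buildah’s storage as a `docker-archive` tarball.
- Copy it out of the work container with `docker cp`.
- Load it with `docker load`.

The container name `buildah-work`, the image name `centos-base` and the archive path are hypothetical. The commands are only printed unless `dry_run=False`.

```python
import shlex
import subprocess

WORK_CONTAINER = "buildah-work"   # hypothetical container where buildah ran
ARCHIVE = "/tmp/centos-base.tar"  # hypothetical tarball path inside that container


def export_commands(image="centos-base", tag="latest", ctr=WORK_CONTAINER, archive=ARCHIVE):
    """Host-side commands that move an image from buildah's storage into the Docker daemon."""
    local = f"{image}.tar"
    return [
        # Inside the work container: write the image out of /var/lib/containers as a docker-archive tarball.
        ["docker", "exec", ctr, "buildah", "push", image, f"docker-archive:{archive}:{image}:{tag}"],
        # Copy the tarball out of the container onto the host.
        ["docker", "cp", f"{ctr}:{archive}", local],
        # Load it into the Docker daemon's image store.
        ["docker", "load", "-i", local],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(export_commands())
```

If buildah and the Docker daemon run on the same machine, `buildah push centos-base docker-daemon:centos-base:latest` does the same in one step, using buildah’s `docker-daemon` transport.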
So now we have a base Centos image that’s just over a quarter of the size of the one from Docker Hub (57 MB versus 204 MB). Nice! You can find it on Docker Hub here.
In the next post in this series we’ll apply this to our RDKit containers and see if we can make them even smaller than we did in the previous post.