Securing containers using Docker isolation

In the previous article, we gave an overview of common container security misconfigurations, such as mounting docker.sock, running as the root user, and mounting sensitive files and directories from the host. This article gives an overview of a Linux feature called namespaces and how it isolates containers from the host and from one another.

Introduction to namespaces

One of the primary concerns when using containers is isolation, both between containers and the host and among the containers themselves. Imagine that we spin up two containers with different sets of features; there is no need for the processes in one container to know what is running in the other. Similarly, consider a scenario where 3 Apache web servers run in 3 different containers. All three containers need to start Apache on port 80, and the host machine should also be able to use port 80 for another service.

These concerns are addressed using a Linux kernel feature called namespaces. Namespaces partition kernel resources so that one set of processes sees one set of resources while another set of processes sees a different set. Docker uses namespaces to isolate containers from the host and from each other.
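On a Linux system, the namespaces a process belongs to are exposed as symbolic links under /proc/<pid>/ns. The following is a minimal sketch that prints them for one process; the function name list_ns is made up for illustration, and reading the links of another user's process generally requires root.

```shell
# list_ns [PID]: print "<type> <namespace id>" for every namespace of a process.
# PID may be a number or the literal "self"; it defaults to the calling shell.
list_ns() {
  pid="${1:-$$}"
  for link in /proc/"$pid"/ns/*; do
    [ -L "$link" ] || continue   # nothing to show if /proc is unavailable
    printf '%s %s\n' "$(basename "$link")" "$(readlink "$link")"
  done
}

# One line per namespace type, for example "pid pid:[...]".
list_ns
```

Two processes that print the same identifier for a given type share that namespace; different identifiers mean they see different sets of that resource.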

The lsns command lists the namespaces visible to the current user. On a Docker host, its output looks as follows.

$ lsns

NS TYPE NPROCS PID USER COMMAND

4026531835 cgroup 73 1516 docker /lib/systemd/systemd --user

4026531836 pid 73 1516 docker /lib/systemd/systemd --user

4026531837 user 73 1516 docker /lib/systemd/systemd --user

4026531838 uts 73 1516 docker /lib/systemd/systemd --user

4026531839 ipc 73 1516 docker /lib/systemd/systemd --user

4026531840 mnt 73 1516 docker /lib/systemd/systemd --user

4026531992 net 73 1516 docker /lib/systemd/systemd --user

As the preceding excerpt shows, lsns reports seven namespace types on this host. Leaving aside the cgroup namespace, this article covers the following six, which Docker relies on:

PID namespace for process isolation.
USER namespace for the user privilege isolation.
UTS namespace for isolating the hostname and domain name.
IPC namespace for managing access to IPC resources.
MNT namespace for managing filesystem mount points.
NET namespace for managing network interfaces.
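The namespace types in the lsns excerpt can also be extracted mechanically. The sketch below uses a made-up helper named ns_types and feeds it the rows shown earlier; it reports seven types, because lsns also lists cgroup, which the list above leaves out.

```shell
# ns_types: read lsns-style output on stdin and print each distinct TYPE once.
ns_types() {
  # Data rows start with a numeric namespace ID; the header row does not.
  awk '$1 ~ /^[0-9]+$/ { print $2 }' | sort -u
}

# Rows copied from the lsns excerpt above.
sample='NS TYPE NPROCS PID USER COMMAND
4026531835 cgroup 73 1516 docker /lib/systemd/systemd --user
4026531836 pid 73 1516 docker /lib/systemd/systemd --user
4026531837 user 73 1516 docker /lib/systemd/systemd --user
4026531838 uts 73 1516 docker /lib/systemd/systemd --user
4026531839 ipc 73 1516 docker /lib/systemd/systemd --user
4026531840 mnt 73 1516 docker /lib/systemd/systemd --user
4026531992 net 73 1516 docker /lib/systemd/systemd --user'

printf '%s\n' "$sample" | ns_types
```

On a live host the same helper can be used as lsns | ns_types.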

Container security using Docker isolation

In this section, let us discuss how each of these namespaces isolates containers from the host. Note that the USER namespace is not enabled by default in Docker, whereas the remaining namespaces are.

PID namespace

The PID namespace provides process isolation. When a container is created, the container process cannot see what processes are running on the host by default. Let us start a new container and check the list of processes running.

$ docker run -it --name container1 alpine sh

/ # ps aux

PID USER TIME COMMAND

1 root 0:00 sh

6 root 0:00 ps aux

/ #

As we can notice in the preceding excerpt, the container only sees the list of processes running in the container, but not on the host.

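Now, let us start a long-running process in container1. Container names must be unique, so if the container from the previous step still exists, either reuse its shell or remove it first with docker rm -f container1. This can be simulated as follows.

$ docker run -it --name container1 alpine sh

/ # sleep 1000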

Any other container started on the host will not be able to see the list of processes running on other containers on the same host. Let us spin up another container and check if the sleep process of container1 is visible from there.

$ docker run -it --name container2 alpine sh

/ # ps aux

PID USER TIME COMMAND

1 root 0:00 sh

6 root 0:00 ps aux

/ #

As we can notice in the preceding excerpt, we cannot see any processes running outside this container. This is achieved using PID namespace.

Now, let us run a third container named container3 with the same PID namespace as that of container1. This can be done as follows.

$ docker run -it --pid=container:container1 --name container3 alpine sh

/ # ps aux

PID USER TIME COMMAND

1 root 0:00 sh

6 root 0:00 sh

12 root 0:00 sleep 1000

13 root 0:00 ps aux

/ #

Notice that we are able to view the processes running on container1 this time as container1 and container3 are in the same PID namespace.
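The same observation can be made from the host by comparing namespace identifiers under /proc. The following is a rough sketch, not part of the original walkthrough: same_ns is a hypothetical helper, and the commented usage assumes the containers from this section are still running and that the commands are run as root on the Docker host.

```shell
# same_ns TYPE PID_A PID_B: succeed if both processes are in the same
# namespace of the given TYPE (pid, net, uts, ...). Returns 2 if either
# process cannot be inspected.
same_ns() {
  a=$(readlink "/proc/$2/ns/$1") || return 2
  b=$(readlink "/proc/$3/ns/$1") || return 2
  [ "$a" = "$b" ]
}

# Hypothetical usage on the Docker host (run as root):
#   p1=$(docker inspect -f '{{.State.Pid}}' container1)
#   p3=$(docker inspect -f '{{.State.Pid}}' container3)
#   same_ns pid "$p1" "$p3" && echo "container1 and container3 share a PID namespace"

# Sanity check that needs no containers: a process shares a namespace with itself.
same_ns pid $$ $$ && echo "same PID namespace"
```

Run against container1 and container2 instead, the comparison would be expected to fail, since each of those containers has its own PID namespace.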

USER namespace

Assume that we have built an application that is running inside a Docker container and the application is running with root privileges on the container. When starting this container, let us also assume that we have mounted the /bin directory of the host machine onto the container. In this scenario, if an attacker compromised this application and gained root access on the container, can this attacker modify files on the host’s /bin directory from within the container?

As mentioned earlier, USER namespaces are not enabled by default, so the attacker will be able to modify files owned by the root user on the host. This is because the root user inside the container has the same UID (0) as the root user on the host unless USER namespaces are enabled. So, if any directory is mounted from the host machine into the container, the root user in the container has complete access to the mounted directory.

Enabling USER namespaces for docker

If we enable user namespaces for the Docker daemon, root inside a container runs in a context that is separate from the host's. This ensures that root in the container is not equivalent to root on the host. User namespaces have been available since version 1.10 of the Docker engine on Linux. They allow the Docker daemon to create an isolated namespace that looks and feels like a root namespace. However, the root user inside this namespace is mapped to a non-privileged UID on the Docker host.

This means that containers can effectively have root privilege inside of the user namespace but not on the Docker host. The following section shows how USER namespaces can be enabled.

Let us stop the Docker engine using the following command.

~$ sudo systemctl stop docker

The Docker engine has been stopped. Now let us start Docker daemon by using the following command.

~$ sudo dockerd --userns-remap=default &

This starts the Docker daemon in the background with the default user namespace mapping: the dockremap user and group are created and mapped to non-privileged UID and GID ranges in the /etc/subuid and /etc/subgid files.
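Starting dockerd by hand works for a demonstration. Docker's documentation also describes a persistent alternative: setting the userns-remap key in /etc/docker/daemon.json. The sketch below only prints such a fragment; the helper name userns_daemon_config is made up, and the commented steps assume there is no existing daemon.json whose settings would need to be merged.

```shell
# userns_daemon_config [USER]: print a daemon.json fragment that enables
# user namespace remapping. USER defaults to "default", which tells Docker
# to create and use the dockremap user.
userns_daemon_config() {
  remap_user="${1:-default}"
  printf '{\n  "userns-remap": "%s"\n}\n' "$remap_user"
}

userns_daemon_config

# To apply it (assumption: no existing /etc/docker/daemon.json to preserve):
#   userns_daemon_config | sudo tee /etc/docker/daemon.json
#   sudo systemctl restart docker
```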

Following are the contents of the /etc/subuid file.

docker@docker:~$ cat /etc/subuid

docker:100000:65536

dockremap:165536:65536

Following are the contents of the /etc/subgid file.

docker@docker:~$ cat /etc/subgid

docker:100000:65536

dockremap:165536:65536

Since we have enabled user namespaces for the Docker daemon, when a root owned file from the host is mounted onto the container, the root user in the container will not have permission to modify this file owned by root on the host. This is because the mounted file exists in the local file system of the Docker host and the container doesn't have root access outside of the namespace that it exists in. Though the container is running under the root user security context, this is only a root user within the scope of the namespace that the container is running in.

Now, let us revisit the file /etc/subuid.

docker@docker:~$ cat /etc/subuid

docker:100000:65536

dockremap:165536:65536

The dockremap entry in the /etc/subuid file is the one used when Docker runs with user namespaces. Here dockremap is the name of the system user and 165536 is the first host UID in the mapping; it maps to UID 0 in the container. 65536 is the number of UIDs in the range, so container UIDs 0 through 65535 map to host UIDs 165536 through 231071, and 231071 is the highest UID mapped to the dockremap user. Essentially, dockremap is the user whose subordinate ID range containers use when we specify --userns-remap=default when starting the Docker daemon.
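The mapping is plain arithmetic: a container UID is added to the start of the subordinate range, provided it falls inside the range. A small sketch, with host_uid as a made-up helper name and the numbers taken from the dockremap entry:

```shell
# host_uid START COUNT CONTAINER_UID: print the host UID that a container UID
# maps to, given the start and size of a subuid range. Fails if the container
# UID lies outside the range.
host_uid() {
  start=$1
  count=$2
  cuid=$3
  if [ "$cuid" -lt 0 ] || [ "$cuid" -ge "$count" ]; then
    echo "container UID $cuid is outside the mapped range" >&2
    return 1
  fi
  echo $((start + cuid))
}

host_uid 165536 65536 0       # container root  -> 165536
host_uid 165536 65536 65535   # last mapped UID -> 231071
```

The last mapped host UID is therefore start + count - 1, and a container UID of 65536 or above has no mapping at all.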

There are reasons why the USER namespace is not enabled by default. When the Docker daemon runs with user namespaces enabled, sharing the PID or NET namespace with the host (--pid=host or --network=host) is not possible. In addition, the --privileged flag on docker run cannot be used without also specifying --userns=host.

While the root user inside a user-namespaced container process has many of the expected privileges of the superuser within the container, the Linux kernel imposes restrictions based on internal knowledge that this is a user-namespaced process. One notable restriction is the inability to use the mknod command. Permission is denied for device creation within the container when run by the root user.

UTS namespace

UTS stands for UNIX Time-Sharing System. The UTS namespace isolates two system identifiers: the hostname and the domain name. It allows each container to have its own hostname. By default, a container's hostname is the first 12 characters of its container ID, as shown below.

$ docker run -itd --name container5 alpine sh

017b746269f2f73d4b4c9bd042766f206cccc36d032daf75d171a2b9d3c7f997

$ docker exec -it 017b sh

/ # hostname

017b746269f2

/ #
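The relationship between the container ID and the default hostname is easy to reproduce. A tiny sketch, where default_hostname is a made-up helper and the ID is the one from the excerpt above:

```shell
# default_hostname ID: print the first 12 characters of a container ID,
# which is the hostname Docker assigns when none is given.
default_hostname() {
  printf '%.12s\n' "$1"
}

default_hostname 017b746269f2f73d4b4c9bd042766f206cccc36d032daf75d171a2b9d3c7f997
# prints 017b746269f2
```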

It is also possible to assign a custom hostname to the container using the --hostname flag as follows.

$ docker run -it --hostname=customhost --name container4 alpine sh

/ # hostname

customhost

/ #

IPC namespace

The IPC (POSIX/SysV IPC) namespace gives a process its own interprocess communication resources by separating named shared memory segments, semaphores and message queues.

MNT namespace

MNT namespace allows containers to have their own set of mounted file systems and root directories. Processes running in one MNT namespace cannot see the mounted file system of another MNT namespace.

NET namespace

Let us return to the scenario discussed at the beginning of the article: 3 Apache web servers running in 3 different containers. All three containers need to start Apache on port 80, and the host machine should also be able to use port 80 for another service. Network namespaces (NET namespace) give the processes inside each namespace instance their own network interfaces and IP address, along with the full range of ports.
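As an illustration of this scenario, the sketch below prints (without running) one docker run command per web server. Each container listens on port 80 inside its own NET namespace, while the -p flag publishes it on a distinct host port so that the host's own port 80 stays free. The helper name web_commands, the container names web1 to web3, the host ports and the httpd image are all assumptions chosen for the example.

```shell
# web_commands N [BASE_PORT]: print one docker run command per web server.
# Container i publishes its port 80 on host port BASE_PORT + i.
web_commands() {
  n=$1
  base=${2:-8080}
  i=1
  while [ "$i" -le "$n" ]; do
    echo "docker run -d --name web$i -p $((base + i)):80 httpd"
    i=$((i + 1))
  done
}

web_commands 3
# docker run -d --name web1 -p 8081:80 httpd
# docker run -d --name web2 -p 8082:80 httpd
# docker run -d --name web3 -p 8083:80 httpd
```

Piping the output to sh on a Docker host would start the three containers; printing first makes it easy to review the commands before anything runs.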

Conclusion

This article has provided an overview of how Docker leverages Linux namespaces to isolate container processes. The PID, UTS, IPC, MNT, and NET namespaces are enabled by default, whereas the USER namespace is not. To achieve user privilege separation, we need to enable the USER namespace for the Docker daemon.

Posted: March 15, 2021

Srinivas

Srinivas is an Information Security professional with 4 years of industry experience in Web, Mobile and Infrastructure Penetration Testing. He is currently a security researcher at Infosec Institute Inc. He holds the Offensive Security Certified Professional (OSCP) certification. He blogs at www.androidpentesting.com. Email: srini0x00@gmail.com