How symlinks pwned Kubernetes (and how we fixed it)

Jan Šafránek Principal Software Engineer, Red Hat
Michelle Au Software Engineer, Google
 

Have you ever wondered how the Kubernetes community deals with security vulnerabilities? Here's how we discovered a vulnerability (CVE-2017-1002101, disclosed publicly earlier this year), patched it, and finally revealed it to the public.

The process began in November 2017. A security practitioner posted to GitHub about the potential misuse of Kubernetes' volume subpath feature and how it could result in open access to files on a host system.

Needless to say, that wasn't the appropriate way to report a vulnerability in Kubernetes. Publishing a vulnerability in a public forum before it is patched gives threat actors a chance to exploit it before a fix is available.

Anyone who discovers a vulnerability in Kubernetes should follow the community's guidelines for reporting security flaws. Doing so gives the Kubernetes security team time to evaluate the impact of the vulnerability, request a CVE, and coordinate the development, release, and disclosure of the fix.

The subpath vulnerability

The problem the researcher discovered, simply put, is that Kubernetes didn't check whether the path specified by subpath led to a file or directory inside the volume. Worse yet, subpath could be set to a symbolic link, or symlink, that points to a file or directory outside the volume, and that link could then be used to access the file system of the entire host.

Using that technique, a pod's security policy can be bypassed, including the "AllowedHostPaths" attribute, which is used to limit a volume's access to directories on a host.

We began to work on a patch for the subpath flaw in secrecy. It was important that a fix be developed quickly and that, while in development, the vulnerability not be exploited in the wild.

How the vulnerability works

Inside a volume, a user can create a subdirectory that is specified as a subpath. On the host machine, the host path might look like this, where data1 is the subdirectory:

/var/lib/kubelet/pods/<uid>/volumes/kubernetes.io~empty-dir/my-volume/data1

However, if a symlink is created instead of a regular directory, Kubernetes can be tricked into resolving the symlink on the host. So if a symlink to "/" were created, the root directory of the host would be made available to the container. That kind of access would be a goldmine for an intruder performing reconnaissance for an attack.
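To make the escape concrete, here is a minimal sketch that reproduces the idea in a throwaway temporary directory rather than a real kubelet tree. The names `resolve_subpath`, `demo_escape`, and `host-secret` are made up for illustration, and the code only mimics how following a symlink on the host leads outside the volume; it is not kubelet code.

```python
import os
import tempfile
from typing import Tuple


def resolve_subpath(volume_dir: str, subpath: str) -> str:
    """Return the host path that subpath refers to once symlinks are followed."""
    return os.path.realpath(os.path.join(volume_dir, subpath))


def demo_escape() -> Tuple[str, str]:
    """Build a fake volume whose 'data1' is a symlink and resolve it."""
    root = os.path.realpath(tempfile.mkdtemp())
    volume = os.path.join(root, "my-volume")
    secret = os.path.join(root, "host-secret")  # stands in for host-only data
    os.makedirs(volume)
    os.makedirs(secret)
    # Instead of a regular subdirectory, the user plants a symlink.
    os.symlink(secret, os.path.join(volume, "data1"))
    return volume, resolve_subpath(volume, "data1")


if __name__ == "__main__":
    volume, resolved = demo_escape()
    print("volume:  ", volume)
    print("resolved:", resolved)  # a path outside the volume directory
```

The resolved path lands in the sibling directory, which is the same effect a symlink to "/" would have on a real host, only on a larger scale.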

Fixing the problem

As we began to craft a fix for the vulnerability, our goal was to make sure that any resolution of subpath would return a result that remained inside the volume. So if, after all symlinks are resolved, this path:

/var/lib/kubelet/pods/<uid>/volumes/kubernetes.io~empty-dir/my-volume/data1

is a subdirectory under:

/var/lib/kubelet/pods/<uid>/volumes/kubernetes.io~empty-dir/my-volume/

then we thought we would have our solution.
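That first idea can be sketched as a simple containment check. The function name `is_inside_volume` is hypothetical, and this is only an illustration of "resolve, then compare against the volume root", not the code that shipped in Kubernetes.

```python
import os


def is_inside_volume(volume_dir: str, subpath: str) -> bool:
    """Resolve all symlinks and report whether the result stays in the volume."""
    volume_real = os.path.realpath(volume_dir)
    target_real = os.path.realpath(os.path.join(volume_dir, subpath))
    # commonpath compares whole path components, so "/vol-evil" is not
    # mistaken for a child of "/vol" the way a plain prefix test would be.
    return os.path.commonpath([volume_real, target_real]) == volume_real
```

A regular subdirectory passes this check, while a symlink that points outside the volume, or a subpath such as "../elsewhere", fails it.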

What we realized, though, was that even after validating that a resolved path ended up inside the volume, a malicious actor could still change the subdirectory back to a symlink, exploiting a race condition between the validation step and the handoff to the container runtime.
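The race is easy to reproduce in miniature. In the sketch below, the hypothetical `tamper` callback plays the attacker and runs in the window between the check and the use; in a real cluster that window is a matter of timing rather than a function call, and all names here are invented for the example.

```python
import os
import tempfile
from typing import Callable, Tuple


def validate_then_use(
    volume: str, subpath: str, tamper: Callable[[str], None]
) -> Tuple[bool, str]:
    """Validate a subpath, let an attacker act, then resolve it again."""
    full = os.path.join(volume, subpath)
    volume_real = os.path.realpath(volume)
    # Step 1: validation sees a regular directory inside the volume.
    passed = os.path.commonpath([volume_real, os.path.realpath(full)]) == volume_real
    # The window: nothing stops the path from changing here.
    tamper(full)
    # Step 2: whoever consumes the path next resolves it afresh.
    return passed, os.path.realpath(full)


def demo_race() -> Tuple[bool, str, str]:
    root = os.path.realpath(tempfile.mkdtemp())
    volume = os.path.join(root, "my-volume")
    outside = os.path.join(root, "outside")
    os.makedirs(os.path.join(volume, "data1"))
    os.makedirs(outside)

    def swap_for_symlink(path: str) -> None:
        os.rmdir(path)             # remove the validated directory
        os.symlink(outside, path)  # put a symlink in its place

    passed, used = validate_then_use(volume, "data1", swap_for_symlink)
    return passed, used, outside
```

Validation reports success, yet the path that is actually used afterward resolves outside the volume, which is why a check on its own is not enough.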

That meant we needed to lock the directory between the validation and container runtime handoff. We found we could do that using a bind-mount approach.

The bind-mount approach

A bind mount makes an existing directory tree visible at a second location. Once it is created, it keeps pointing at the same directory, even if the original path is later replaced by a symlink that points somewhere else.

The logic behind the bind-mount solution was similar to our earlier efforts. In the subpath, all symlinks would be resolved, and the resolved path needed to remain inside the base volume. But before the path was given to the container runtime, it would be bind-mounted to a safe place to prevent user interference before the container runtime processed it. 

The bind-mount operation looks like this:

$ mount --bind \
    /var/lib/kubelet/pods/<uid>/volumes/kubernetes.io~empty-dir/my-volume/data1 \
    /var/lib/kubelet/safe/place

However, while this approach secures the step between bind-mount and the container runtime, it doesn't secure the steps between resolution of the symlinks and validation of the path resolution, or between path validation and the bind mount to a safe place. A user could still exploit these race conditions and change the subpath to a symlink that breaks out of the base volume in a container.

Linux and Windows solutions

Our final solution was more complicated than we originally planned and came in two flavors: one for Linux, the other for Windows.

The Linux version works like this: First, we resolve all the symlinks in the subpath.

Then, starting with the base volume, we open each path segment one by one, using the openat() syscall, and disallow symlinks. In addition, each path segment is validated to be within our base volume.

Next, we bind-mount the opened file descriptor at /proc/<kubelet pid>/fd/<final fd> to a working directory under the kubelet's pod directory. The proc file links to the opened file. (Even if the file gets replaced while kubelet has it open, the link will still point to the original file.)

Finally, we close the file descriptor and pass the bind mount to the container runtime.
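The segment-by-segment walk can be sketched with Python's `os.open`, which uses openat() when given a `dir_fd`. This is an illustration under several assumptions, not the kubelet's implementation: the function names are hypothetical, the subpath is assumed to be already resolved and to consist only of directories, and containment is enforced simply by starting at the volume root and rejecting both symlinks and "..". The bind mount itself is left out because it requires root.

```python
import os


def open_subpath_nofollow(volume_dir: str, subpath: str) -> int:
    """Open subpath under volume_dir one segment at a time, refusing symlinks.

    Returns an open file descriptor for the last segment; the caller closes it.
    """
    flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
    current_fd = os.open(volume_dir, os.O_RDONLY | os.O_DIRECTORY)
    try:
        for segment in subpath.split("/"):
            if segment in ("", "."):
                continue
            if segment == "..":
                raise ValueError("'..' is not allowed in a subpath")
            # dir_fd makes this an openat() relative to the directory we
            # already hold open, so a swapped parent path cannot redirect us.
            next_fd = os.open(segment, flags, dir_fd=current_fd)
            os.close(current_fd)
            current_fd = next_fd
    except BaseException:
        os.close(current_fd)
        raise
    return current_fd


def proc_fd_path(fd: int) -> str:
    """Linux-only: the /proc entry that would serve as the bind-mount source."""
    return "/proc/%d/fd/%d" % (os.getpid(), fd)
```

A caller would open the subpath, bind-mount `proc_fd_path(fd)` to its working directory, and then close the descriptor. If any segment is a symlink, `os.open` raises `OSError` and nothing is mounted.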

The Windows solution also begins by resolving all symlinks in the subpath.

Each path segment is opened, too, but with a file lock. Symlinks are disallowed, and each segment is checked to make sure its current path is within the base volume.

The resolved subpath is then passed to the container runtime. After the container starts, the files are unlocked and closed.

Meeting our goals

Although our final solution was more complicated than we originally envisioned, it met our goals for the fix:

  • It resolves subpaths and makes sure they point to paths inside a volume.
  • It protects the subpath host path from tampering while Kubernetes and the container runtime are processing it.
  • It's generic enough to support all volume types supported in Kubernetes.

Once the patch was completed, it was given to third-party Kubernetes vendors to privately test under embargo before it was publicly disclosed. That served two purposes: It added more reviewers for the patch, and it got the patch in the pipeline so these vendors could roll it out to customers immediately, once it was announced in March 2018.

While this patch addresses this particular path vulnerability, it's always a good idea to be extra cautious when handling untrusted paths in your applications, as well as to set restrictive policies in your Kubernetes clusters and have multiple layers of security.

To learn more about how we discovered, patched, and rolled out a fix to this Kubernetes vulnerability, come to our session at KubeCon + CloudNativeCon, December 10-13 in Seattle, Washington. We will be speaking on December 11 at 10:50 am.
