Writing x64 Shellcode: Differences, Shadow Space, and Register Conventions

Objective: Understand the architectural and ABI-level differences between x86 and x64 Windows shellcode, including the Microsoft x64 calling convention, shadow space, stack alignment, position-independent API resolution via PEB walking, and the detection surface each technique exposes.


1. From x86 to x64: What Actually Changed

Moving shellcode from x86 to x64 Windows is not a syntactic exercise of renaming EAX to RAX. The ABI changed, the segment register that anchors the TEB changed, and the addressing model changed. A snippet that “looks right” can execute cleanly, corrupt the host process, and crash three calls later inside an SSE instruction — none of which gives the author an obvious clue.

Itemx86x64
General-purpose registers8 × 32-bit (EAXEDI)16 × 64-bit (RAXR15)
Windows calling conventionstdcall / cdecl — all args on stackUnified fast-call — first 4 integer args in registers
TEB segment registerFS; PEB at fs:[0x30]GS; PEB at gs:[0x60]
Address width32-bit64-bit (48-bit canonical VA in practice)
call pushes4-byte return address8-byte return address
RIP-relative addressingNot availableAvailable; lea rax, [rip + offset] is idiomatic in PIC

Two consequences dominate the rest of this tutorial. First, x64 adopts a single __fastcall-style ABI with a mandatory shadow space and 16-byte stack alignment rule. Second, the TEB is reached via GS, not FS, and every PEB offset must be updated for the 64-bit struct layout.


2. The Microsoft x64 ABI Deep-Dive

The Microsoft x64 calling convention passes the first four integer arguments in registers and floating-point arguments in the low halves of the first four XMM registers. Anything beyond that goes on the stack, above the shadow space, pushed right-to-left.

Argument #Integer RegisterFloating-Point Register
1stRCXXMM0L
2ndRDXXMM1L
3rdR8XMM2L
4thR9XMM3L
5th+Stack (above shadow space)Stack

The return value lives in RAX for integers and pointers, and in XMM0 for floating-point results.

Volatile vs Non-Volatile Registers

ClassRegisters
VolatileRAX, RCX, RDX, R8, R9, R10, R11, XMM0XMM5
Non-volatileRBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15, XMM6XMM15

A callee may freely destroy volatile registers; non-volatile registers must be preserved across calls. Shellcode that clobbers RBX or RDI in the host thread and then returns control corrupts the host. This is the single most common reason “working” shellcode crashes the host process several instructions after the shellcode finishes.

Side-by-Side: x86 Push vs x64 Register Load

; --- x86 stdcall: MessageBoxA(0, "msg", "title", 0) ---
push 0              ; uType
push title          ; lpCaption
push msg            ; lpText
push 0              ; hWnd
call [MessageBoxA]  ; callee cleans the stack

; --- x64 fastcall: same call ---
xor  rcx, rcx                       ; hWnd      = NULL
lea  rdx, [rel msg]                 ; lpText
lea  r8,  [rel title]               ; lpCaption
xor  r9d, r9d                       ; uType     = 0
sub  rsp, 0x28                      ; shadow space + alignment (see §4)
call [rel MessageBoxA]
add  rsp, 0x28

Note xor r9d, r9d rather than xor r9, r9 — writing to the 32-bit sub-register zero-extends to the full 64-bit register and produces a shorter, null-byte-free opcode.


Diagram showing the Microsoft x64 calling convention: arguments flow through RCX, RDX, R8, R9, then onto the stack, with the return value in RAX.
The Microsoft x64 ABI passes the first four integer arguments in registers; additional arguments land on the stack above shadow space.

3. Shadow Space: Why, What, and Where

In the Microsoft x64 convention the caller must reserve 32 bytes (4 × 8) of stack immediately above the return address as shadow space (also called home space or spill space). This area exists so the callee has somewhere to spill RCX, RDX, R8, and R9 back to memory if it needs to take their addresses or free up the registers for re-use.

Critical points:

  • Shadow space is always reserved, even when the callee takes fewer than four arguments and even when the callee never spills.
  • It is owned by the caller. The callee may overwrite it without saving the previous contents.
  • The caller does not zero or initialise it. The callee is responsible for whatever it writes there.
  • Stack arguments beyond the fourth begin at [RSP + 0x28] (32 bytes shadow + 8 bytes return address).
Layout immediately after call, before callee prologueOffset from RSP
Return address (pushed by call)[RSP + 0x00]
Shadow slot for RCX[RSP + 0x08]
Shadow slot for RDX[RSP + 0x10]
Shadow slot for R8[RSP + 0x18]
Shadow slot for R9[RSP + 0x20]
5th argument (if any)[RSP + 0x28]

Skip the shadow allocation and the first thing the callee does — often a mov [rsp+8], rcx early in a Win32 prologue — clobbers your own stack frame or, worse, the saved return address you just pushed.


Stack layout diagram showing the mandatory 32-byte shadow space between the return address and stack arguments in the Microsoft x64 calling convention.
The caller must always reserve 32 bytes of shadow space directly above the return address, with additional stack arguments starting at RSP+0x28.

4. Stack Alignment in Practice

The Microsoft x64 ABI requires RSP to be 16-byte aligned at the moment of a call, except inside a prolog. The hardware call then pushes an 8-byte return address, so on entry to the callee RSP is 16N + 8 aligned. Win32 internals (memcpy, CRT, anything that uses SSE/AVX with aligned moves) will issue movaps / movdqa against stack locations and will raise EXCEPTION_ACCESS_VIOLATION (0xC0000005) if RSP is wrong by 8.

This is why the canonical shellcode prologue is sub rsp, 0x28, not 0x20:

  • 0x20 (32 bytes) for shadow space.
  • + 0x08 to undo the misalignment the preceding call introduced.
; Canonical shellcode call wrapper
sub rsp, 0x28          ; 32B shadow + 8B realign
call rax               ; rax = resolved API address
add rsp, 0x28

When the shellcode entry itself was reached by a jump from unknown context, force alignment explicitly:

; Defensive entry: align RSP regardless of caller state
and rsp, 0xFFFFFFFFFFFFFFF0   ; force 16-byte alignment
sub rsp, 0x28                  ; shadow + 8 to keep call-time alignment

To diagnose alignment faults in WinDbg, dump the faulting instruction (u .) and check whether it is a movaps / movdqa referencing [rsp+…]. If rsp & 0xF == 0x8 at the call, you forgot the + 0x08.


5. Position-Independent Code Fundamentals

Shellcode does not know where it will land. Hard-coded addresses are forbidden — ASLR randomises module bases per boot, and the shellcode itself is dropped at an allocator-chosen address. Two x64 idioms enable position independence:

  • RIP-relative addressing. lea rax, [rel label] resolves to lea rax, [rip + disp32] and produces correct results regardless of load address. This is the preferred way to reference embedded data in x64 shellcode.
  • call/pop delta trick. A call to the next instruction pushes its return address — the runtime location of the following label. The callee pops it into a register to obtain a base for subsequent offsets.
; Obtain the runtime address of `data` without RIP-relative encoding
    call get_rip
get_rip:
    pop rbx                  ; rbx = address of next instruction
    lea rsi, [rbx + data - get_rip]
    jmp continue
data:
    db "kernel32.dll", 0
continue:

In practice, prefer lea reg, [rel label] for clarity; reach for call/pop only when an encoder demands it (for example, to avoid certain bad bytes).


6. PEB Walking: Finding kernel32.dll Without Imports

Because shellcode has no import table, it must walk the loader’s in-memory bookkeeping to find kernel32.dll and then resolve GetProcAddress / LoadLibraryA from its exports. On x64 Windows the chain starts at GS and uses these offsets:

StepSourceFieldOffset (x64)
1GS segmentTEB
2TEBProcessEnvironmentBlock+0x060
3PEBLdrPEB_LDR_DATA+0x018
4PEB_LDR_DATAInMemoryOrderModuleList+0x020
5LDR_DATA_TABLE_ENTRY linkInMemoryOrderLinks.Flink+0x000
6LDR_DATA_TABLE_ENTRYDllBase (from InMemoryOrderLinks)+0x030

The InMemoryOrderModuleList on a normal process begins with the executable, then ntdll.dll, then kernel32.dll. Walking two Flinks from the head reaches the kernel32.dll entry. Production-grade shellcode hashes the BaseDllName string rather than trusting that order, both for resilience and because EDRs deliberately permute the head of the list as a tripwire (see §10).

; --- PEB walk skeleton: locate kernel32.dll base in rax ---
    xor   eax, eax
    mov   rbx, [gs:0x60]        ; TEB -> PEB
    mov   rbx, [rbx + 0x18]     ; PEB -> Ldr (PEB_LDR_DATA)
    mov   rbx, [rbx + 0x20]     ; -> InMemoryOrderModuleList.Flink
                                ;    (points into 1st LDR_DATA_TABLE_ENTRY's InMemoryOrderLinks)
    mov   rbx, [rbx]            ; advance: -> 2nd entry (ntdll)
    mov   rbx, [rbx]            ; advance: -> 3rd entry (kernel32)
    mov   rax, [rbx + 0x30]     ; DllBase relative to InMemoryOrderLinks (x64)
                                ; rax now holds kernel32.dll base address

To verify the offsets against the target OS build, drop into WinDbg on a live process and dump the structures directly:

0:000> dt nt!_TEB ProcessEnvironmentBlock
0:000> dt nt!_PEB Ldr
0:000> dt nt!_PEB_LDR_DATA InMemoryOrderModuleList
0:000> dt nt!_LDR_DATA_TABLE_ENTRY DllBase BaseDllName
0:000> !lmi kernel32

Flow diagram tracing the PEB walk from GS register through PEB_LDR_DATA and InMemoryOrderModuleList to locate kernel32.dll base address.
Shellcode reaches kernel32.dll by following two Flink pointers from the InMemoryOrderModuleList head anchored at GS:[0x60].

7. Parsing the Export Address Table

With kernel32.dll‘s base in hand, the shellcode walks the PE headers to the Export Directory and then iterates AddressOfNames, comparing each name against a precomputed hash. String literals like "GetProcAddress" are avoided to defeat trivial signatures and to remove embedded nulls.

Key offsets from a loaded module base:

FieldOffset
e_lfanew (RVA of PE header)DllBase + 0x3C
Optional HeaderPE_header + 0x18
Export Directory RVA (PE32+)OptHeader + 0x70
AddressOfFunctionsExportDir + 0x1C
AddressOfNamesExportDir + 0x20
AddressOfNameOrdinalsExportDir + 0x24
; --- EAT walk outline: resolve an export by ROR-13 name hash ---
; in : rax = module base, ebp = target hash (e.g. for "GetProcAddress")
; out: rax = exported function address (or 0)

    mov   ecx, [rax + 0x3C]      ; e_lfanew
    add   rcx, rax               ; rcx = PE header
    mov   edx, [rcx + 0x88]      ; Export Directory RVA (OptHdr + 0x70)
    add   rdx, rax               ; rdx = IMAGE_EXPORT_DIRECTORY
    mov   r8d,  [rdx + 0x18]     ; NumberOfNames
    mov   r9d,  [rdx + 0x20]     ; AddressOfNames RVA
    add   r9, rax
    xor   r10, r10               ; index

.next_name:
    mov   esi, [r9 + r10*4]      ; name RVA
    add   rsi, rax               ; rsi -> ASCII export name
    xor   edi, edi               ; hash accumulator

.hash_byte:
    movzx eax, byte [rsi]
    test  al, al
    jz    .check
    ror   edi, 13
    add   edi, eax
    inc   rsi
    jmp   .hash_byte

.check:
    cmp   edi, ebp               ; compare ROR-13 hash
    je    .found
    inc   r10
    cmp   r10d, r8d
    jb    .next_name
    xor   rax, rax               ; not found
    ret
.found:
    ; resolve via AddressOfNameOrdinals + AddressOfFunctions
    ; (omitted for brevity)
    ret

The ROR-13 rotate-and-add hash, popularised by the Metasploit block_api stub, is the de facto standard precisely because defenders now key on it (see §10).


8. Null-Byte and Bad-Character Avoidance

Shellcode delivered through a string-copy primitive (strcpy, lstrcatA, format-string echo) is truncated at the first null byte. x64 immediates routinely embed nulls because most useful constants and addresses do not occupy all 64 bits.

ProblemFix
mov rax, 0x000000007FFE1234 → nullsxor eax, eax then mov eax, 0x7FFE1234 (zero-extends)
64-bit literal in mov r9, imm64lea r9, [rel label] or build via shifts/ORs
push 0 → encodes 6A 00xor rcx, rcx ; push rcx
mov rcx, 0 → 7-byte null runxor ecx, ecx
; --- Null-byte comparison ---
; BAD: mov rax, 0x76ab1234
;   48 B8 34 12 AB 76 00 00 00 00   <-- four null bytes
mov rax, 0x76ab1234

; GOOD: zero-extend via 32-bit sub-register
;   31 C0                            <-- xor eax, eax
;   B8 34 12 AB 76                   <-- mov eax, 0x76AB1234
xor eax, eax
mov eax, 0x76ab1234

Writing to EAX implicitly zeroes the upper 32 bits of RAX — this single architectural quirk eliminates most accidental nulls in shellcode constants.

A short Python lab to validate a candidate snippet:

from keystone import Ks, KS_ARCH_X86, KS_MODE_64

asm = b"""
    xor eax, eax
    mov eax, 0x76ab1234
    mov rbx, qword ptr gs:[0x60]
    mov rbx, qword ptr [rbx + 0x18]
"""
ks = Ks(KS_ARCH_X86, KS_MODE_64)
code, _ = ks.asm(asm)
buf = bytes(code)
print(buf.hex())
bad = [i for i, b in enumerate(buf) if b == 0x00]
print(f"length={len(buf)} bad_byte_offsets={bad}")

Run it, see exactly where nulls (or any other bad character) land, and rewrite the offending instruction.


9. Shellcode Skeleton: Putting It Together

The pieces combine into a recognisable x64 stub: align the stack, walk the PEB to find kernel32.dll, parse the EAT to resolve GetProcAddress and LoadLibraryA, and then call out through the standard ABI with proper shadow space.

[BITS 64]
_start:
    ; --- entry: defensively align stack ---
    and   rsp, 0xFFFFFFFFFFFFFFF0
    sub   rsp, 0x28                ; shadow space + alignment

    ; --- locate kernel32.dll via PEB ---
    mov   rbx, [gs:0x60]           ; TEB -> PEB
    mov   rbx, [rbx + 0x18]        ; PEB -> Ldr
    mov   rbx, [rbx + 0x20]        ; InMemoryOrderModuleList.Flink
    mov   rbx, [rbx]               ; -> ntdll entry
    mov   rbx, [rbx]               ; -> kernel32 entry
    mov   r15, [rbx + 0x30]        ; r15 = kernel32 base

    ; --- resolve GetProcAddress via ROR-13 hash (call into eat_lookup) ---
    mov   rcx, r15
    mov   edx, 0x7C0DFCAA          ; ROR-13("GetProcAddress")  (illustrative)
    call  eat_lookup               ; rax = &GetProcAddress
    mov   r14, rax

    ; --- call LoadLibraryA("user32.dll") via GetProcAddress ---
    mov   rcx, r15                 ; hModule = kernel32
    lea   rdx, [rel s_LoadLibraryA]
    call  r14                      ; rax = &LoadLibraryA
    lea   rcx, [rel s_user32]
    call  rax                      ; rax = HMODULE user32

    ; --- ... continue resolution and API calls ...

    add   rsp, 0x28
    ret

s_LoadLibraryA: db "LoadLibraryA", 0
s_user32:       db "user32.dll", 0

; eat_lookup: in rcx=module base, edx=ROR13 hash -> rax = export addr
eat_lookup:
    ; (see §7 for the inner loop)
    ret

Every block in the skeleton corresponds to one of the rules established above: sub rsp, 0x28 for shadow + alignment, gs:[0x60] for the PEB, [rbx + 0x30] for DllBase, lea + RIP-relative strings for PIC, and r14 / r15 carrying non-volatile state across calls without manual save/restore.


10. Common Attacker Techniques

TechniqueDescription
PEB-walk API resolutionLocate kernel32.dll via gs:[0x60] chain, parse exports by hash
ROR-13 export hashingAvoid embedded API name strings; survive static signature scans
RIP-relative PIClea reg, [rel label] to address embedded data without fixups
Sub-register zero-extensionmov eax, imm32 to write RAX with no null bytes
Shadow-space-aware call wrappingsub rsp, 0x28 around every Win32 call from an unknown caller
Direct Win32 → Native API substitutionCall Nt* syscalls to bypass usermode hooks (T1106)
Reflective loading of a PE in memoryShellcode bootstraps a full PE image without touching disk (T1620)

11. Defensive Strategies & Detection

Shellcode is observable at multiple layers. The most reliable signals come from the behaviours the techniques above require, not from the byte patterns they happen to produce.

Sysmon events to enable and triage:

  • EventID 1 — Process Create. Unusual parent/child chains (browser, Office, mail client spawning cmd.exe / powershell.exe) are the cheapest, highest-yield signal.
  • EventID 8CreateRemoteThread. Cross-process thread creation into LSASS, browsers, or signed Windows binaries is high-fidelity.
  • EventID 10ProcessAccess. Watch GrantedAccess masks like 0x1FFFFF (full access) and 0x1010 (read + VM-write).
  • EventID 17 / 18 — Pipe creation/connection, frequently used by shellcode-launched implants for C2.

ETW providers worth subscribing to in EDR pipelines:

  • Microsoft-Windows-Kernel-Process — kernel-side process/thread/image events.
  • Microsoft-Windows-Threat-Intelligence (PPL-only) — NtAllocateVirtualMemory, NtProtectVirtualMemory, NtWriteVirtualMemory, NtCreateThreadEx at the syscall layer, bypassed by no usermode hook.
  • Microsoft-Windows-Security-Auditing — handle and object access.

Audit policies: Audit Process Creation (Success) and Audit Kernel Object surface the same events to the classic Security log for SIEM ingestion.

Behavioural signals defenders should hunt on:

  • Threads with StartAddress in MEM_PRIVATE regions that are PAGE_EXECUTE_* and not backed by a file image.
  • CallTrace containing UNKNOWN frames — the calling instruction lives in unbacked memory.
  • gs:[0x60] opcode pattern (65 48 8B 04 25 60 00 00 00) inside executable regions of non-system modules.
  • ROR-13 hashing loops in memory scans.

Sigma sketch — suspicious cross-process access typical of shellcode injection:

title: Suspicious Cross-Process Access With VM-Write Rights
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    GrantedAccess:
      - '0x1FFFFF'
      - '0x1410'
      - '0x1010'
  filter_legit:
    SourceImage|endswith:
      - '\MsMpEng.exe'
      - '\WmiPrvSE.exe'
  condition: selection and not filter_legit
level: high

Hardening to deploy on monitored endpoints:

  • Arbitrary Code Guard (ACG) — denies the PAGE_EXECUTE_* transition that turns a MEM_PRIVATE shellcode buffer into runnable code.
  • Control Flow Guard (CFG) — invalidates indirect calls into unregistered targets, which shellcode entry points always are.
  • Block Win32 API calls from Office macros / child processes — Attack Surface Reduction rule that severs the most common shellcode delivery vector.
  • PPL-protected EDR with kernel ETW Ti subscription — preserves syscall-layer telemetry even when userland hooks are patched out.

A useful EDR tripwire is to permute the head of InMemoryOrderModuleList with stub entries: shellcode that walks two Flinks blindly resolves the decoy module, fails to find expected exports, and crashes — producing a high-fidelity detection.


12. Tools for x64 Shellcode Analysis

ToolDescriptionLink
NASMAssembler for the snippets in this tutorial; emits raw binary for direct hex inspectionnasm.us
Keystone EngineProgrammatic assembler (Python bindings) for bad-character analysis labskeystone-engine.org
x64dbgUser-mode debugger; trace shellcode through gs:[0x60] and EAT walksx64dbg.com
WinDbgInspect _TEB, _PEB, _PEB_LDR_DATA, _LDR_DATA_TABLE_ENTRY on the target buildlearn.microsoft.com
Ghidra / IDAStatic analysis of shellcode-bearing samples and reflective loader stubsghidra-sre.org
Volatility 3Memory forensics: enumerate suspicious MEM_PRIVATE + RX regions, hunt unbacked threadsvolatilityfoundation.org
Process HackerLive triage of thread start addresses and memory protectionsprocesshacker.sourceforge.io
Godbolt Compiler ExplorerInspect MSVC-emitted x64 prologues to confirm ABI assumptionsgodbolt.org

13. MITRE ATT&CK Mapping

TechniqueMITRE IDDetection
Process Injection (umbrella)T1055Sysmon EventID 8 + EventID 10 with VM-write GrantedAccess
DLL InjectionT1055.001Image Load (EventID 7) from MEM_PRIVATE-allocated path
Portable Executable InjectionT1055.002Volatility scans for PE headers in MEM_PRIVATE RX regions
APC InjectionT1055.004ETW Ti NtQueueApcThread to remote thread; alerted thread-start addresses
Process HollowingT1055.012EventID 1 with suspended child, followed by EventID 10 write + resume
Native APIT1106ETW Ti syscall provider; direct Nt* calls outside ntdll
Obfuscated Files or InformationT1027YARA on ROR-13 loops; entropy heuristics on dropped payloads
Reflective Code LoadingT1620Unbacked RX memory with PE magic / no module image record

Summary

  • x64 Windows shellcode is governed by a strict ABI: argument registers RCX/RDX/R8/R9, return in RAX, a 32-byte shadow space, and 16-byte stack alignment at every call.
  • The TEB is reached via gs:[0x60] on x64; every PEB offset (+0x18, +0x20, +0x30) differs from the x86 layout and must be verified against the target build.
  • Position-independent API resolution combines a PEB walk to kernel32.dll with an EAT walk using ROR-13 name hashing to avoid embedded strings.
  • Null-byte avoidance leans on 32-bit sub-register writes that zero-extend, RIP-relative lea, and XOR-then-push idioms.
  • Detection is layered: Sysmon EventID 8/10 for injection chains, ETW Threat-Intelligence for syscall-level memory writes, behavioural hunts for unbacked RX regions, and ACG/CFG/ASR hardening to deny the primitives shellcode depends on.

Related Tutorials

References

Understanding the Stack: Frames, Prologue/Epilogue, and Stack Layout

Objective: Understand how the call stack is organized in x86 and x64 Windows processes — the mechanics of stack frames, function prologue/epilogue sequences, calling conventions, shadow space, and the exact memory layout a debugger reveals — so you can recognize a healthy stack versus a corrupted one and reason precisely about stack-based exploitation and its defenses.


1. Why the Stack Matters for Exploit Development

The stack is the primary battleground for classic memory-safety bugs. Saved return addresses, saved frame pointers, function arguments, and fixed-size local buffers all live side by side on the same contiguous, downward-growing region. When a write runs past the end of a stack buffer, it corrupts the very control-flow data the CPU will trust on the next RET.

For a defender, the same knowledge is diagnostic. A return address pointing into the stack or heap instead of an executable image, an RSP value that jumped thousands of bytes (a stack pivot), or a frame chain that no longer links cleanly are all signatures of corruption. You cannot recognize an abnormal stack until you have internalized a normal one.


2. The Stack as a Data Structure: Growth Direction and Address Space Layout

A Windows process virtual address space holds the mapped image (.text, .data), loaded DLLs, the heap, thread stacks, and per-thread/per-process control structures (TEB/PEB). Each thread receives its own stack, reserved and committed on demand.

The stack grows downward — toward lower addresses. PUSH decrements the stack pointer; POP increments it. The live top of the stack is always tracked by RSP (x64) / ESP (x86).

RegisterRole
RSP / ESPStack pointer — always points to the top (lowest address) of the current frame
RBP / EBPBase/frame pointer — anchors the frame in x86; in x64 not used for locals/args unless alloca() is used
RIP / EIPInstruction pointer — saved as the return address by CALL
RAXInteger/pointer return value (XMM0 for floating-point)

3. x86 Stack Frames: Registers, Calling Conventions, and the EBP Chain

32-bit Windows supports several co-existing calling conventions, which is why x86 reversing requires you to identify the convention before reading arguments.

ConventionCleanupArgument Passing
__cdeclCaller cleansRight-to-left on stack
__stdcallCallee cleansRight-to-left on stack (Win32 API)
__fastcallCallee cleansFirst two in ECX/EDX, rest on stack
__thiscallCallee cleansC++ this in ECX, args on stack

x86 code conventionally uses EBP as a fixed frame anchor. Every local and argument is addressed relative to it, and each saved EBP points at the caller’s saved EBP, forming a walkable frame chain.

// MSVC x86, compiled /Od (no optimization)
void vuln(char *src) {
    char buf[64];      // local buffer — classic overflow target
    strcpy(buf, src);  // bounded only by src
}
; x86 frame for vuln(), high → low address
push ebp            ; save caller's EBP
mov  ebp, esp       ; EBP anchors this frame
sub  esp, 64        ; allocate buf[64]
; ... strcpy ...
; [EBP + 8]  -> arg1 (src)
; [EBP + 4]  -> return address   ← ret-overwrite target
; [EBP + 0]  -> saved EBP        ← frame chain link
; [EBP - 64] -> buf              ← overflow origin

A buffer overflow that walks upward from [EBP-64] crosses the saved EBP, then the return address — the two values the epilogue and RET consume.


Diagram showing the x86 stack frame layout from higher to lower addresses: function arguments, return address, saved EBP, local variables, and the buffer at the top of ESP
A typical x86 stack frame: overflowing the buffer at [EBP-N] walks upward through locals, corrupting saved EBP and then the return address.

4. x64 Stack Frames: The Windows ABI and Shadow Space

The Windows x64 ABI consolidates every x86 convention into a single calling convention. The first four integer or pointer parameters pass in RCX, RDX, R8, R9; the first four floating-point parameters in XMM0XMM3. Additional arguments spill onto the stack.

Two rules dominate the x64 layout:

  • Shadow space (home space): The caller allocates 32 bytes immediately above the return address, regardless of how many parameters are actually used. The callee may dump RCX/RDX/R8/R9 into this home space if it needs to spill them.
  • 16-byte alignment: RSP must be 16-byte aligned at a CALL. Because CALL pushes an 8-byte return address, RSP is 16n+8 before the call and 16n-aligned on entry to the callee.

Critically, x64 functions typically address locals and arguments RSP-relative, leaving RSP constant for the body of the function. RBP is freed for general use unless alloca() is present.

[High address — caller's frame]
  Stack arg 5+      ← [RSP + 0x28+]
  Shadow [R9]       ← [RSP + 0x20]
  Shadow [R8]       ← [RSP + 0x18]
  Shadow [RDX]      ← [RSP + 0x10]
  Shadow [RCX]      ← [RSP + 0x08]   (relative to callee entry)
  Return Address    ← [RSP + 0x00]   ← ret-overwrite target
  Local variables   ← [RSP - N]
[Low address — grows downward]

Diagram of the x64 Windows ABI stack layout showing extra arguments, 32-byte shadow space, return address, saved non-volatile registers, and local variables down to RSP
The x64 Windows ABI reserves 32 bytes of shadow space above the return address; RSP remains constant through the function body for RSP-relative addressing.

5. Volatile vs. Non-Volatile Registers and Leaf Functions

The x64 convention splits the register file into volatile (caller-saved) and non-volatile (callee-saved). A function that clobbers a non-volatile register must save and restore it in its prologue/epilogue.

ClassRegisters
Volatile (caller-saved)RAX, RCX, RDX, R8R11, XMM0XMM5
Non-volatile (callee-saved)RBX, RBP, RDI, RSI, R12R15, XMM6XMM15

A leaf function changes no non-volatile register (including not altering RSP by calling out). A non-leaf function calls another function — which adjusts RSP — and therefore must establish a frame and register unwind data. This distinction drives whether the compiler emits a prologue and .pdata entry at all.


6. Prologue and Epilogue Deep Dive

The prologue establishes the frame: save callee-saved registers and reserve local space. The epilogue reverses it and returns.

; x86 epilogue
mov  esp, ebp      ; free locals
pop  ebp           ; restore caller's EBP
ret                ; pop return address → EIP

LEAVE is a single instruction equivalent to mov esp, ebp + pop ebp, available on both x86 and x64.

; x64 MASM (ml64) non-leaf frame
sub  rsp, 0x28     ; 0x20 shadow + 8 align pad
; ... body uses [rsp+0x..] for locals/spills ...
add  rsp, 0x28     ; deallocate
ret                ; pop return address → RIP

Many optimized x64 functions omit push rbp entirely and address everything from RSP. Frame Pointer Omission (FPO) saves two instructions and frees RBP as a general register; GCC/Clang do this by default at -O2, and MSVC does similarly with /O2. For exploitation this matters: without a frame pointer there is no [EBP+4] anchor for the return address — offsets must be computed from RSP at a known instruction.

__declspec(noinline) int callee(int a, int b, int c, int d) {
    int local = a + b + c + d;   // forces a real frame + homing
    return local;
}
int caller(void) { return callee(1, 2, 3, 4); }

Compile this on Godbolt or step it in WinDbg to watch RCX/RDX/R8/R9 home into shadow space.


7. Unwind Data and Structured Exception Handling

x64 Windows requires every non-leaf function to register unwind data in the PE .pdata and .xdata sections so the OS can walk frames during structured exception handling. Each function publishes a RUNTIME_FUNCTION and an associated UNWIND_INFO that describes the prologue.

typedef struct _RUNTIME_FUNCTION {
    ULONG BeginAddress;
    ULONG EndAddress;
    ULONG UnwindData;   // RVA to UNWIND_INFO
} RUNTIME_FUNCTION, *PRUNTIME_FUNCTION;

RtlVirtualUnwind() consumes this data to reconstruct caller frames without a frame pointer. For defenders, intact, parseable unwind data is what lets EDR and crash tooling produce a reliable call stack; ROP chains and stack pivots frequently produce stacks that fail to unwind cleanly — itself a detectable anomaly.


8. Reading Stack Frames in a Debugger

In WinDbg or x64dbg you read the live frame directly off RSP.

bp mymodule!vuln        ; break at the function
g                       ; run to it
dps rsp L10             ; dump 16 pointer-sized stack slots
r rsp, rbp, rip         ; show live pointers
k                       ; walk the call stack (uses unwind data)

dps rsp L10 prints the raw stack; the slot at [RSP+0x08] after entry (or the top after the prologue) holds the saved return address, which k resolves to module!function+offset. A return address that resolves to no module — or to the stack itself — is the first sign of a hijacked frame.


9. How Stack Overflows Corrupt Frame Integrity

Overflowing a fixed local buffer writes past its bounds toward higher addresses, in the direction of the saved frame pointer and the return address.

# Conceptual layout arithmetic — NOT a payload.
# 64-byte buffer sitting below the saved return address.
import struct

buf_size      = 64
saved_rbp     = 8          # x86: 4
ret_addr_slot = 8          # x86: 4
offset_to_ret = buf_size + saved_rbp   # bytes before reaching the return slot

print(f"bytes before saved frame ptr: {buf_size}")
print(f"bytes before return address : {offset_to_ret}")

When execution reaches RET, the CPU pops whatever now sits in the return slot into RIP/EIP and jumps there. A controlled overwrite places a valid, attacker-chosen address (a gadget or function); an uncontrolled overwrite leaves garbage, producing an immediate access violation. The distinction matters operationally: uncontrolled corruption crashes loudly (WER dump), while a precise overwrite can transfer control silently — which is exactly why the compiler inserts a guard between the buffer and the return address.


Flow diagram showing how an oversized buffer write sequentially corrupts the GS cookie, saved frame pointer, and return address before RET transfers control to an attacker-chosen address
A stack overflow progresses deterministically from the buffer edge through the GS cookie and saved frame pointer to the return address, hijacking control at the next RET.

10. Modern Mitigations and What They Change About the Layout

Mitigations alter the frame layout or the trust placed in it; none remove the need to understand the stack.

// /GS inserts a cookie between locals and the saved frame data.
void vuln(char *src) {
    char buf[64];
    // prologue: mov rax, __security_cookie; xor rax, rsp; mov [rsp+0x..], rax
    strcpy(buf, src);
    // epilogue: mov rcx, [rsp+0x..]; xor rcx, rsp; call __security_check_cookie
}
MitigationStructural Effect
/GS stack cookie__security_cookie placed between locals and saved return address; mismatch → __report_gsfailure
DEP / NXIMAGE_DLLCHARACTERISTICS_NX_COMPAT; stack pages non-executable, blocking on-stack shellcode
ASLRIMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE; randomizes stack/image base, breaking hardcoded addresses
Control Flow GuardIMAGE_GUARD_CF_INSTRUMENTED; validates indirect call targets
Intel CET Shadow StackCETCOMPAT mitigation; read-only shadow copy of return addresses defeats classic ret-overwrites

11. Common Attacker Techniques

TechniqueDescription
Saved return-address overwriteOverflow a local buffer to replace [RSP+0x08]/[EBP+4] and redirect RET
Saved frame pointer overwriteCorrupt saved RBP/EBP to desynchronize the frame chain or pivot
Stack pivotUse a gadget (xchg rsp, rax; leave; ret) to point RSP at attacker data
ROP chainingDefeat DEP by chaining ret-terminated gadgets via the corrupted stack
SEH overwrite (x86)Corrupt the exception handler chain on the stack to gain control on fault
Off-by-one / frame-pointer overwriteSingle-byte overflow to truncate or shift EBP, shifting subsequent frame math

These primitives all depend on knowing the exact offset from a controllable buffer to the saved control-flow data — which is precisely the layout this tutorial defines.


12. Defensive Strategies & Detection

Detection focuses on the crash artifacts and post-exploitation behavior that stack corruption produces, since the corruption itself is often only visible at the moment of RET.

SignalDetail
Windows Error ReportingAccess violation at abnormal RIP; dumps under %LOCALAPPDATA%\Microsoft\Windows\WER\ReportQueue; Application Event 1000/1001
Sysmon Event ID 1Unusual child process from document/browser renderers (T1203 follow-on)
Sysmon Event ID 10Cross-process stack reads via ReadProcessMemory
Security Event 4672Special privileges to an unexpected logon (T1068 follow-on)
ETW Microsoft-Windows-Kernel-ProcessAnomalous RIP/RSP deltas via call-stack sampling (stack pivot)
ETW Microsoft-Windows-Security-MitigationsEmits events when CFG, DEP, or Shadow Stack violations are blocked

A practical first-line Sigma sketch catches the most common post-exploitation chain — a renderer spawning a shell:

title: Suspicious Child Process From Document Renderer
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 1
    ParentImage|endswith:
      - '\WINWORD.EXE'
      - '\EXCEL.EXE'
      - '\AcroRd32.exe'
    Image|endswith:
      - '\cmd.exe'
      - '\powershell.exe'
      - '\wscript.exe'
  condition: selection
level: high

Hardening checklist: compile with /GS (verify no /GS-), link /NXCOMPAT and /DYNAMICBASE, enable CFG with /guard:cf, turn on CET via SetProcessMitigationPolicy(ProcessUserShadowStackPolicy, ...), enforce /SAFESEH on x86, and configure Windows Defender Exploit Guard for legacy binaries. MITRE mitigation M1050 (Exploit Protection) bundles these OS controls.


13. MITRE ATT&CK Mapping

Stack layout knowledge is foundational rather than a single technique; the mapping below frames it in the defensive direction — recognizing the artifacts each technique produces.

TechniqueMITRE IDDetection
Exploitation for Client ExecutionT1203Sysmon EventID 1 renderer child chains; WER crash dumps
Exploitation for Privilege EscalationT1068Security EventID 4672 unexpected source process
Exploit Public-Facing ApplicationT1190Service crash loops + WER on network-facing daemons
Reflective Code LoadingT1620ETW call-stack anomalies; non-image-backed RIP
Process InjectionT1055Sysmon EventID 8/10; abnormal cross-process access

14. Tools for Stack Analysis

ToolDescriptionLink
WinDbgKernel/user debugging, k, dps, unwind walkingmicrosoft.com
x64dbgLive user-mode stack inspection on x64/x86x64dbg.com
Godbolt Compiler ExplorerView prologue/epilogue and FPO across compilersgodbolt.org
GhidraStatic reconstruction of frames and calling conventionsghidra-sre.org
Process HackerLive thread stacks and call-stack walkingprocesshacker.sourceforge.io
NASMAssemble illustrative prologue/epilogue snippetsnasm.us
GDB + pwndbgCross-platform frame and offset analysisgdb.gnu.org

Summary

  • The stack is a downward-growing region where buffers sit beside the very return address the CPU trusts at RET — which is why it is the primary target of memory-safety exploits.
  • x86 frames anchor on EBP with multiple calling conventions; x64 uses one convention, RCX/RDX/R8/R9 parameters, 32-byte shadow space, 16-byte alignment, and RSP-relative addressing.
  • The prologue saves non-volatile registers and reserves locals; the epilogue (LEAVE/RET) reverses it; frame-pointer omission removes the [EBP+4] anchor and forces RSP-relative offset math.
  • Overflows corrupt saved RBP/EBP and the return address; /GS, DEP, ASLR, CFG, and CET Shadow Stack change the layout’s trust model but not the need to understand it.
  • Detect follow-on activity via WER dumps, Sysmon EventID 1/10, Security 4672, and ETW mitigation/call-stack events, mapped to T1203 and T1068.

Related Tutorials

References

x86 and x64 Calling Conventions: cdecl, stdcall, fastcall, and System V

Objective: Understand how the five major calling conventions — cdecl, stdcall, fastcall, the Microsoft x64 ABI, and the System V AMD64 ABI — dictate argument passing, register ownership, stack cleanup, and alignment, and exactly why those rules determine where return addresses and arguments sit in memory when a vulnerability is triggered.


1. Why Calling Conventions Matter for Exploit Development

A calling convention is the contract between a caller and a callee. It specifies how arguments are passed (stack or registers), where the return value lands, which registers the callee must preserve, and who cleans up the stack. None of this is arbitrary — it is fixed by the ABI for a given platform and compiler.

For a defender or authorized red-teamer, this matters because stack layout is deterministic. When a local buffer overflows, the bytes that land on the saved return address are determined entirely by the convention in force. Reliable overflow payloads, return-to-libc chains, and ROP gadgets all depend on knowing precisely where the return address, arguments, and saved registers sit. Get the convention wrong and your offset math is wrong.


2. Stack Mechanics Refresher: PUSH, POP, CALL, RET

The stack grows downward (toward lower addresses). PUSH decrements the stack pointer (ESP/RSP) and writes; POP reads and increments it.

  • CALL target pushes the return address (the next instruction’s EIP/RIP) onto the stack, then jumps.
  • RET pops that saved address back into the instruction pointer.
  • RET N pops the address and adds N to ESP — this is how a callee cleans caller-pushed arguments.
push arg1          ; arg on stack
call foo           ; pushes return address, jumps to foo
add  esp, 4        ; caller cleans 1 dword arg (cdecl)

Because CALL writes the return address to a predictable slot, any write primitive that reaches that slot redirects control flow. Every convention below differs only in how the arguments around that slot are arranged.


3. x86 cdecl: The C Standard

__cdecl is the default for C functions on 32-bit x86 (MSVC flag /Gd). Arguments are pushed right to left, and the caller cleans the stack. The return value comes back in EAX. C names are decorated with a single leading underscore (_foo), no case translation.

Because the caller cleans up, cdecl is the only x86 convention that supports variadic functions (printf-style va_list) — the callee never needs to know the argument count.

; foo(1, 2, 3);  -- cdecl
push 3             ; rightmost first
push 2
push 1             ; leftmost last
call _foo
add  esp, 12       ; CALLER cleans 3 dwords

Canonical x86 stack frame at function entry (high → low address):

[arg N]          ← pushed last (rightmost)
[arg 2]
[arg 1]          ← pushed first
[return address] ← pushed by CALL
[saved EBP]      ← pushed by prologue (PUSH EBP)
[local vars]     ← ESP after SUB ESP, N

The saved EBP and return address are the primary targets of a stack-based overflow. Overflow a local buffer and you overwrite them in that exact order.


Diagram showing x86 cdecl stack frame from high to low address: last argument, first argument, saved return address, saved EBP, then local buffer where overflow begins
In cdecl, overflowing a local buffer overwrites saved EBP and then the return address in exactly this order — making the offset deterministic.

4. x86 stdcall: The Windows API Convention

__stdcall is the convention for the Win32 API. Arguments still push right to left, but the callee cleans the stack using RET N. This is efficient for fixed-argument functions, but it forbids variadics.

Name decoration encodes the byte count of stack arguments: a leading underscore, an @, then the size in bytes (always a multiple of 4). MessageBoxA with four pointer/int args becomes _MessageBoxA@16.

; foo(1, 2);  -- stdcall, two dword args
push 2
push 1
call _foo@8
; NO add esp here — callee handled it
foo:
    ; ... body ...
    ret 8          ; CALLEE pops 8 bytes of args

For shellcode and custom loaders, the @N suffix matters when resolving and patching the Import Address Table — the decorated name must match the export.


5. x86 fastcall: Register-Based Argument Passing

__fastcall (MSVC flag /Gr) passes the first two integer arguments in ECX and EDX; remaining arguments push right to left, and the callee cleans them. Decoration uses a leading @ (e.g. @foo@8). All __fastcall functions must have prototypes.

; foo(1, 2, 3);  -- MSVC fastcall
mov  ecx, 1        ; arg1 in ECX
mov  edx, 2        ; arg2 in EDX
push 3             ; arg3 on stack
call @foo@12

⚠️ Compiler variance: __fastcall is not standardized across compilers. MSVC uses ECX/EDX. Borland passes the first three arguments in EAX, EDX, ECX. When reversing a non-MSVC binary, verify register usage before trusting any decompiler’s __fastcall label.


6. Microsoft x64 ABI: The Modern Windows Convention

On Windows x64 there is effectively one ABI; the /Gd, /Gr, /Gz flags only exist for x86 targets. The convention is a four-register fastcall:

Argument slotInteger registerFloat register
1RCXXMM0
2RDXXMM1
3R8XMM2
4R9XMM3

Key rules:

  • One-to-one correspondence: each argument maps to exactly one register/slot; a single argument is never split across registers.
  • Any argument larger than 8 bytes, or not sized 1/2/4/8 bytes, is passed by reference.
  • Arguments beyond the first four go on the stack after the shadow space.
  • The stack must be 16-byte aligned before CALL.
  • The x87 stack is unused; all floating-point work uses the 16 XMM registers and is volatile across calls.

Shadow space (home space): the caller must allocate 32 bytes on the stack before the CALL, even if the callee takes fewer than four arguments, and reclaim it afterward. The callee may spill RCX/RDX/R8/R9 into this region.

; foo(a, b, c, d) -- Microsoft x64
mov  rcx, a
mov  rdx, b
mov  r8,  c
mov  r9,  d
sub  rsp, 20h      ; 32 bytes shadow space (caller's job)
call foo
add  rsp, 20h      ; reclaim shadow space

Volatile (caller-saved): RAX, RCX, RDX, R8, R9, R10, R11, XMM4, XMM5.
Non-volatile (callee-saved): RBX, RBP, RDI, RSI, R12R15, XMM6XMM15.


Diagram of Microsoft x64 ABI stack layout showing stack arguments above the mandatory 32-byte shadow space, the saved return address written by CALL, and the callee local frame below, with registers RCX RDX R8 R9 carrying the first four arguments
The mandatory 32-byte shadow space sits between caller stack arguments and the saved return address, shifting buffer-to-RIP offsets by 32 bytes versus an equivalent System V frame.

7. System V AMD64 ABI: The Linux and macOS Convention

System V AMD64 is followed on Linux, macOS, FreeBSD, Solaris, and other POSIX systems. It uses six integer argument registers:

Argument slotInteger registerFloat register
1RDIXMM0
2RSIXMM1
3RDXXMM2
4RCXXMM3
5R8XMM4XMM7 (5–8)
6R9

Additional arguments push onto the stack in reverse order. The return value is in RAX; for 128-bit returns the high 64 bits go in RDX. The stack is 16-byte aligned just before CALL.

  • Callee-saved: RBX, RBP, R12R15. All others are caller-saved.
  • Red zone: the 128 bytes below RSP are reserved and untouched by signal/interrupt handlers. Leaf functions may use this area as their entire frame without adjusting RSP.
  • Syscall variant: kernel entry uses the same registers except R10 replaces RCX (because the syscall instruction clobbers RCX).
  • Varargs: for variadic functions, RAX must hold the number of vector (XMM) registers used, 0–8.
; write(1, buf, len) via syscall -- System V
mov  rax, 1         ; sys_write
mov  rdi, 1         ; fd (arg1)
mov  rsi, buf       ; buffer (arg2)
mov  rdx, len       ; count (arg3)
; NOTE: a syscall uses R10 in place of RCX for arg4
syscall
; leaf function may freely use [rsp-128 .. rsp] (red zone)

⚠️ Shadow space vs. red zone are mutually exclusive and commonly confused. Shadow space (32 bytes above the call) exists only on Windows x64. The red zone (128 bytes below RSP) exists only on System V. Never assume both.


Graph comparing System V AMD64 ABI and Microsoft x64 ABI side by side, highlighting differing argument registers, the System V red zone versus the Microsoft shadow space, and their shared 16-byte alignment requirement
Red zone and shadow space are mutually exclusive per-platform features — conflating them is a classic source of cross-platform shellcode crashes.

8. Side-by-Side Comparison and ABI Detection in Disassembly

PropertyMicrosoft x64System V AMD64
Integer arg registersRCX, RDX, R8, R9RDI, RSI, RDX, RCX, R8, R9
FP arg registersXMM0XMM3XMM0XMM7
Shadow space32 bytes (mandatory)None
Red zoneNone128 bytes below RSP
Callee-savedRBX, RBP, RDI, RSI, R12R15, XMM615RBX, RBP, R12R15

Recognition heuristics in IDA/Ghidra:

  • A sub rsp, 0x20 immediately before CALL and arguments loaded into RCX/RDX/R8/R9Microsoft x64.
  • Arguments loaded into RDI/RSI/RDX and writes into [rsp-8] without a prior sub rspSystem V (red zone).
  • A ret N (non-zero immediate) on 32-bit code ⇒ stdcall or fastcall; arguments in ECX/EDX distinguish fastcall.
  • A bare ret with caller-side add esp, Ncdecl.

Automated ABI detection can misfire on hand-written assembly, non-MSVC fastcall, or -fomit-frame-pointer builds — always confirm against the actual prologue.


9. Calling Conventions as an Attack Surface

Each convention places the return address at a known offset from a local buffer. That offset is the difference between a working and a failing overflow.

In 64-bit binaries, overflowing a buffer controls stack contents, not registers directly — which is exactly why return-oriented programming is needed. To call a libc function on x64 Linux, you must first load the argument register: a pop rdi ; ret gadget sets arg 1 before the call. This is a direct consequence of the System V ABI placing arg 1 in RDI.

On Windows x64, the mandatory 32-byte shadow space shifts the offset from a local buffer to the saved return address by 32 bytes versus an equivalent Linux frame — a classic source of off-by-32 errors in cross-platform shellcode.

A conceptual offset calculator makes the dependency explicit:

def return_addr_offset(buf_size, conv):
    # bytes from start of local buffer to the saved return address
    if conv == "x86_cdecl" or conv == "x86_stdcall":
        return buf_size + 4            # + saved EBP (4 bytes)
    if conv == "sysv_amd64":
        return buf_size + 8            # + saved RBP (8 bytes)
    if conv == "ms_x64":
        return buf_size + 8 + 0x20     # saved RBP + 32B shadow space
    raise ValueError("unknown convention")

Frame-pointer presence (-fomit-frame-pointer removes saved RBP) and shadow space both change the answer — which is why convention awareness precedes any reliable payload.


Flow diagram of a ROP chain on System V AMD64 showing overflow redirecting to a pop-rdi-ret gadget loading arg1 into RDI, then a pop-rsi-ret gadget loading arg2 into RSI, before jumping to a libc function
Every ROP gadget that loads a register is a direct consequence of the ABI — on System V you need pop rdi; ret for arg 1 because the convention mandates RDI, not the stack.

10. Common Attacker Techniques

TechniqueDescription
Saved return-address overwriteOverflow a local buffer to clobber the convention-determined return slot
Return-to-libc (x86)Stack-arranged args (cdecl) let an attacker call system() without shellcode
ROP register loading (x64)Use pop rdi ; ret / pop rcx ; ret gadgets to satisfy the ABI before a call
Shadow-space-aware stack pivotAccount for the 32-byte home space when chaining Windows x64 gadgets
IAT patching via decorationResolve _func@N decorated stdcall imports for shellcode loaders
Reflective API callsManually set up RCX/RDX/R8/R9 + shadow space before invoking LoadLibraryA

Reflective loaders and injected shellcode must respect the target ABI exactly — wrong argument registers or a missing shadow allocation crashes the call.


11. Defensive Strategies & Detection

Note: A calling convention is a compile-time/binary property — no Sysmon Event ID fires because a convention is used. Detection is indirect: it triggers on the runtime artifacts of a convention-aware exploit.

Compile-time mitigations motivated directly by convention layout:

  • Stack canaries/GS (MSVC), -fstack-protector-strong (GCC/Clang) detect return-address overwrite before RET.
  • Control Flow Guard/guard:cf validates indirect CALL targets.
  • Intel CET / Shadow Stack — hardware enforces that RET pops the address CALL pushed, directly countering return-address overwrites. Mark binaries with IMAGE_DLLCHARACTERISTICS_GUARD_CET_COMPAT (0x4000).
  • ASLR + PIE — randomizes addresses so known layout still yields unknown absolute targets.
  • -mno-red-zone — hardens Linux kernel modules against red-zone clobbering.

Runtime telemetry for the exploitation aftermath:

  • Sysmon Event ID 1 (Process Create) — anomalous children of network-facing services after a successful ROP/return-to-libc chain.
  • Sysmon Event ID 10 (Process Access) — VirtualAllocEx/WriteProcessMemory from convention-correct injected shellcode.
  • Sysmon Event ID 7 (Image Load) — unexpected DLL loads from a corrupted return address redirecting into LoadLibrary.
  • Microsoft-Windows-Threat-Intelligence ETW — kernel telemetry on NtAllocateVirtualMemory / NtWriteVirtualMemory.
  • Audit Process Creation (Event 4688) with command-line logging.
title: Suspicious Child Process from Network-Facing Service After Exploitation
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 1
    ParentImage|endswith:
      - '\w3wp.exe'
      - '\sqlservr.exe'
    Image|endswith:
      - '\cmd.exe'
      - '\powershell.exe'
  condition: selection
level: high

12. Tools for Calling-Convention Analysis

ToolDescriptionLink
IDA Pro / GhidraDecompiler ABI inference and stack-frame reconstructionghidra-sre.org
x64dbgLive register/stack inspection on Windowsx64dbg.com
GDB + pwndbgStack and register view on Linux (x/16gx $rsp)gnu.org
WinDbgInspect shadow space and frame layout (dd rsp)microsoft.com
Godbolt Compiler ExplorerCompare emitted asm across conventions/compilersgodbolt.org
ROPgadget / RopperEnumerate pop rdi ; ret-style register-loading gadgetsgithub.com
NASMHand-assemble convention test casesnasm.us
Radare2Cross-platform disassembly and ABI heuristicsrada.re

13. MITRE ATT&CK Mapping

TechniqueMITRE IDDetection
Exploitation for Client ExecutionT1203Crash telemetry, Event 4688 child-process anomalies
Exploit Public-Facing ApplicationT1190WAF/IDS, anomalous service children (Event ID 1)
Process InjectionT1055Sysmon Event ID 10 (VirtualAllocEx/WriteProcessMemory)
Process Injection: DLL InjectionT1055.001Event ID 7 unexpected LoadLibraryA loads
Command and Scripting InterpreterT1059Event ID 1 cmd.exe/powershell.exe spawns
Reflective Code LoadingT1620ETW Threat-Intelligence memory-write telemetry

ATT&CK has no technique ID for “calling-convention abuse” — convention knowledge is prerequisite craft underlying these exploitation and injection techniques.


Summary

  • Calling conventions are the binary-level contract that makes stack layout deterministic — and therefore exploitable.
  • x86 splits into cdecl (caller cleanup, variadics, _foo), stdcall (callee RET N, _foo@N), and fastcall (ECX/EDX, MSVC-specific vs. Borland’s EAX/EDX/ECX).
  • The two 64-bit ABIs differ in argument registers (RCX,RDX,R8,R9 vs. RDI,RSI,RDX,RCX,R8,R9), shadow space (Windows only) vs. red zone (System V only), and callee-saved sets.
  • Convention dictates the buffer-to-return-address offset and the ROP register-loading gadgets required — pop rdi ; ret on Linux, shadow-space accounting on Windows.
  • Detect the exploitation artifacts, not the convention: Sysmon Event IDs 1/7/10, ETW Threat-Intelligence telemetry, and Event 4688, hardened with canaries, CFG, and CET shadow stacks.

Related Tutorials

References