Emulation Plan Architecture: Structuring Phases, Objectives, Scenarios, and Success Criteria

By Debraj Basak · Jun 21, 2026 · 13 min read · Adversary Emulation

There’s a sentence I’ve learned to dread in a post-engagement readout: “We ran a red team and got domain admin in four hours.” Great, but against whose tradecraft? Using a delivery vector that an actual adversary uses? A generic red team answers “can we break in.” An adversary emulation plan (AEP) answers a harder, more useful question: “If FIN6 walked into this environment tomorrow, would we see them, and where exactly would we go blind?”

That difference is architectural. You can’t measure detection coverage against a threat you didn’t model on purpose. This tutorial walks through how to build an AEP the way MITRE’s Center for Threat-Informed Defense (CTID) structures theirs — intelligence summary, phases, operational flow, scenarios, objectives, success criteria — and how to wire it into CALDERA for repeatable, scorable execution. Everything here is for defenders, purple teamers, and authorized red teams instrumenting their own networks.

What Separates Emulation From a Generic Red Team

A penetration test hunts for any exploitable weakness. A generic red team chases an objective — usually domain admin or a crown-jewel dataset — by whatever path works. Adversary emulation is narrower and more disciplined: operators replicate the documented behavior of a specific named actor or a specific compound technique, sticking to that actor’s known TTPs while allowing latitude in implementation.

MITRE built AEPs as prototype documents assembled from public threat reporting and ATT&CK, so red teams can model adversary behavior and defenders can test their networks against it. The design goal is subtle but load-bearing: you’re building analytics for ATT&CK behaviors, not signatures for one IOC or one tool binary. Catch the behavior and you catch the next actor who reuses it.

CTID publishes two flavors:

Plan Type	Scope
Full emulation	End-to-end replication of one adversary (initial access → exfiltration), e.g. FIN6 or APT29
Micro emulation	A single compound behavior reused across many actors, decoupled from attribution

Pick full emulation when you’re validating against a named threat in your intel picture. Pick micro emulation for continuous, lightweight control validation between the big set-pieces.

The Intelligence Foundation

Every AEP starts with CTI research, and skipping it is the most common way to produce a plan that’s really just your favorite tooling wearing an actor’s name. The four operational steps of building an AEP are CTI research → technique selection → offensive development → emulation execution, and the first one feeds everything downstream.

Research a candidate using public sources: the actor’s ATT&CK Group page, CISA advisories, and vendor threat reports. Confirm the actor is both relevant to your sector and a significant or growing threat before committing. Then extract attributed techniques across a wide range of tactics and cite every one.

The Intelligence Summary is the first canonical AEP component. It carries the adversary overview — objectives, targets, tools — plus the attributed TTPs and the sources behind each claim.

# Intelligence Summary — <Actor Name> (<Group ID — verify on attack.mitre.org/groups>)

## Actor Overview
One paragraph of attribution and history. Cite every factual claim.

## Motivations
Financial | Espionage | Destruction | Hacktivism

## Target Sectors & Geography
...

## Attributed TTPs
| Tactic            | Technique                  | ATT&CK ID | Source |
|-------------------|----------------------------|-----------|--------|
| Initial Access    | Spearphishing Attachment   | T1566.001 | [1]    |
| Execution         | PowerShell                 | T1059.001 | [2]    |
| Credential Access | LSASS Memory               | T1003.001 | [3]    |

## Cited Sources
1. Vendor report ...
2. CISA advisory ...
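Because every attributed technique has to trace back to a source, the table is easy to lint before the plan goes any further. What follows is a minimal sketch, not part of any CTID tooling: it assumes the table has already been parsed into dictionaries, and the function name `check_ttp_rows` and the row keys are hypothetical.

```python
import re

# ATT&CK technique IDs look like T1566 or, for sub-techniques, T1566.001.
TECHNIQUE_ID = re.compile(r"^T\d{4}(\.\d{3})?$")

def check_ttp_rows(rows, cited_sources):
    """Return a list of problems found in the Attributed TTPs table.

    rows          -- dicts with keys: tactic, technique, attack_id, source
    cited_sources -- set of reference numbers listed under "Cited Sources"
    """
    problems = []
    for row in rows:
        if not TECHNIQUE_ID.match(row["attack_id"]):
            problems.append(f"{row['technique']}: malformed ID {row['attack_id']!r}")
        if row.get("source") not in cited_sources:
            problems.append(
                f"{row['technique']}: source {row.get('source')!r} not in Cited Sources"
            )
    return problems

# Rows mirror the template above; only sources 1 and 2 are listed there.
rows = [
    {"tactic": "Initial Access", "technique": "Spearphishing Attachment",
     "attack_id": "T1566.001", "source": 1},
    {"tactic": "Execution", "technique": "PowerShell",
     "attack_id": "T1059.001", "source": 2},
    {"tactic": "Credential Access", "technique": "LSASS Memory",
     "attack_id": "T1003.001", "source": 3},
]
problems = check_ttp_rows(rows, cited_sources={1, 2})
for line in problems:
    print(line)
```

Run against the template as written, the check flags the LSASS Memory row, since reference 3 is cited in the table but never listed.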

A word of caution: ATT&CK Group entries are revised between versions, and a group can be deprecated, revoked, or merged into another entry. Confirm the actor’s current G-ID at attack.mitre.org/groups before you publish a plan. I’ve seen a stale G-ID survive three engagements because nobody re-checked.

Decomposing the Campaign Into Phases

Phases are the foundational structural unit of an AEP. A phase is an ordered cluster of ATT&CK tactics that represents a logical stage of the operation, described in terms of the adversary’s goal and how they achieve it.

The original MITRE APT3 plan used three phases:

Initial compromise / setup
Network propagation
Collection / exfiltration

CTID’s FIN6 plan compresses to two: Phase 1 covers initial access and placement, followed by exfiltration of the data identified during that phase, and Phase 2 covers the actor’s follow-on objectives, such as point-of-sale compromise or ransomware deployment. APT29, the canonical CTID reference, is organized differently again: an infrastructure section that prepares the environment, plus two scenarios defined in the operations flow.

The lesson: there is no fixed phase count. To build yours, identify the tactics the actor uses, then the techniques and procedures for each, and group them where natural boundaries exist. Split a phase when the strategic objective changes (foothold → internal pivot). Merge phases when the actor’s tradecraft blurs the line. Write each phase as a self-contained block with explicit entry and exit conditions:

## Phase 2 — Internal Pivot & Credential Access

- **Strategic Objective:** Move from the initial foothold to a domain-joined
  host and obtain reusable credentials.
- **ATT&CK Tactics Covered:** TA0007 Discovery, TA0006 Credential Access,
  TA0008 Lateral Movement
- **Entry Condition:** Stable C2 on at least one workstation (Phase 1 exit).
- **Exit Condition:** Operator holds valid domain credentials usable on a
  second host.
- **Success Criteria (binary):**
  - [ ] LSASS credential material extracted from one host
  - [ ] Lateral authentication to a second host succeeds
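The same phase block can be modelled as data so that "are we done yet?" becomes a function call. This is a hedged sketch under my own assumptions: the `Phase` class and `unscorable` helper are illustrative names, not part of any MITRE schema.

```python
from dataclasses import dataclass, field

@dataclass
class Phase:
    name: str
    entry_condition: str
    exit_condition: str
    # Maps each binary success criterion to whether it has been met.
    criteria: dict = field(default_factory=dict)

    def is_complete(self):
        # A phase is done only when it has criteria and every one is checked off.
        return bool(self.criteria) and all(self.criteria.values())

def unscorable(phases):
    """Names of phases missing an entry condition, exit condition, or criteria."""
    return [p.name for p in phases
            if not (p.entry_condition and p.exit_condition and p.criteria)]

phase2 = Phase(
    name="Phase 2 - Internal Pivot & Credential Access",
    entry_condition="Stable C2 on at least one workstation (Phase 1 exit).",
    exit_condition="Operator holds valid domain credentials usable on a second host.",
    criteria={
        "LSASS credential material extracted from one host": True,
        "Lateral authentication to a second host succeeds": False,
    },
)
# A draft phase with nothing written down yet, to show the lint firing.
draft = Phase(name="Phase 3 - Collection", entry_condition="", exit_condition="")

print(phase2.is_complete())
print(unscorable([phase2, draft]))
```

Running `unscorable` over the whole plan before kickoff catches the phases that would otherwise be argued about mid-engagement.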

The first AEP I shipped had beautiful phases and no exit conditions. Operators kept asking “are we done with Phase 2 yet?” over Signal because nobody had written down what “done” meant. Entry/exit conditions are not bureaucracy — they’re the only thing that makes a phase scorable.

Figure: Each phase is a self-contained tactic cluster with explicit entry and exit conditions that gate progression to the next stage. (Flowchart: an Intelligence Summary feeds three sequential phases, Initial Access, Pivot and Credential Access, and Collection and Exfiltration, ending at scored exit conditions.)

Designing the Operational Flow

The Operational Flow is the second canonical component. It chains the individual techniques into a logical narrative — the major steps that commonly occur across the actor’s real operations — and becomes the authoritative sequencing reference every downstream scenario step must obey.

Think of it as a directed graph: nodes are techniques, edges are “this enables that.” Spearphishing Attachment (T1566.001) enables PowerShell execution (T1059.001), which enables Discovery (TA0007), which enables LSASS dumping (T1003.001), which enables SMB lateral movement (T1021.002). Render it as a diagram in the human-readable plan and as an ATT&CK Navigator layer for coverage visualization. The flow is where you sanity-check that your phases actually connect — if a technique has no enabling predecessor, your intelligence has a gap or your sequencing is wrong.

Figure: The operational flow is a directed technique graph where each node enables its successor; gaps in this chain reveal intelligence shortfalls before execution begins. (Directed graph: Spearphishing Attachment, PowerShell execution, Discovery, LSASS credential dumping, SMB lateral movement, and final exfiltration over C2.)
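The "no enabling predecessor" check is mechanical once the flow is written as an edge list. A minimal sketch, assuming the edges are exactly the chain described in this section; the helper name `orphans` is hypothetical, and T1041 is deliberately left without an incoming edge so the check has something to report.

```python
# Each edge reads "left enables right".
FLOW = [
    ("T1566.001", "T1059.001"),  # Spearphishing Attachment -> PowerShell
    ("T1059.001", "TA0007"),     # PowerShell -> Discovery
    ("TA0007", "T1003.001"),     # Discovery -> LSASS Memory
    ("T1003.001", "T1021.002"),  # LSASS Memory -> SMB/Admin Shares
]

def orphans(edges, planned, entry_points):
    """Planned nodes that nothing enables and that are not declared entry points."""
    enabled = {dst for _, dst in edges}
    return sorted(t for t in planned
                  if t not in enabled and t not in entry_points)

planned = {"T1566.001", "T1059.001", "TA0007",
           "T1003.001", "T1021.002", "T1041"}
gaps = orphans(FLOW, planned, entry_points={"T1566.001"})
print(gaps)
```

Each identifier this returns is either a missing edge in the flow or a technique the intelligence does not actually support.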

Writing Scenarios: From Flow to Procedures

A scenario translates one stretch of the operational flow into TTP-by-TTP, command-by-command operator instructions. This is the third canonical component — the Emulation Plan itself — the walkthrough that implements the actor’s tradecraft.

Each step should carry: tactic, technique ID, procedure description, the concrete command, and expected output. Scenarios can run end-to-end or as isolated behaviors, and teams routinely customize them to fit their environment or fresh intelligence.

Scenario 1 vs. Scenario 2

APT29’s two-scenario structure is the pattern worth stealing. In CTID’s plan, Scenario 1 opens with a fast, noisy smash-and-grab that collects and exfiltrates data before shifting to stealthier techniques; it is the broader run that tests breadth of telemetry. Scenario 2 is the slow, methodical path: low-volume, tradecraft-faithful, the version that tests whether your quiet detections fire. Run Scenario 1 to confirm you catch the obvious; run Scenario 2 to find subtle blind spots. Both draw from the same operational flow; they differ in volume and stealth, not in attribution.

Defining Objectives and Success Criteria

Phases are described by the adversary’s intended goal. Objectives make that goal measurable and binary — no “improve lateral movement detection,” which is unscoreable. Write “Operator achieves SYSTEM on at least one domain-joined host” or “Operator exfiltrates a 10 MB staged archive over the C2 channel.”

Success criteria split into three categories, and a mature plan scores all three independently:

Category	Question it answers
Offensive	Did the operator achieve the objective?
Defensive	Did an alert fire and did an analyst respond?
Coverage	Was the technique executed and logged, even if not alerted?

That third row is the one teams forget, and it’s the most valuable. A technique can succeed offensively, generate zero alerts, and still leave telemetry — which means the detection content is the gap, not the sensor. Map each criterion to an ATT&CK Navigator coverage layer so the output is a heat map, not a paragraph.

Figure: Scoring three independent tracks per step reveals whether a gap is a detection-content failure, a response-process failure, or a sensor-coverage failure, each requiring a different fix. (Hierarchy diagram: each emulation step's outcome splits into offensive objective achieved, defensive alert and analyst response triggered, and raw telemetry coverage confirmed in the SIEM.)
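For a step that executed, the defensive and coverage tracks together point at the kind of fix needed. The mapping below is a sketch of that reasoning and not a standard rubric; the function name and the gap labels are my own.

```python
def classify_gap(telemetry_logged, alert_fired, analyst_responded):
    """Name the gap type for one executed step from its binary outcomes."""
    if alert_fired:
        # Detection content works; the only open question is the human response.
        return "covered" if analyst_responded else "response-process gap"
    # No alert: telemetry present means the rule is missing, otherwise the sensor is.
    return "detection-content gap" if telemetry_logged else "sensor-coverage gap"

print(classify_gap(telemetry_logged=True, alert_fired=False, analyst_responded=False))
```

Keeping the inputs binary is what lets two analysts classify the same step identically.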

Rules of Engagement, Scope, and Authorization

None of the above runs until the planning and scoping phase is in writing. Red teamers work with stakeholders to clarify objectives, set boundaries, define success criteria, and establish legal and compliance parameters. The authorization document must contain, at minimum:

Target system inventory and explicitly out-of-scope assets (map these back to specific plan phases — if Phase 3 touches a system that’s off-limits, you know before execution, not during)
Permitted vs. prohibited techniques (e.g., LSASS dumping allowed in audit only; no destructive TA0040 actions)
Emergency stop / rollback procedure and who can invoke it
Communications plan — out-of-band channel, escalation contacts
Legal authorization signed by someone empowered to grant it

Tie every scope limit to a phase. “No production database access” is an exit-condition modifier for whatever phase reaches collection.
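Mapping out-of-scope assets to phases can be checked before anyone touches a keyboard. A minimal sketch with hypothetical host names; it assumes each phase in the plan lists the systems it expects to touch.

```python
# Hypothetical inventory: assets the authorization document places off-limits.
OUT_OF_SCOPE = {"prod-db-01", "hr-fileserver"}

# Hypothetical mapping of each phase to the hosts its steps are planned against.
PHASE_TARGETS = {
    "Phase 1": {"wkstn-014"},
    "Phase 2": {"wkstn-014", "app-srv-02"},
    "Phase 3": {"app-srv-02", "prod-db-01"},
}

def scope_violations(phase_targets, out_of_scope):
    """Return {phase: [off-limits hosts]} for every phase that breaches scope."""
    return {phase: sorted(targets & out_of_scope)
            for phase, targets in phase_targets.items()
            if targets & out_of_scope}

violations = scope_violations(PHASE_TARGETS, OUT_OF_SCOPE)
print(violations)
```

An empty result is the precondition for signing the authorization; anything else means the plan or the scope has to change first.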

Machine-Readable Plans: YAML and CALDERA

The human-readable plan is paired with a machine-readable plan in YAML, designed so each step couples directly to its human-readable equivalent. CTID’s schema started from Red Canary’s Atomic Red Team format and extended it to capture threat intelligence. CALDERA’s Emu plugin converts that YAML into abilities and adversary profiles for automated execution. From there CALDERA lets you define adversary profiles by ATT&CK technique ID, deploy agents, and run operations that follow the playbook or autonomously chain techniques based on what they discover.

A single annotated step:

- id: 1.A.1
  name: Spearphishing Attachment
  description: >
    Deliver a weaponized document to a target inbox to gain initial code
    execution, mirroring the actor's documented delivery TTP.
  tactic: initial-access            # TA0001
  technique:
    attack_id: T1566.001
    name: "Phishing: Spearphishing Attachment"
  platforms: [windows]
  executor:
    name: manual                    # operator action; no payload shown
    command: >
      # Send crafted document from staging mailbox per ROE
  cleanup: >
    Remove delivered artifact from target host and mailbox.

Because each YAML step mirrors a human-readable step, you can generate an operator checklist mechanically — useful for keeping the two representations in sync:

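import yaml  # PyYAML

def load_plan(path):
    """Parse the machine-readable plan; safe_load builds only plain Python objects."""
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)

def build_checklist(steps):
    """Print one operator checklist entry per plan step."""
    for step in steps:
        tech = step["technique"]
        # A step may omit its executor command, so fall back to an empty string.
        command = (step.get("executor", {}).get("command") or "").strip()
        print(f"[{step['id']}] {step['name']}")
        print(f"  ATT&CK : {tech['attack_id']} ({tech['name']})")
        print(f"  Tactic : {step['tactic']}")
        print(f"  Action : {command}")
        print(f"  Expect : {step.get('expected_artifact', 'see human-readable plan')}")
        print("-" * 60)

if __name__ == "__main__":
    # Assumes the plan file wraps its step list in a top-level "steps" key.
    plan = load_plan("emulation_plan.yaml")
    build_checklist(plan["steps"])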

Measuring Outcomes and the Purple Team Loop

After execution, score every step. Keep the rubric flat and binary so two analysts produce the same numbers:

Step Name	ATT&CK ID	Executed (Y/N)	Alert Generated (Y/N)	Analyst Notified (Y/N)	Blocked (Y/N)	Notes
Spearphishing delivery	T1566.001	Y	N	N	N	No mail-sandbox detonation
PowerShell execution	T1059.001	Y	Y	Y	N	ScriptBlock 4104 fired
LSASS dump	T1003.001	Y	Y	N	Y	EDR blocked; no analyst page
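With a flat binary rubric, the headline numbers and the re-run list fall out of a few lines of code. This is a sketch over the three example rows above; the dictionary keys and the function names `rate` and `rerun_queue` are hypothetical.

```python
ROWS = [
    {"step": "Spearphishing delivery", "attack_id": "T1566.001",
     "executed": True, "alert": False, "notified": False, "blocked": False},
    {"step": "PowerShell execution", "attack_id": "T1059.001",
     "executed": True, "alert": True, "notified": True, "blocked": False},
    {"step": "LSASS dump", "attack_id": "T1003.001",
     "executed": True, "alert": True, "notified": False, "blocked": True},
]

def rate(rows, column):
    """Fraction of executed steps where the given Y/N column is Y."""
    executed = [r for r in rows if r["executed"]]
    if not executed:
        return 0.0
    return sum(r[column] for r in executed) / len(executed)

def rerun_queue(rows):
    """ATT&CK IDs of executed steps that lacked an alert or an analyst notification."""
    return [r["attack_id"] for r in rows
            if r["executed"] and not (r["alert"] and r["notified"])]

print(f"alert rate:   {rate(ROWS, 'alert'):.0%}")
print(f"notify rate:  {rate(ROWS, 'notified'):.0%}")
print(f"re-run after fixes: {rerun_queue(ROWS)}")
```

Note that the LSASS dump lands in the re-run queue even though it was blocked, because the analyst was never paged.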

Roll those rows into an ATT&CK Navigator layer to visualize coverage — Phase 1 in one color band, Phase 2 in another:

{
  "name": "AEP Coverage — Phase 1 vs Phase 2",
  "domain": "enterprise-attack",
  "techniques": [
    { "techniqueID": "T1566.001", "score": 1, "color": "#66b1ff",
      "comment": "Phase 1 — Initial Access (executed, not detected)" },
    { "techniqueID": "T1003.001", "score": 2, "color": "#ff9f40",
      "comment": "Phase 2 — Credential Access (blocked)" }
  ]
}

The debrief is where value compounds. Map identified TTPs back to specific CALDERA abilities, reconstruct the full chain, and feed it into continuous purple teaming — re-running the failed steps after each detection fix until the heat map goes green.

Figure: The purple team loop converts each missed detection into a closed engineering ticket, making the ATT&CK Navigator heat map a live coverage dashboard rather than a post-engagement artifact. (Circular flowchart: execute an AEP step in CALDERA, score the outcome, update the ATT&CK Navigator heat map, identify the blind spot, fix the detection or sensor, then re-run until coverage is confirmed.)

Defensive Strategies & Detection

This is the part defenders own: instrument the environment so each phase produces measurable telemetry before you run it. If the sensor’s blind, your “missed” score is meaningless.

Sysmon coverage per phase

Phase Category	Key Sysmon Event IDs
Initial Access / Execution	`EID 1` (Process Create), `EID 3` (Network Connection), `EID 7` (Image Load), `EID 11` (File Create)
Persistence	`EID 13` (Registry Set), `EID 12` (Registry Create/Delete), `EID 1` (new service/task process)
Privilege Escalation	`EID 1` (parent/child token anomalies), `EID 10` (Process Access — LSASS reads)
Lateral Movement	`EID 3` (outbound SMB/WinRM), `EID 1` (PsExec/WMIC children), `EID 25` (Process Tampering)
Collection / Exfiltration	`EID 11` (staging writes), `EID 3` (outbound to C2)
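A pre-flight check can confirm that each phase's event IDs are actually arriving before the emulation starts. A minimal sketch: the required sets are copied from the table above, while `observed` stands in for whatever distinct Sysmon event IDs your SIEM reports over a baseline window (the values here are made up).

```python
REQUIRED_EIDS = {
    "Initial Access / Execution": {1, 3, 7, 11},
    "Persistence": {1, 12, 13},
    "Privilege Escalation": {1, 10},
    "Lateral Movement": {1, 3, 25},
    "Collection / Exfiltration": {3, 11},
}

def missing_coverage(required, observed):
    """Return {phase: [event IDs never seen]} for phases with telemetry gaps."""
    return {phase: sorted(eids - observed)
            for phase, eids in required.items()
            if eids - observed}

# Hypothetical baseline: event IDs seen from the in-scope hosts.
observed = {1, 3, 11, 12, 13}
print(missing_coverage(REQUIRED_EIDS, observed))
```

Any phase that appears in the result would produce meaningless "missed" scores, so fix the Sysmon config first.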

ETW providers and audit policy

Enable Microsoft-Windows-Security-Auditing (4624/4625/4648 logon, 4672 privilege use, 4688 with command line), Microsoft-Windows-PowerShell/Operational (4104 ScriptBlock, 4103 module), and Microsoft-Windows-WMI-Activity/Operational (5857–5861). Turn on Audit Process Creation, Audit Logon, Audit Object Access, Audit Privilege Use, and Audit Detailed File Share.

A Sigma sketch to validate a credential-access phase objective:

title: LSASS Memory Access — AEP Credential-Access Phase Validation
logsource:
  product: windows
  service: sysmon
detection:
  selection:
    EventID: 10
    TargetImage|endswith: '\lsass.exe'
    GrantedAccess: '0x1410'
  condition: selection
level: high
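To sanity-check the rule's logic against captured events without a Sigma backend, the selection can be restated as a plain predicate. This is a hand-written approximation of that one selection, not a Sigma engine; it assumes events are already parsed into dictionaries with Sysmon field names.

```python
def matches_lsass_rule(event):
    """Apply the rule's selection to one parsed Sysmon event (a dict)."""
    return (
        event.get("EventID") == 10
        # Sigma string matching is case-insensitive by default, so lowercase first.
        and str(event.get("TargetImage", "")).lower().endswith("\\lsass.exe")
        and str(event.get("GrantedAccess", "")).lower() == "0x1410"
    )

hit = {"EventID": 10,
       "TargetImage": "C:\\Windows\\System32\\lsass.exe",
       "GrantedAccess": "0x1410"}
miss = {"EventID": 10,
        "TargetImage": "C:\\Windows\\System32\\lsass.exe",
        "GrantedAccess": "0x1000"}
print(matches_lsass_rule(hit), matches_lsass_rule(miss))
```

The second event shows the limit of the sketch rule: a read with a different access mask passes unnoticed, so treat `0x1410` as one value to validate, not full coverage.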

Before you score any “Blocked” criterion, confirm EDR is in blocking mode, not detect-only. Run ASR rules in audit mode during emulation so the plan captures what would have been blocked, and baseline your Sysmon config on a known-good template (SwiftOnSecurity or Olaf Hartong) so missing event IDs don’t masquerade as missing attacks.

Tools for Emulation Plan Architecture

Tool	Description	Link
MITRE CALDERA	Ingests YAML plans; runs agent-based, ATT&CK-mapped operations	`caldera.readthedocs.io`
ATT&CK Navigator	Coverage heat maps and per-phase layer overlays	`mitre-attack.github.io`
Atomic Red Team	Compatible YAML test format; per-technique atomics	`atomicredteam.io`
CTID Emulation Library	Reference full/micro plans (APT29, FIN6, APT3)	`ctid.mitre.org`
Sysmon	Process/network/registry telemetry for outcome scoring	`sysinternals.com`
Sigma	Portable detection rules for validating phase outcomes	`sigmahq.io`

MITRE ATT&CK Mapping

Technique	MITRE ID	Detection
Initial Access (tactic)	`TA0001`	Mail sandboxing; `EID 11` artifact writes
Execution (tactic)	`TA0002`	`EID 1` + 4688 command line; 4104 ScriptBlock
Credential Access (tactic)	`TA0006`	`EID 10` LSASS reads; 4672 privilege use
Lateral Movement (tactic)	`TA0008`	`EID 3` outbound SMB; PsExec/WMIC children
Exfiltration (tactic)	`TA0010`	`EID 3` to C2; egress volume baselines
Spearphishing Attachment	`T1566.001`	Mail detonation; `EID 1` office-spawn chains
PowerShell	`T1059.001`	4104/4103; Constrained Language Mode
LSASS Memory Dumping	`T1003.001`	`EID 10` `GrantedAccess 0x1410`; EDR block
SMB/Admin Shares	`T1021.002`	`EID 3` + Audit Detailed File Share
Exfiltration Over C2	`T1041`	`EID 3` with `Initiated: true` outbound to known C2
Web Protocols (C2)	`T1071.001`	Proxy/JA3 anomalies; infra-phase setup

Reference group profiles for examples: APT3 (G0022, original three-phase plan), APT28 (G0007), APT29 (G0016, canonical CTID two-scenario reference), and FIN6 (G0037). Verify each G-ID on attack.mitre.org/groups before publishing.

Summary

An emulation plan is architecture, not improvisation — intelligence summary, operational flow, and the TTP-by-TTP emulation plan are the three load-bearing components, and every step traces back to cited intelligence and an ATT&CK ID.
Phases are ordered tactic clusters with explicit entry and exit conditions — APT3 used three, FIN6 uses two; let the actor’s tradecraft and your scoped objectives decide the count, never a template.
Scenarios turn flow into commands, and objectives turn goals into binary pass/fail — score offensive success, defensive response, and logging coverage as three separate measurements.
The YAML plan couples human-readable steps to CALDERA execution, making runs repeatable and the purple-team feedback loop continuous.
Detection coverage is the real deliverable — instrument Sysmon EID 1/3/10/11, ScriptBlock 4104, and per-phase audit policy before execution, then render results as an ATT&CK Navigator heat map to expose blind spots.
