x86 and x64 Assembly from Scratch

🎯 Objective

To gain a deep, foundational understanding of how x86 and x64 assembly work, from CPU registers and calling conventions to memory addressing and function calls. This is critical for exploit developers who need precise control over memory, registers, and the instruction pointer.


1. Why Learn Assembly for Exploitation?

Exploit developers operate close to the metal, at the point where programming languages are compiled into instructions the CPU can directly understand. Memory corruption bugs, ROP chains, shellcode, and low-level payloads all require understanding register state, stack layout, and control flow.

In exploit development:

  • You overwrite a saved return address to take control of EIP or RIP
  • You pivot the stack (ESP or RSP)
  • You inject shellcode and need to place arguments in registers or memory
  • You must understand how values are passed and returned at the assembly level

2. Architecture Overview: x86 vs x64

2.1 x86 (32-bit)

  • 4-byte registers (e.g., eax, ebx)
  • 4GB virtual address space
  • Arguments passed via stack
  • Used in legacy applications or 32-bit systems

2.2 x64 (64-bit)

  • 8-byte registers (rax, rbx)
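  • 64-bit pointers and a far larger address space (2^64 bytes, about 18 exabytes, in theory; current CPUs typically implement 48-bit virtual addresses)
  • First arguments passed in registers (Windows: rcx, rdx, r8, r9 for the first 4; Linux/System V: rdi, rsi, rdx, rcx, r8, r9 for the first 6)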
  • Return values in rax

2.3 Register Subdivisions

Example (x64):

Register:         rax (64-bit)
 └── eax (32-bit, low half of rax)
     └── ax (16-bit, low half of eax)
         ├── ah (8-bit high, bits 8-15)
         └── al (8-bit low, bits 0-7)
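
To make the nesting concrete, here is a minimal sketch in portable C that models rax as a plain 64-bit integer and pulls out each sub-register with masks and shifts. The helper names (reg_eax, write_al, and so on) are made up for illustration; no real register is touched.

```c
#include <stdint.h>

/* Hypothetical helpers: rax is modelled as an ordinary 64-bit integer. */
static uint32_t reg_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }
static uint16_t reg_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }
static uint8_t  reg_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); }
static uint8_t  reg_al(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }

/* In 64-bit mode, writing eax zero-extends into all of rax,
   so the old upper 32 bits are discarded. */
static uint64_t write_eax(uint64_t rax, uint32_t value) {
    (void)rax;
    return (uint64_t)value;
}

/* Writing al replaces only the low byte and keeps the other 56 bits. */
static uint64_t write_al(uint64_t rax, uint8_t value) {
    return (rax & ~(uint64_t)0xFF) | value;
}
```

With rax = 0x1122334455667788, the helpers return eax = 0x55667788, ax = 0x7788, ah = 0x77, and al = 0x88. The zero-extension on 32-bit writes is worth remembering when a payload needs a clean upper half.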


3. Register Classifications

Class            Registers (x86 / x64)                                      Description
General-purpose  eax, ebx, ecx, edx, esi, edi / rax, rbx, rcx, rdx, r8–r15  Arithmetic, logic, data movement
Stack-related    esp, ebp / rsp, rbp                                        Stack pointer / base pointer
Instruction      eip / rip                                                  Holds address of next instruction
Flags            eflags / rflags                                            Status indicators (ZF, CF, SF)
Segment          cs, ds, es, ss, fs, gs                                     Rarely used directly in userland (fs/gs point to thread-local data); used in kernel
SIMD/FPU         xmm0–xmm15, st0–st7, mm0–mm7                               Vector ops, floating point, MMX

4. Instruction Types and Syntax (Intel Style)

4.1 Syntax Format

instruction destination, source

4.2 Common Instructions

Category    Example                  Meaning
Data Move   mov eax, ebx             Copy ebx to eax
Arithmetic  add eax, 4               eax += 4
Logical     and eax, 0xFF            Clear all but the low byte
Shift       shr eax, 1               Shift right by 1 (unsigned divide by 2)
Stack       push ebp, pop eax        Push/pop stack values
Control     call, ret, jmp, je, jne  Control flow
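
As a rough illustration of what the data, arithmetic, logical, and shift rows do, the sketch below replays them on a tiny register model in C. The struct regs type and run_sample function are hypothetical names; each line carries the instruction it stands in for.

```c
#include <stdint.h>

/* Hypothetical mini-model: two 32-bit registers as struct fields. */
struct regs { uint32_t eax, ebx; };

/* Replays four instructions from the table, in order. */
static struct regs run_sample(struct regs r) {
    r.eax = r.ebx;     /* mov eax, ebx   : destination first, source second */
    r.eax += 4;        /* add eax, 4     : wraps modulo 2^32 */
    r.eax &= 0xFFu;    /* and eax, 0xFF  : keep only the low byte */
    r.eax >>= 1;       /* shr eax, 1     : logical shift, top bit becomes 0 */
    return r;
}
```

Starting with ebx = 0x1234, eax goes 0x1234, 0x1238, 0x38, and ends at 0x1C, while ebx is left unchanged.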

5. Addressing Modes and Operand Types

5.1 Addressing Types

Mode             Operand              Example
Immediate        Constant value       mov eax, 1
Register         CPU register         mov eax, ebx
Direct Memory    Absolute address     mov eax, [0x12345678]
Indirect Memory  Register pointer     mov eax, [ebx]
Based            Base + displacement  mov eax, [ebp+4]
Indexed          Base + index*scale   mov eax, [ebx+esi*4]
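
An x86 memory operand combines a base register, an optional index register scaled by 1, 2, 4, or 8, and a signed displacement. The following sketch computes that effective address for the 32-bit case; effective_address is an illustrative name, and the register values are simply passed in as numbers.

```c
#include <stdint.h>

/* Effective address = base + index * scale + displacement, modulo 2^32.
   scale is expected to be 1, 2, 4, or 8, as the instruction encoding allows. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, int32_t disp) {
    return base + index * scale + (uint32_t)disp;
}
```

For example, [ebp+4] with ebp = 0xBFFFF000 is effective_address(0xBFFFF000, 0, 1, 4), which gives 0xBFFFF004, and a negative displacement such as [ebp-8] gives 0xBFFFEFF8.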

5.2 Operand Sizes

  • BYTE PTR [mem]: 8-bit
  • WORD PTR [mem]: 16-bit
  • DWORD PTR [mem]: 32-bit
  • QWORD PTR [mem]: 64-bit
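
The size prefix decides how many bytes are read starting at the same address. Because x86 is little-endian, the byte at the lowest address becomes the least significant byte of the value. A minimal sketch, assuming a hypothetical load_le helper that assembles the value byte by byte so it behaves the same on any host:

```c
#include <stddef.h>
#include <stdint.h>

/* Reads `size` bytes (1, 2, 4, or 8) starting at mem in little-endian
   order, the byte order x86 uses for memory operands. */
static uint64_t load_le(const uint8_t *mem, size_t size) {
    uint64_t value = 0;
    for (size_t i = 0; i < size; i++) {
        value |= (uint64_t)mem[i] << (8 * i);  /* byte i lands in bits 8i..8i+7 */
    }
    return value;
}
```

Given the bytes 88 77 66 55 44 33 22 11 in memory, a BYTE PTR read yields 0x88, WORD PTR yields 0x7788, DWORD PTR yields 0x55667788, and QWORD PTR yields 0x1122334455667788.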

6. Memory Layout and Stack Anatomy

Typical process memory layout (simplified 32-bit view, highest address at the top):

0xFFFFFFFF  ← Stack Top (grows down)
     |
     | Stack (local vars, return addr)
     |
     | Heap (malloc/calloc/free - grows up)
     |
     | BSS (uninitialized globals)
     |
     | Data (initialized globals)
     |
     | Text (code, .text segment - executable)
0x00000000  ← Null page
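
One way to see these regions on a real system is to take the address of one object from each. The sketch below does that in C; the names (layout_sample, sample_layout, and the two globals) are invented for the example. Treat the diagram as a guide only: with ASLR and position-independent executables the absolute values change on every run and the relative order is not guaranteed.

```c
#include <stdint.h>
#include <stdlib.h>

static int initialized_global = 42;  /* normally placed in .data */
static int uninitialized_global;     /* normally placed in .bss  */

/* One sample address per region. */
struct layout_sample { uintptr_t text, data, bss, heap, stack; };

static struct layout_sample sample_layout(void) {
    struct layout_sample s;
    int local = 0;                    /* lives in this function's stack frame */
    void *heap_block = malloc(16);    /* may be NULL if allocation fails */

    s.text  = (uintptr_t)&sample_layout;        /* code address */
    s.data  = (uintptr_t)&initialized_global;
    s.bss   = (uintptr_t)&uninitialized_global;
    s.heap  = (uintptr_t)heap_block;
    s.stack = (uintptr_t)&local;

    free(heap_block);
    return s;
}
```

Printing the five fields with printf and the %p or PRIxPTR formats, then comparing them across several runs, is a quick way to check whether ASLR is active for a given binary.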


7. Calling Conventions

7.1 cdecl (x86 Linux default)

  • Arguments pushed right-to-left
  • Return value in eax
  • Caller cleans stack

7.2 stdcall (Windows APIs)

  • Arguments pushed right-to-left
  • Return value in eax
  • Callee cleans stack (ret N)

7.3 fastcall (Microsoft optimized)

  • First two arguments in ecx and edx; the rest on the stack
  • Callee cleans stack

7.4 System V AMD64 ABI (Linux x64)

Argument  Register
arg1      rdi
arg2      rsi
arg3      rdx
arg4      rcx
arg5      r8
arg6      r9

  • Return: rax

7.5 Windows x64 Calling Convention

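Argument  Register
arg1      rcx
arg2      rdx
arg3      r8
arg4      r9

  • Further arguments go on the stack; the caller reserves 32 bytes of shadow space
  • Return: rax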
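
The two x64 conventions differ only in which registers they use and how many, which is easy to get wrong when porting a payload between Linux and Windows. Here is a small lookup sketch in C covering integer and pointer arguments only; the enum and function names are hypothetical, and floating-point arguments (which use xmm registers) are left out.

```c
#include <stddef.h>
#include <string.h>

enum abi { ABI_SYSV_AMD64, ABI_WIN64 };

/* Returns where the n-th (0-based) integer or pointer argument is passed:
   a register name, or "stack" once the registers run out. */
static const char *int_arg_location(enum abi abi, size_t n) {
    static const char *const sysv[]  = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static const char *const win64[] = {"rcx", "rdx", "r8", "r9"};

    if (abi == ABI_SYSV_AMD64) {
        return n < 6 ? sysv[n] : "stack";
    }
    return n < 4 ? win64[n] : "stack";
}

/* Convenience check: 1 if the argument travels in a register. */
static int passed_in_register(enum abi abi, size_t n) {
    return strcmp(int_arg_location(abi, n), "stack") != 0;
}
```

For instance, the third argument (n = 2) is in rdx under System V but in r8 on Windows, and the fifth argument is in r8 under System V but already on the stack on Windows.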

8. Function Prologue and Epilogue

x86 Example

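push ebp            ; save the caller's frame pointer
mov ebp, esp        ; start a new stack frame
sub esp, XX         ; allocate space for locals
...
mov esp, ebp        ; release the locals
pop ebp             ; restore the caller's frame pointer
ret                 ; pop the return address into eip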

Why It Matters

  • Stack frames are key for local variables
  • Exploits often overwrite saved EIP/RIP on stack
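
To see why the saved return address matters, the prologue and epilogue can be replayed on a simulated 32-bit stack. This is only a sketch: struct machine and its helpers are invented for the example, memory is a small array rather than a real stack, and locals_size is assumed to be a multiple of 4.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical machine model: a tiny stack plus esp and ebp. */
struct machine {
    uint32_t mem[STACK_WORDS];  /* simulated stack, one slot per 4 bytes */
    uint32_t esp;               /* byte offset of the current top of stack */
    uint32_t ebp;               /* frame pointer */
};

static void push(struct machine *m, uint32_t v) {
    m->esp -= 4;                /* the stack grows toward lower addresses */
    m->mem[m->esp / 4] = v;
}

static uint32_t pop(struct machine *m) {
    uint32_t v = m->mem[m->esp / 4];
    m->esp += 4;
    return v;
}

/* call + prologue. Returns the byte distance from the start of the
   locals (at esp) up to the saved return address (at ebp + 4). */
static uint32_t enter_frame(struct machine *m, uint32_t return_addr,
                            uint32_t locals_size) {
    push(m, return_addr);       /* call: pushes the return address */
    push(m, m->ebp);            /* push ebp */
    m->ebp = m->esp;            /* mov ebp, esp */
    m->esp -= locals_size;      /* sub esp, locals_size */
    return (m->ebp + 4) - m->esp;
}

/* epilogue + ret. Returns the value that ends up in eip. */
static uint32_t leave_frame(struct machine *m) {
    m->esp = m->ebp;            /* mov esp, ebp */
    m->ebp = pop(m);            /* pop ebp */
    return pop(m);              /* ret */
}
```

With 16 bytes of locals the distance is 20: 16 bytes of locals plus 4 bytes of saved ebp. Writing past the locals by that amount in the model replaces the value leave_frame returns, which is the mechanism behind a classic stack buffer overflow.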

9. Flags Register (EFLAGS/RFLAGS)

Flag             Meaning
ZF (Zero Flag)   Set if result is 0
CF (Carry Flag)  Set if an unsigned carry or borrow occurred
SF (Sign Flag)   Set if result is negative (top bit set)
OF (Overflow)    Set if signed overflow occurred
PF (Parity)      Set if the low byte of the result has an even number of 1 bits

Used with:

  • cmp, test, je, jg, jl, jne, jz, jnz
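
The link between cmp and the conditional jumps is easier to follow with the flag arithmetic written out. Below is a minimal sketch in C of the flags a 32-bit cmp a, b produces (it computes a - b and throws the result away) and of the conditions three common jumps test. The struct and function names are illustrative; PF is omitted for brevity.

```c
#include <stdint.h>

struct flags { int zf, cf, sf, of; };

/* Flags produced by `cmp a, b` on 32-bit operands. */
static struct flags cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;                       /* result is discarded by cmp */
    struct flags f;
    f.zf = (r == 0);                          /* operands were equal */
    f.cf = (a < b);                           /* unsigned borrow */
    f.sf = (int)(r >> 31);                    /* top bit of the result */
    f.of = (int)(((a ^ b) & (a ^ r)) >> 31);  /* signed overflow on subtract */
    return f;
}

static int je_taken(struct flags f) { return f.zf; }                   /* je/jz: ZF = 1 */
static int jl_taken(struct flags f) { return f.sf != f.of; }           /* jl: SF != OF */
static int jg_taken(struct flags f) { return !f.zf && f.sf == f.of; }  /* jg: ZF = 0 and SF = OF */
```

Comparing 0x80000000 with 1 shows why jl checks SF against OF rather than SF alone: the subtraction overflows, SF ends up 0, yet the signed comparison "less than" is still true because OF is 1.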

10. Interrupts and Syscalls

Linux (x86):

mov eax, 1   ; syscall number: exit
mov ebx, 0   ; exit code
int 0x80     ; software interrupt

Linux (x64):

mov rax, 60  ; syscall: exit
mov rdi, 0   ; exit code
syscall


11. Loop and String Instructions

Looping

mov ecx, 10
loop_label:
; code
loop loop_label  ; decrements ecx, jumps if ecx != 0
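
In C terms the loop instruction behaves like a do/while that decrements before testing, so the body always runs at least once. A small sketch, using a hypothetical loop_iterations function that just counts how often the body runs:

```c
#include <stdint.h>

/* Counts how many times the body of a `loop`-based loop executes
   for a given starting value of ecx. */
static uint64_t loop_iterations(uint32_t ecx) {
    uint64_t count = 0;
    do {
        count++;            /* the loop body */
        ecx--;              /* loop: decrement ecx (wraps modulo 2^32) */
    } while (ecx != 0);     /* loop: jump back while ecx != 0 */
    return count;
}
```

Starting with ecx = 10 the body runs 10 times. Starting with ecx = 0 is the trap: the decrement wraps to 0xFFFFFFFF first, so the body runs 2^32 times instead of zero.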

String Instructions (with REP prefix)

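  • movsb, movsw, movsd: copy a byte, word, or dword from [esi] to [edi]
  • cmpsb, scasb: compare [esi] with [edi], or al with [edi]
  • stosb, lodsb: store al to [edi], or load [esi] into al
  • rep, repe, repne: prefixes that repeat the instruction while ecx != 0 (repe also stops when ZF = 0, repne when ZF = 1)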
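
The repeated string instructions map closely onto familiar C routines: rep movsb resembles memcpy, rep stosb resembles memset, and repne scasb resembles memchr. The sketch below spells out the per-iteration behavior, assuming the direction flag is clear (addresses ascend). The function names mirror the instructions but are ordinary C functions written for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* rep movsb: copy ecx bytes from [esi] to [edi]. */
static void rep_movsb(uint8_t *edi, const uint8_t *esi, size_t ecx) {
    while (ecx != 0) { *edi++ = *esi++; ecx--; }
}

/* rep stosb: store al into ecx bytes starting at [edi]. */
static void rep_stosb(uint8_t *edi, uint8_t al, size_t ecx) {
    while (ecx != 0) { *edi++ = al; ecx--; }
}

/* repne scasb: compare al with successive bytes at [edi] until a match
   (ZF = 1) or until ecx bytes were checked. Returns the number of bytes
   scanned; *zf reports whether the last compare matched. On the real CPU
   the flags are left unchanged when ecx starts at 0; here zf is cleared. */
static size_t repne_scasb(const uint8_t *edi, uint8_t al, size_t ecx, int *zf) {
    size_t scanned = 0;
    *zf = 0;
    while (ecx != 0 && !*zf) {
        *zf = (edi[scanned] == al);
        scanned++;
        ecx--;
    }
    return scanned;
}
```

Scanning the bytes "abc\0" for al = 0 stops after 4 bytes with ZF set, which is the basis of the classic repne scasb string-length idiom.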

12. Writing Inline Assembly in C

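#include <stdio.h>

int main(void) {
    int a = 5, b = 3, result;
    /* GCC/Clang extended asm in AT&T syntax (source first, destination last); x86 targets only */
    __asm__(
        "movl %1, %%eax;"    /* eax = a */
        "addl %2, %%eax;"    /* eax += b */
        "movl %%eax, %0;"    /* result = eax */
        : "=r"(result)       /* output operand %0 */
        : "r"(a), "r"(b)     /* input operands %1 and %2 */
        : "%eax"             /* clobbered register */
    );
    printf("%d\n", result);  /* prints 8 */
    return 0;
}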


13. Compiling and Running Pure Assembly

hello.asm (NASM + Linux)

section .data
    msg db "Hello!", 0xA
    len equ $ - msg

section .text
    global _start

_start:
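    mov eax, 4          ; syscall number: write
    mov ebx, 1          ; file descriptor: stdout
    mov ecx, msg        ; pointer to the buffer
    mov edx, len        ; number of bytes to write
    int 0x80

    mov eax, 1          ; syscall number: exit
    xor ebx, ebx        ; exit code 0
    int 0x80

Build and run (assemble as 32-bit ELF, link, execute):

nasm -f elf hello.asm
ld -m elf_i386 hello.o -o hello
./hello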


14. Reverse Engineering and Disassembly

Use objdump, GDB, Ghidra, or radare2:

objdump -d -M intel binary   # Intel syntax; objdump defaults to AT&T
gdb ./binary

Look for:

  • Function prologue: push ebp; mov ebp, esp
  • Function calls: call 0x08048400
  • Stack usage: mov eax, [ebp+0x8]

15. Tools and Emulators

Tool                 Use
NASM                 Write x86 ASM
GDB + Pwndbg         Debugging
x64dbg               Windows reversing
Godbolt              C to Assembly
Ghidra               Disassembler
Radare2              RE suite
Online x86 Emulator  Run x86 code in browser

✅ Summary

  • Assembly allows direct control of CPU and memory.
  • Key registers (eax, esp, eip) are critical for understanding control flow and payload placement.
  • Stack frames, calling conventions, and memory addressing are the basis of buffer overflows and ROP chains.
  • Tools like NASM, GDB, x64dbg, and Ghidra will help analyze and write exploits.
