x86-64 Assembly Primer for Reverse Engineers


Disclaimer: This article is for education and authorized security testing only. Reverse engineering may be restricted by license agreements (EULAs) or local law. Only analyze binaries you own, that you are explicitly authorized to assess, or that are provided in legitimate training environments such as CTFs and licensed labs.

Introduction / Overview

Almost every reverse engineering or exploitation task on Linux eventually drops you into a window full of mov, call, and lea. If those mnemonics look like noise, the rest of the tooling — GDB, Ghidra, IDA — stays opaque. This primer gives you the minimum viable mental model of x86-64 assembly: the registers, the System V calling convention, the stack, and how to read it all live with GDB disassembly.

The goal is not to make you write assembly by hand, but to let you read a function and immediately answer: where are the arguments, where is the return value, and what is on the stack?

How it works / Background

Registers

x86-64 has 16 general-purpose 64-bit registers. Each has narrower aliases: rax (64-bit), eax (32-bit), ax (16-bit), al (8-bit). Writing to a 32-bit alias (e.g. eax) zero-extends into the full 64-bit register, while writing to a 16-bit or 8-bit alias (ax, al) leaves the remaining upper bits untouched. This asymmetry is a frequent source of confusion.
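C cannot write to real registers, so the sketch below models one 64-bit register as a uint64_t and applies the aliasing rules described above. The helper names (write32, write16, write8, alias_demo) are hypothetical and only illustrate the semantics; they do not touch the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Write through the 32-bit alias (e.g. mov eax, value):
   the old contents are discarded and the upper 32 bits become zero. */
uint64_t write32(uint64_t reg, uint32_t value) {
    (void)reg;                      /* nothing from the old value survives */
    return (uint64_t)value;         /* zero-extension into the full register */
}

/* Write through the 16-bit alias (e.g. mov ax, value): bits 16..63 are kept. */
uint64_t write16(uint64_t reg, uint16_t value) {
    return (reg & ~UINT64_C(0xFFFF)) | value;
}

/* Write through the 8-bit alias (e.g. mov al, value): bits 8..63 are kept. */
uint64_t write8(uint64_t reg, uint8_t value) {
    return (reg & ~UINT64_C(0xFF)) | value;
}

/* The three cases, starting from rax = 0xffffffffffffffff. */
void alias_demo(void) {
    uint64_t rax = UINT64_MAX;
    assert(write32(rax, 1) == UINT64_C(0x1));                 /* mov eax, 1 */
    assert(write16(rax, 1) == UINT64_C(0xffffffffffff0001));  /* mov ax, 1  */
    assert(write8(rax, 1)  == UINT64_C(0xffffffffffffff01));  /* mov al, 1  */
}
```

The 32-bit rule is why compilers often emit xor eax, eax to clear all of rax: it is shorter than xor rax, rax and has the same effect.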

Register    Conventional role
rax         Return value / accumulator
rbx         Callee-saved general purpose
rdi, rsi    Args 1 and 2, scratch
rdx, rcx    Args 3 and 4, scratch
rbp         Frame (base) pointer
rsp         Stack pointer
r8–r11      Scratch (r8/r9 are args 5/6)
r12–r15     Callee-saved
rip         Instruction pointer
rflags      Status flags (ZF, CF, SF, OF)

Calling convention (System V AMD64 ABI)

On Linux/macOS, integer and pointer arguments to a function go in registers in this exact order:

rdi, rsi, rdx, rcx, r8, r9
Plaintext

Additional arguments are pushed onto the stack right to left, so the seventh argument ends up at the lowest address, directly above the return address. The return value comes back in rax (and rdx for 128-bit returns). Floating-point arguments use xmm0–xmm7. Crucially, the caller must preserve rax, rcx, rdx, rsi, rdi, r8–r11 if it needs them, while the callee must preserve rbx, rbp, r12–r15.
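To see the argument order on your own toolchain, a function with more than six integer parameters is a handy disassembly target. The function name eight_args is hypothetical; the comments give where each argument should live inside the callee according to the ABI rules above, assuming the standard push rbp / mov rbp, rsp prologue.

```c
#include <assert.h>

/* The System V AMD64 ABI is LP64: long and pointers are 64 bits wide. */
static_assert(sizeof(long) == 8, "expects an LP64 target such as x86-64 Linux");

/* First six integer arguments travel in registers, the rest on the stack. */
long eight_args(long a,   /* rdi */
                long b,   /* rsi */
                long c,   /* rdx */
                long d,   /* rcx */
                long e,   /* r8  */
                long f,   /* r9  */
                long g,   /* stack: [rsp+0x8] on entry, [rbp+0x10] after the prologue */
                long h)   /* stack: [rsp+0x10] on entry, [rbp+0x18] after the prologue */
{
    return a + b + c + d + e + f + g + h;   /* result returned in rax */
}
```

Compile it with gcc -O0 -g -c and disassemble the object with objdump -d -M intel. In the callee you should see g and h read through positive rbp offsets, while the first six values are spilled from registers to negative ones.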

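A detail worth knowing: when calling a variadic function such as printf, the caller sets al to an upper bound on the number of vector (xmm) registers used to pass arguments. That is why mov eax, 0 (or xor eax, eax) so often appears right before call printf.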
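A small variadic function that takes doubles makes the al rule visible in a disassembler. This is a sketch with hypothetical names (sum_doubles, variadic_call_site); the expectation about eax in the comment is what GCC typically emits at -O0, not a guarantee for every compiler.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums n double arguments. Because it is variadic, the caller must set al
   to an upper bound on the number of vector registers it used. */
double sum_doubles(int n, ...) {
    assert(n >= 0);
    va_list ap;
    va_start(ap, n);
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += va_arg(ap, double);   /* variadic doubles must be read as double */
    va_end(ap);
    return total;                       /* a double result comes back in xmm0 */
}

/* Disassemble this caller: n goes in edi, 1.5 and 2.5 in xmm0 and xmm1,
   and eax is typically set to 2 just before the call. */
double variadic_call_site(void) {
    return sum_doubles(2, 1.5, 2.5);
}
```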

The stack

The stack grows downward (toward lower addresses), and rsp always points at its top: the most recently pushed value, which sits at the lowest in-use address. A standard function prologue/epilogue looks like:

push   rbp          ; save caller's frame pointer
mov    rbp, rsp     ; establish new frame
sub    rsp, 0x20    ; reserve 32 bytes for locals
; ... function body ...
leave               ; mov rsp, rbp ; pop rbp
ret                 ; pop return address into rip
ASM

The call instruction pushes the return address onto the stack before jumping; ret pops it back into rip. This return address on the stack is exactly what a classic stack buffer overflow overwrites.
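The bookkeeping above can be simulated in plain C. The sketch below is a toy model, not an emulator: "addresses" are byte offsets into an array, every address value (0x401005 and so on) is made up, and call_cycle is a hypothetical helper that runs call, the prologue, leave, and ret, optionally clobbering the saved return address the way an overflow would.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack: 8-byte slots; rsp is a byte offset into mem[] that
   decreases on push, just like the real stack pointer. */
enum { SLOTS = 64 };

struct cpu {
    uint64_t mem[SLOTS];        /* one slot per QWORD of stack memory */
    uint64_t rsp, rbp, rip;
};

static void push(struct cpu *c, uint64_t value) {
    assert(c->rsp >= 8);        /* toy guard against running off the bottom */
    c->rsp -= 8;                /* move the top down first ... */
    c->mem[c->rsp / 8] = value; /* ... then store at the new top */
}

static uint64_t pop(struct cpu *c) {
    assert(c->rsp / 8 < SLOTS); /* toy guard against popping an empty stack */
    uint64_t value = c->mem[c->rsp / 8];
    c->rsp += 8;
    return value;
}

/* Returns rip after ret. A nonzero clobber overwrites the saved
   return address at [rbp+8] before the epilogue runs. */
uint64_t call_cycle(uint64_t clobber) {
    struct cpu c = { .rsp = SLOTS * 8, .rbp = 0x1111, .rip = 0x401000 };
    const uint64_t return_site = 0x401005;  /* instruction after the call */

    push(&c, return_site);                  /* call: push return address ... */
    c.rip = 0x401100;                       /* ... and jump to the callee */

    push(&c, c.rbp);                        /* push rbp */
    c.rbp = c.rsp;                          /* mov rbp, rsp */
    c.rsp -= 0x20;                          /* sub rsp, 0x20 */

    if (clobber)
        c.mem[(c.rbp + 8) / 8] = clobber;   /* saved return address lives at [rbp+8] */

    c.rsp = c.rbp;                          /* leave: mov rsp, rbp ... */
    c.rbp = pop(&c);                        /* ... then pop rbp */
    c.rip = pop(&c);                        /* ret: pop [rsp] into rip */
    return c.rip;
}
```

With clobber set to 0, call_cycle returns 0x401005; with any other value, execution "returns" to that value instead, which is the whole premise of a return-address overwrite.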

Prerequisites / Lab setup

You need a Linux box (a VM is fine), GCC, and GDB. The pwndbg extension makes GDB dramatically more readable for RE work.

sudo apt update
sudo apt install -y gcc gdb gdb-multiarch
git clone https://github.com/pwndbg/pwndbg
cd pwndbg && ./setup.sh
Bash

Create a tiny target so we can watch the calling convention in action:

// target.c
#include <stdio.h>

long add3(long a, long b, long c) {
    long sum = a + b + c;
    return sum;
}

int main(void) {
    long r = add3(0x10, 0x20, 0x30);
    printf("result = %ld\n", r);
    return 0;
}
C

Compile without optimization so the prologue and stack frame stay visible:

gcc -O0 -fno-stack-protector -no-pie -g target.c -o target
Bash

-no-pie gives fixed code addresses (easier to read), and -fno-stack-protector removes the canary so the frame is uncluttered for learning. Production builds should keep both protections; we disable them here only to expose the raw mechanics.

Walkthrough / PoC

Disassemble add3 statically first:

objdump -d -M intel target | grep -A 15 '<add3>:'
Bash

You'll see something close to:

<add3>:
  push   rbp
  mov    rbp,rsp
  mov    QWORD PTR [rbp-0x18],rdi   ; store arg a
  mov    QWORD PTR [rbp-0x20],rsi   ; store arg b
  mov    QWORD PTR [rbp-0x28],rdx   ; store arg c
  mov    rax,QWORD PTR [rbp-0x18]
  mov    rdx,QWORD PTR [rbp-0x20]
  add    rax,rdx
  add    rax,QWORD PTR [rbp-0x28]
  mov    QWORD PTR [rbp-0x8],rax    ; sum
  mov    rax,QWORD PTR [rbp-0x8]
  pop    rbp
  ret
ASM

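Notice that the arguments arrive in rdi, rsi, and rdx exactly as the ABI promises, and the result leaves in rax. Also notice there is no sub rsp: add3 is a leaf function (it calls nothing), so GCC keeps its locals in the 128-byte red zone below rsp instead of reserving stack space.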
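A useful reading exercise is to transcribe the listing back into C, one statement per instruction, and then simplify. The sketch below does that for the listing above; RBP_MINUS, add3_by_hand, add3_recovered, and check_add3 are hypothetical names, and frame[] merely stands in for the memory just below rbp. uint64_t is used so additions wrap like the 64-bit add instruction.

```c
#include <assert.h>
#include <stdint.h>

/* RBP_MINUS(off) names the QWORD at [rbp-off]; offsets used: 0x8..0x28. */
#define RBP_MINUS(off) frame[(off) / 8]

uint64_t add3_by_hand(uint64_t rdi, uint64_t rsi, uint64_t rdx) {
    uint64_t frame[6] = {0};
    uint64_t rax;

    RBP_MINUS(0x18) = rdi;    /* mov QWORD PTR [rbp-0x18],rdi   (a)   */
    RBP_MINUS(0x20) = rsi;    /* mov QWORD PTR [rbp-0x20],rsi   (b)   */
    RBP_MINUS(0x28) = rdx;    /* mov QWORD PTR [rbp-0x28],rdx   (c)   */
    rax = RBP_MINUS(0x18);    /* mov rax,QWORD PTR [rbp-0x18]         */
    rdx = RBP_MINUS(0x20);    /* mov rdx,QWORD PTR [rbp-0x20]         */
    rax += rdx;               /* add rax,rdx                          */
    rax += RBP_MINUS(0x28);   /* add rax,QWORD PTR [rbp-0x28]         */
    RBP_MINUS(0x8) = rax;     /* mov QWORD PTR [rbp-0x8],rax    (sum) */
    rax = RBP_MINUS(0x8);     /* mov rax,QWORD PTR [rbp-0x8]          */
    return rax;               /* pop rbp; ret: result travels in rax  */
}

/* Collapsing the loads and stores recovers the original source. */
uint64_t add3_recovered(uint64_t a, uint64_t b, uint64_t c) {
    return a + b + c;
}

/* Sanity check that the transcription and the recovered C agree. */
void check_add3(uint64_t a, uint64_t b, uint64_t c) {
    assert(add3_by_hand(a, b, c) == add3_recovered(a, b, c));
}
```

This store-then-reload pattern is typical of -O0 code; at higher optimization levels the same function usually shrinks to a lea and an add with no stack traffic at all.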

Now go dynamic with GDB. Set Intel syntax and break on add3:

gdb -q ./target
Bash
set disassembly-flavor intel
break add3
run
Plaintext

When the breakpoint hits, inspect the argument registers and the stack:

info registers rdi rsi rdx
x/4gx $rsp
disassemble
Plaintext

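rdi should read 0x10, rsi 0x20, and rdx 0x30: the three integer arguments. x/4gx $rsp dumps four giant words (8 bytes each) in hex, starting at the top of the stack. Because break add3 stops after the prologue has already executed push rbp, the first word is main's saved rbp and the second is the return address pointing back into main. (With break *add3, which stops on the very first instruction, the return address would sit at $rsp itself.)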

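Now step one instruction at a time and watch rax get built. display/i $pc makes GDB print each instruction as it executes, and pressing Enter on an empty line repeats the previous nexti. Keep stepping until the next instruction shown is pop rbp, so you are still inside add3; print/x $rax should then show 0x60 (0x10 + 0x20 + 0x30):

display/i $pc
nexti
print/x $rax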
Plaintext

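To follow the call/return mechanics, set a breakpoint on add3's ret instruction and compare rip with the value on the stack. Read the offset of ret from the disassemble output on your own build: 0x2e below is only an example, and the real value depends on compiler version and flags (for instance, whether GCC emits an endbr64 at the top of the function).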

break *(add3+0x2e)
continue
x/gx $rsp
stepi
print/x $rip
Plaintext

After the ret, rip equals the value that was sitting at $rsp — that is the return-address pop in action. Understanding this single fact is the foundation of ROP and stack overflow exploitation.

Diagram: the call cycle

[Figure: x86-64 Assembly Primer for Reverse Engineers, diagram 1 (image not reproduced)]

The diagram shows one full call cycle: arguments loaded into registers, the return address pushed by call, frame setup, computation into rax, and the ret that restores rip to the caller.

Detection & Defense (Blue Team)

Reading assembly is offensive groundwork, but the same primitives are what defenders harden. Mitigations should be weighted at least as heavily as the offensive technique.

  • Stack canaries (-fstack-protector-strong): GCC/Clang insert a random guard value between locals and the saved return address. An overflow that reaches the return address corrupts the canary first, and __stack_chk_fail aborts the process. Build production binaries with -fstack-protector-strong or -fstack-protector-all.
  • NX / DEP: Mark the stack non-executable so that shellcode injected onto the stack cannot run. Verify with readelf -l ./target | grep -A1 GNU_STACK: readelf prints each program header across two lines, and the flags on the second line should be RW (no E).
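  • ASLR + PIE: Build position-independent executables (-fPIE -pie, the default on most modern distributions) and keep kernel.randomize_va_space=2 so that code, stack, heap, and library addresses change between executions and an address leaked in one run is not valid in the next. Check with cat /proc/sys/kernel/randomize_va_space.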
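  • CFI / Intel CET: Control-flow Enforcement Technology combines a hardware shadow stack, which detects tampered return addresses at ret, with indirect branch tracking (IBT), which restricts indirect calls and jumps to targets marked with endbr64. Build with -fcf-protection=full on supported toolchains (this is why many listings begin with endbr64); enforcement also requires support from the CPU, kernel, and C library.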
  • Detection: Monitor for repeated crashes (SIGSEGV/SIGABRT) on a service — a hallmark of exploit brute-forcing against ASLR — via the kernel audit log or coredumpctl. Map this activity to MITRE ATT&CK T1203 (Exploitation for Client Execution) and T1055 (Process Injection). Defensive RE workflows often pair GDB with static tools like Ghidra to triage suspicious binaries before they reach production.
  • Compiler hardening audits: Run checksec --file=./target (from pwntools or checksec.sh) in CI and fail the build whenever the report shows a binary shipping without canaries, NX, RELRO, or PIE.
checksec --file=./target
readelf -l ./target | grep -A1 GNU_STACK
Bash
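The canary check can be modeled in a few lines of C. This is a toy, not how GCC implements it (on x86-64 Linux the real guard is loaded from fs:0x28 in the prologue and compared in the epilogue): the frame layout is simplified, all stored values are made up, and copy_then_check is a hypothetical helper. Keeping the whole frame inside one byte array means the "overflow" stays within bounds, so the demo has no undefined behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified protected frame, lowest address first:
   [buf: 16][canary: 8][saved rbp: 8][return address: 8] */
enum { BUF_LEN = 16, CANARY_OFF = BUF_LEN, SAVED_RBP_OFF = 24, RET_OFF = 32, FRAME_LEN = 40 };

/* Returns 1 if the epilogue's canary comparison passes, 0 if it fails
   (where a real binary would call __stack_chk_fail and abort). */
int copy_then_check(const uint8_t *input, size_t len, uint64_t canary) {
    uint8_t frame[FRAME_LEN] = {0};
    const uint64_t saved_rbp = 0x7ffe0000, ret_addr = 0x401005;  /* made-up values */

    /* Prologue: place the guard between the buffer and the saved registers. */
    memcpy(frame + CANARY_OFF, &canary, sizeof canary);
    memcpy(frame + SAVED_RBP_OFF, &saved_rbp, sizeof saved_rbp);
    memcpy(frame + RET_OFF, &ret_addr, sizeof ret_addr);

    /* Body: an unchecked copy into buf, as with strcpy or gets. */
    assert(len <= FRAME_LEN);            /* keep the toy copy inside frame[] */
    memcpy(frame, input, len);

    /* Epilogue: reload the guard and compare before ret would run. */
    uint64_t seen;
    memcpy(&seen, frame + CANARY_OFF, sizeof seen);
    return seen == canary;
}
```

Any copy long enough to reach the return address at offset 32 must first rewrite all of the canary bytes, so with a random guard the overwrite is caught before ret executes.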

Conclusion

You now have the core reverse engineering vocabulary: the register set and its aliases, the System V argument order (rdi, rsi, rdx, rcx, r8, r9 in; rax out), how stack frames are built, how the call/ret pair moves rip, and how to confirm all of it live with GDB disassembly. With this model, decompiler output and exploit write-ups stop being magic. Next, practice on real CTF binaries and step into printf to see the variadic convention at work, then move on to control-flow hijacking once the return-address mechanic is second nature.
