Getting Started with Ghidra for Reverse Engineering and Malware Analysis

Legal & ethical disclaimer. This article is for education and authorized security testing only. Reverse engineering may be restricted by software licenses (EULAs) and local law. Only analyze binaries you own, samples you are explicitly authorized to handle, or malware in an isolated lab. Never run untrusted samples outside a controlled, network-segmented sandbox.

Introduction / Overview

Ghidra is a free, open-source software reverse engineering (SRE) framework released by the U.S. National Security Agency (NSA) in 2019. It competes directly with commercial disassemblers like IDA Pro, but ships with a capable decompiler at zero cost — which is why it has become a default tool for malware analysts, CTF players, and vulnerability researchers.

This post is a hands-on tour of the pieces you actually use every day: the CodeBrowser, the decompiler, the function graph, data types, and scripting. By the end you will be able to import a binary, navigate it efficiently, recover structures, and automate repetitive tasks.

How it works / Background

Ghidra is organized around projects that hold one or more imported programs. When you import a binary, Ghidra runs an auto-analysis pass: it identifies the file format (via loaders for PE, ELF, Mach-O, raw, etc.), disassembles code, recovers functions, propagates types, and resolves cross-references.
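To make the loader step more concrete, here is a minimal sketch of format sniffing by magic bytes. It is a simplified stand-in and not Ghidra's loader logic: the `guess_format` name and the `MAGICS` table are hypothetical, and a real loader validates far more than the first four bytes.

```python
# Toy format sniffing by magic bytes; NOT Ghidra's loader implementation.

MAGICS = [
    (b"\x7fELF", "ELF"),
    (b"MZ", "PE/MZ"),                 # DOS stub; a real PE check also follows e_lfanew
    (b"\xfe\xed\xfa\xce", "Mach-O"),  # 32-bit, big-endian on disk
    (b"\xce\xfa\xed\xfe", "Mach-O"),  # 32-bit, little-endian on disk
    (b"\xfe\xed\xfa\xcf", "Mach-O"),  # 64-bit, big-endian on disk
    (b"\xcf\xfa\xed\xfe", "Mach-O"),  # 64-bit, little-endian on disk
]


def guess_format(header: bytes) -> str:
    """Return a best-guess format name from the first bytes of a file."""
    for magic, name in MAGICS:
        if header.startswith(magic):
            return name
    return "raw"  # unknown: treat as a raw binary blob


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, guess_format(fh.read(4)))
```

Running it against the crackme built later in this post should report ELF on a typical Linux toolchain.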

Internally, native instructions are lifted into an intermediate representation called P-code. The decompiler operates on P-code rather than raw assembly, which is what makes Ghidra processor-agnostic — the same decompiler logic works across x86, ARM, MIPS, PowerPC, and dozens of other architectures defined in SLEIGH specification files. The decompiler output is not the original source; it is a C-like reconstruction driven by data-flow and type inference.
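The idea of one analysis serving many architectures can be sketched with a toy intermediate representation. The opcode names below echo P-code (COPY, INT_ADD), but everything else is an assumption made for illustration: the `Op` tuple, the two `lift_*` functions, and the tiny instruction syntaxes are hypothetical and bear no relation to real SLEIGH specifications.

```python
# Toy illustration of lifting to a shared IR; NOT Ghidra's real P-code or SLEIGH.
from typing import List, NamedTuple, Tuple


class Op(NamedTuple):
    opcode: str              # e.g. "INT_ADD" or "COPY"
    output: str              # destination register
    inputs: Tuple[str, ...]  # source registers


def lift_x86ish(line: str) -> Op:
    """Lift a two-operand instruction such as 'add eax, ebx'."""
    mnemonic, rest = line.split(None, 1)
    dst, src = [p.strip() for p in rest.split(",")]
    if mnemonic == "add":
        return Op("INT_ADD", dst, (dst, src))
    if mnemonic == "mov":
        return Op("COPY", dst, (src,))
    raise ValueError("unsupported instruction: " + line)


def lift_armish(line: str) -> Op:
    """Lift an instruction such as 'add r0, r0, r1' or 'mov r0, r2'."""
    mnemonic, rest = line.split(None, 1)
    parts = [p.strip() for p in rest.split(",")]
    if mnemonic == "add":
        return Op("INT_ADD", parts[0], (parts[1], parts[2]))
    if mnemonic == "mov":
        return Op("COPY", parts[0], (parts[1],))
    raise ValueError("unsupported instruction: " + line)


def written_registers(ops: List[Op]) -> List[str]:
    """One architecture-neutral analysis: which registers get written, in order."""
    return [op.output for op in ops]
```

Both lifters emit the same opcodes, so `written_registers` never needs to know which instruction set the code came from. That is the property the decompiler relies on, at a vastly larger scale.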

Key concepts:

  • CodeBrowser — the main analysis window (Listing view, decompiler, symbol tree, data type manager).
  • Decompiler — converts P-code into readable C-like pseudocode.
  • Function graph — a control-flow graph (CFG) view of basic blocks within a single function.
  • Data types — structures, enums, typedefs, and function signatures managed in the Data Type Manager; applying them dramatically improves decompiler output.
  • Scripting — automation via the Java or Python (Jython) GhidraScript API, or headless batch processing.

Prerequisites / Lab setup

Ghidra requires a 64-bit Java Development Kit: JDK 21 for 11.2 and later (earlier 11.x releases targeted JDK 17). Set up an isolated analysis VM (e.g., FLARE-VM or REMnux) before touching live malware.

# Install a JDK (Debian/Ubuntu example)
sudo apt update && sudo apt install -y openjdk-21-jdk

# Download and extract Ghidra (check the current release on GitHub)
wget https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.3.2_build/ghidra_11.3.2_PUBLIC_20250415.zip
unzip ghidra_11.3.2_PUBLIC_20250415.zip
cd ghidra_11.3.2_PUBLIC

# Launch the GUI
./ghidraRun

On Windows the launcher is ghidraRun.bat. Verify the JDK is on your PATH first with java -version.

For safety, compile a benign sample to practice on so you never need a live malware sample to learn the UI:

// crackme.c — compile with: gcc -O0 -no-pie crackme.c -o crackme
#include <stdio.h>
#include <string.h>

int check(const char *pw) {
    return strcmp(pw, "S3cr3t!") == 0;
}

int main(int argc, char **argv) {
    if (argc == 2 && check(argv[1]))
        puts("Access granted");
    else
        puts("Access denied");
    return 0;
}

Walkthrough / PoC

1. Create a project and import

Launch the GUI, then File → New Project → Non-Shared Project. Drag crackme into the project window (or File → Import File). Ghidra detects the ELF format and asks to analyze — click Yes and accept the default analyzers.

2. Navigate in the CodeBrowser

Double-click the imported program to open the CodeBrowser. The center Listing pane shows disassembly; the right-hand Decompiler pane shows pseudocode. Use the Symbol Tree (left) to jump to main. Press G to "Go To" any address or symbol, and L to rename the symbol under the cursor.

In the decompiler you will see something close to:

undefined8 main(int param_1, long param_2) {
    int iVar1;
    undefined8 uVar2;

    if ((param_1 == 2) && (iVar1 = check(*(char **)(param_2 + 8)), iVar1 != 0)) {
        puts("Access granted");
        uVar2 = 0;
    } else {
        puts("Access denied");
        uVar2 = 0;
    }
    return uVar2;
}

Press Ctrl+L in the decompiler to retype a variable, and L to rename it. The hardcoded comparison string inside check is immediately visible — that is the "secret" without ever running the binary.
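The same secret can be surfaced outside Ghidra with a few lines of Python that mimic the classic strings utility. This is a minimal sketch, assuming plain ASCII strings of at least four characters; the `ascii_strings` name is hypothetical, and wide (UTF-16) strings are ignored.

```python
import re
from typing import List


def ascii_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        for s in ascii_strings(fh.read()):
            print(s)
```

Point it at crackme and the password should appear among the output, which is exactly why hardcoded secrets offer no protection.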

3. Use the function graph

Inside main, open Window → Function Graph. This renders the function as a control-flow graph of basic blocks. With the default colors, a conditional branch forks into a green edge (branch taken) and a red edge (fall-through), while unconditional jumps are drawn in blue. That makes the authentication branch obvious at a glance, and far easier to follow than linear assembly.
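To see what a function graph is made of, the sketch below splits a toy instruction list into basic blocks using the textbook "leaders" rule. It is not Ghidra's implementation: the `Insn` tuple format and the small mnemonic set (jmp, jz, jnz, ret) are assumptions, and every branch target is assumed to be a valid instruction address.

```python
# Toy basic-block splitter; the textbook "leaders" algorithm, not Ghidra's code.
from typing import Dict, List, Optional, Tuple

Insn = Tuple[int, str, Optional[int]]  # (address, mnemonic, branch target or None)


def build_cfg(insns: List[Insn]) -> Dict[int, List[int]]:
    """Map each basic-block start address to its successor block addresses."""
    addrs = [a for a, _, _ in insns]
    leaders = {addrs[0]}
    for i, (_, mnem, target) in enumerate(insns):
        if mnem in ("jmp", "jz", "jnz", "ret"):
            if target is not None:
                leaders.add(target)        # a branch target starts a block
            if i + 1 < len(insns):
                leaders.add(addrs[i + 1])  # so does the instruction after a branch
    starts = sorted(leaders)

    cfg: Dict[int, List[int]] = {}
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else None
        block = [x for x in insns if x[0] >= start and (end is None or x[0] < end)]
        _, mnem, target = block[-1]
        succs: List[int] = []
        if mnem in ("jz", "jnz"):
            succs.append(target)           # taken edge
            if end is not None:
                succs.append(end)          # fall-through edge
        elif mnem == "jmp":
            succs.append(target)
        elif mnem != "ret" and end is not None:
            succs.append(end)              # plain fall-through into the next block
        cfg[start] = succs
    return cfg
```

A conditional block ends up with two successors (taken first, fall-through second), which corresponds to the pair of colored edges in the graph view.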

4. Apply data types

Suppose check actually parsed a custom header. Open the Data Type Manager (bottom-left), right-click your program → New → Structure, define fields, then in the decompiler press Ctrl+L on a pointer variable and assign your struct. The decompiler instantly rewrites offset arithmetic (*(int *)(buf + 0x10)) into named field access (hdr->magic), which is the single highest-leverage habit in serious RE work.
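The before-and-after effect of applying a structure can be imitated in plain Python. In this sketch the header layout is invented for illustration (16 reserved bytes followed by a 32-bit magic at offset 0x10), and the `Header`, `magic_by_offset`, and `magic_by_field` names are hypothetical.

```python
import ctypes
import struct


class Header(ctypes.LittleEndianStructure):
    """Hypothetical 0x14-byte header; the layout is invented for illustration."""
    _fields_ = [
        ("reserved", ctypes.c_uint8 * 0x10),  # offsets 0x00-0x0f
        ("magic", ctypes.c_int32),            # offset 0x10
    ]


def magic_by_offset(buf: bytes) -> int:
    """The 'before' view: raw offset arithmetic, like *(int *)(buf + 0x10)."""
    return struct.unpack_from("<i", buf, 0x10)[0]


def magic_by_field(buf: bytes) -> int:
    """The 'after' view: the same read through a named field, like hdr->magic."""
    return Header.from_buffer_copy(buf).magic
```

Both functions read the same four bytes. The second one documents intent, which is what the decompiler gains once the type is applied.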

5. Headless and scripting

For triaging many samples, use the headless analyzer — no GUI required:

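# The project location should exist before the run
mkdir -p /tmp/proj

# Batch-import and auto-analyze a folder of samples,
# run a post-analysis script on each, then discard the project
./support/analyzeHeadless /tmp/proj MalwareTriage \
    -import /samples/*.bin \
    -postScript MyEnumStrings.py \
    -deleteProject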
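When triage is driven from a larger pipeline, it can help to assemble that command line in code instead of a shell string. The helper below is a sketch under assumptions: `headless_command` is a hypothetical name, it uses only the flags shown above, it assumes a POSIX-style install path, and it builds the argument list without running anything.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def headless_command(
    ghidra_home: str,
    project_dir: str,
    project_name: str,
    samples: Sequence[str],
    post_script: str,
) -> List[str]:
    """Build (but do not run) an analyzeHeadless argument list."""
    if not samples:
        raise ValueError("at least one sample path is required")
    launcher = str(PurePosixPath(ghidra_home) / "support" / "analyzeHeadless")
    return [
        launcher, project_dir, project_name,
        "-import", *samples,
        "-postScript", post_script,
        "-deleteProject",
    ]
```

Passing the resulting list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with oddly named samples.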

Scripts live in script directories; by default Ghidra looks in ghidra_scripts/ under your home directory. A minimal Python (Jython) script to list functions and their entry points via the GhidraScript API:

# ListFuncs.py — run from the Script Manager
fm = currentProgram.getFunctionManager()
for f in fm.getFunctions(True):  # True = forward order
    print("0x%x  %s" % (f.getEntryPoint().getOffset(), f.getName()))

Open the Script Manager (the green "play" toolbar icon), drop the file into a script directory, and run it. Output goes to the Console. Note that .py scripts run under Jython 2.7 by default. For CPython 3, Ghidra 11.3 and later bundle PyGhidra (launched with support/pyghidraRun), and the third-party Ghidrathon extension fills the same role on older releases.

[Diagram 1: the Ghidra static-analysis workflow]

The diagram shows the typical static-analysis loop: import, let auto-analysis lift the code, iteratively improve readability with types and renames, inspect logic in the function graph, and automate the repetitive parts with scripting before writing up findings.

Detection & Defense (Blue Team)

Reverse engineering is a dual-use skill. Defenders use the same workflow above to extract IOCs, decode configs, and write signatures. Equally, software vendors and defenders should make adversarial RE harder and detect malicious binaries the moment they execute.

For malware analysts (using Ghidra defensively):

  • Extract embedded strings, C2 domains, and crypto constants, then feed them into YARA rules. Map observed behavior to MITRE ATT&CK techniques such as T1055 (Process Injection), T1027 (Obfuscated Files or Information), and T1620 (Reflective Code Loading).
  • Use Ghidra's decompiler to recover the config-decryption routine, then reimplement it as a Python extractor for scalable triage.
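As a sketch of what such an extractor might look like, assume the decompiler revealed a repeating-key XOR over a JSON config. Both the scheme and the names `xor_bytes` and `extract_config` are hypothetical; real families use many different ciphers and encodings, so treat this as a template to adapt.

```python
import json
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Apply a repeating-key XOR; calling it twice with the same key round-trips."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def extract_config(blob: bytes, key: bytes) -> dict:
    """Decrypt a hypothetical XOR-encrypted JSON config blob."""
    return json.loads(xor_bytes(blob, key).decode("utf-8"))
```

Once the routine works on one sample, the same function can run across a whole folder of related samples without opening Ghidra again.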

For defenders hardening their environment:

  • Application allow-listing (Windows Defender Application Control / AppLocker) stops unsigned, unknown binaries from running regardless of how they were built.
  • EDR behavioral detection catches the runtime techniques RE reveals — suspicious VirtualAllocEx + WriteProcessMemory + CreateRemoteThread sequences map to T1055 and are far harder to evade than static signatures.
  • Network segmentation and egress filtering blunt the C2 channels you recover during analysis.
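The behavioral idea behind the injection bullet can be reduced to an ordered-subsequence check over an API call trace. This is only a sketch: real EDR products correlate handles, target processes, and timing. The `contains_in_order` name and the flat list-of-names trace format are assumptions.

```python
from typing import Iterable, Sequence

# Classic remote-thread injection chain (maps to ATT&CK T1055)
INJECTION_CHAIN = ("VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread")


def contains_in_order(
    trace: Iterable[str], chain: Sequence[str] = INJECTION_CHAIN
) -> bool:
    """True if every call in chain appears in trace in order (gaps allowed)."""
    it = iter(trace)  # shared iterator, so each search resumes after the last match
    return all(any(call == wanted for call in it) for wanted in chain)
```

Because gaps are allowed, unrelated calls sprinkled between the three steps do not hide the sequence, which is part of why behavior is harder to evade than a byte signature.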

For developers raising the cost of RE:

  • Strip symbols (strip --strip-all) and avoid leaving debug info or descriptive function names in release builds.
  • Apply control-flow obfuscation or packing, but understand that these only slow analysis. They never stop a determined analyst, and aggressive packing tends to raise AV detection rates because packed binaries resemble malware to heuristic engines.
  • Treat anti-analysis as defense-in-depth, never as a substitute for proper security controls. See the malware unpacking workflow for how analysts defeat packers.

Detection is symmetric with offense here: every technique a developer uses to hinder RE produces artifacts (high-entropy sections, unusual imports, anti-debug API calls) that defenders fingerprint. For broader workflow context see x86-64 assembly for reversers and building a malware analysis lab.
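High entropy is one of the easiest of those artifacts to measure. The function below computes Shannon entropy in bits per byte. The `shannon_entropy` name is hypothetical, and any threshold for "packed" is a judgment call that this sketch deliberately leaves out.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the observed byte values
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())
```

Compressed or encrypted sections score close to 8, while ordinary code and text sit noticeably lower, so computing this per section gives a quick hint that a sample is packed.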

Conclusion

Ghidra gives you a professional-grade decompiler, an intuitive function graph, a robust data type system, and full scripting — for free. The fastest path to proficiency is repetition on safe binaries: import a crackme, rename everything, apply real types, and automate one boring task with a script. Those habits transfer directly to live malware triage in an isolated lab. Stay within authorized scope, keep your analysis VM offline, and let the decompiler do the heavy lifting.
