Writing Effective YARA Detection Rules

Disclaimer: This article is for education and authorized testing only. Analyze malware exclusively in isolated lab environments and only handle samples you are legally permitted to possess. Deploy detection rules only on systems you own or are contracted to defend.

Introduction / Overview

YARA is the de facto standard for describing and classifying malware by pattern. It was created by Victor Alvarez at VirusTotal and is used everywhere from individual incident responders' laptops to large-scale scanning pipelines. A YARA rule pairs human-readable signatures (text strings, byte sequences, regular expressions) with boolean logic in a condition block.

The hard part is not learning the syntax. It is writing rules that are specific enough to avoid false positives yet generic enough to survive trivial repacking. This post walks through strings, conditions, hex patterns, the PE module's imphash() function, and the yara command-line interface, with a working rule you can test against a benign sample.

How it works / Background

A YARA rule has three sections: meta (documentation), strings (the patterns), and condition (the boolean expression that must evaluate true for a match).

rule Example_Skeleton
{
    meta:
        author      = "yunolay"
        description = "Skeleton showing all three sections"
        reference   = "https://yara.readthedocs.io"
    strings:
        $text = "malicious_marker" ascii wide nocase
        $hex  = { 6A 40 68 00 30 00 00 6A 14 8D 91 }
        $re   = /https?:\/\/[a-z0-9\-]{6,30}\.top\//
    condition:
        uint16(0) == 0x5A4D and 2 of them
}
Plaintext

Key mechanics:

  • String modifiers refine matching. ascii (default) and wide (UTF-16LE) control encoding, nocase makes it case-insensitive, fullword requires word boundaries, and xor matches single-byte XOR-encoded copies — invaluable against config strings that malware obfuscates.
  • Hex patterns match raw bytes. Wildcards (??), nibble wildcards (9?), jumps ([4-6] for 4 to 6 arbitrary bytes), and alternatives (( 12 34 | 56 78 )) let you express stable code stubs while tolerating compiler variation.
  • The condition is where rules live or die. uint16(0) == 0x5A4D checks for the MZ magic at offset 0 (YARA reads the integer little-endian, so the bytes on disk are 4D 5A), which lets the condition fail fast on files that are not DOS/PE executables. Counting expressions like 2 of them, all of ($s*), and offset constraints ($a at 0, $b in (0..1024)) keep rules tight.
  • The PE module (import "pe") exposes structured fields, most usefully pe.imphash(). The import hash is an MD5 over the ordered list of a PE's imported module and function names, normalized to lowercase; samples from the same toolchain or builder frequently share an imphash even when strings differ.
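To make the xor modifier from the list above concrete, here is a minimal Python sketch of what "single-byte XOR-encoded copies" means. It is a brute-force model for intuition only, not how YARA implements the search, and the marker string and padding are made up for the example.

```python
def find_xor_encoded(data: bytes, needle: bytes):
    """Return (key, offset) pairs where `needle`, XORed with one byte, occurs in `data`."""
    hits = []
    for key in range(256):  # key 0 is the plaintext itself
        encoded = bytes(b ^ key for b in needle)
        offset = data.find(encoded)
        while offset != -1:
            hits.append((key, offset))
            offset = data.find(encoded, offset + 1)
    return hits


# Hypothetical config string, encoded with key 0x5A and embedded in zero padding.
marker = b"malicious_marker"
blob = b"\x00" * 16 + bytes(b ^ 0x5A for b in marker) + b"\x00" * 16
print(find_xor_encoded(blob, marker))  # -> [(90, 16)]
```

A plain ascii string would miss this blob entirely, which is why the modifier is worth its extra scanning cost on config strings.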

Prerequisites / Lab setup

Use an isolated VM with no network access to the host. Install YARA from packages or build from source:

# Debian/Ubuntu
sudo apt-get update && sudo apt-get install -y yara

# macOS
brew install yara

# Verify the installed version
yara --version
Bash

For Python-driven hunting, the yara-python bindings are the standard:

pip install yara-python
Bash

To compute an imphash for rule development, pefile is the canonical tool:

pip install pefile
python3 -c "import pefile; print(pefile.PE('sample.exe').get_imphash())"
Bash

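You can grab a safe, non-malicious test file so you can validate rule logic without handling live malware. Note that /bin/ls (an ELF binary) and the EICAR test string are not PE files, so they exercise plain string matching only; rules gated on uint16(0) == 0x5A4D or on the pe module need a benign Windows executable, such as a copy of a system utility from a clean Windows install.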
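Another option for exercising rule logic is to synthesize a harmless file yourself. The sketch below builds a byte blob that should satisfy the Example_Skeleton rule shown earlier (MZ magic plus two of its three strings). It is not a valid PE, so pe module functions will not return meaningful values for it, and the function name and layout are assumptions made for this example.

```python
import struct


def build_benign_sample() -> bytes:
    """Build a harmless blob with an MZ magic, the skeleton rule's text marker, and its hex bytes."""
    header = b"MZ".ljust(64, b"\x00")          # only the magic; no real DOS/PE headers
    text = b"malicious_marker"                 # matches $text
    stub = bytes.fromhex("6A 40 68 00 30 00 00 6A 14 8D 91")  # matches $hex
    return header + text + b"\x00" * 8 + stub


sample = build_benign_sample()
# YARA's uint16(0) is little-endian, so bytes 4D 5A read back as 0x5A4D.
print(hex(struct.unpack("<H", sample[:2])[0]))  # -> 0x5a4d
```

Writing the result to disk, for example with pathlib.Path("benign_test.bin").write_bytes(sample), gives you a file to scan without touching real malware.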

Walkthrough / PoC (step by step, with commands)

Step 1 — Extract candidate signatures

Start by triaging the sample's printable strings. Pull both ASCII and 16-bit (wide) strings, since Windows malware stores many strings as UTF-16LE.

strings -a -n 8 sample.exe | sort -u | head -n 40
strings -a -e l -n 8 sample.exe | sort -u | head -n 40   # wide (little-endian)
Bash

Look for stable artifacts: mutex names, hard-coded user-agents, PDB paths, C2 URL templates, and unique error messages. Avoid common library strings (Microsoft Corporation, KERNEL32.DLL) — they generate false positives.
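If you prefer to triage from a script, the following is a rough Python approximation of the two strings invocations above. It only treats printable ASCII (0x20 to 0x7E) as string characters and hard-codes the minimum length of 8, so expect small differences from the real tool.

```python
import re

# Runs of at least 8 printable ASCII bytes, and the same characters stored as UTF-16LE.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{8,}")
WIDE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){8,}")


def extract_strings(data: bytes):
    """Return (ascii_strings, wide_strings), each sorted and de-duplicated."""
    ascii_hits = {m.group().decode("ascii") for m in ASCII_RUN.finditer(data)}
    wide_hits = {m.group().decode("utf-16-le") for m in WIDE_RUN.finditer(data)}
    return sorted(ascii_hits), sorted(wide_hits)


# Made-up buffer: one ASCII mutex name and one wide user-agent string.
demo = (
    b"\x01\x02Global\\foo-mtx-01\x00\xff"
    + "Mozilla/5.0 (FooClient 1.3)".encode("utf-16-le")
    + b"\x00\x00short"
)
ascii_strings, wide_strings = extract_strings(demo)
print(ascii_strings)
print(wide_strings)
```

Knowing which encoding a candidate string came from tells you whether the rule needs ascii, wide, or both.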

Step 2 — Identify a stable code stub for a hex pattern

Disassemble a routine that is unlikely to change between builds — a custom decryption loop, for instance. In a disassembler such as Ghidra or radare2, copy the opcode bytes and wildcard the volatile operands (relative addresses, immediates):

# radare2: print 16 bytes as hex at a chosen offset
r2 -q -c 's 0x401200; px 16' sample.exe
Bash

Translate that into a hex string, masking the bytes that vary:

$decrypt_stub = { 8A 04 ?? 34 ?? 88 04 ?? [1-3] 75 ?? }
Plaintext
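Before committing a hex string to a rule, it can help to check that the wildcards and jump behave the way you expect. The sketch below translates a subset of YARA hex syntax (plain bytes, ??, high-nibble wildcards such as 9?, and bounded jumps like [1-3]) into a Python bytes regex. It is an illustration under those assumptions, not YARA's matcher; alternatives, ?9-style nibbles, and unbounded jumps are not handled, and the test bytes are invented.

```python
import re


def hex_pattern_to_regex(pattern: str) -> "re.Pattern[bytes]":
    """Translate a simple YARA hex string into an equivalent bytes regex."""
    parts = []
    for tok in pattern.strip("{} \n").split():
        if tok == "??":                       # full-byte wildcard
            parts.append(b".")
        elif tok.startswith("["):             # jump: [n] or [n-m]
            lo, _, hi = tok.strip("[]").partition("-")
            parts.append(b".{%d,%d}" % (int(lo), int(hi or lo)))
        elif tok.endswith("?"):               # high nibble fixed, e.g. 9?
            high = int(tok[0], 16) << 4
            parts.append(b"[\\x%02x-\\x%02x]" % (high, high | 0x0F))
        else:                                 # literal byte
            parts.append(b"\\x%02x" % int(tok, 16))
    # DOTALL so that "." also matches the newline byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


stub = hex_pattern_to_regex("{ 8A 04 ?? 34 ?? 88 04 ?? [1-3] 75 ?? }")
code = bytes.fromhex("90 8A 04 0E 34 5A 88 04 0E 41 75 F5")
match = stub.search(code)
print(match.start() if match else None)  # -> 1
```

If a variant of the routine has no instruction between the store and the jump, the [1-3] jump will not match it; widening it to [0-3] is the kind of adjustment this check makes visible.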

Step 3 — Pin the toolchain with imphash

Compute the import hash and use it as a high-confidence anchor that survives string obfuscation:

python3 -c "import pefile; print(pefile.PE('sample.exe').get_imphash())"
# e.g. 0c6803c4e922103c4dca5963aad36ddf
Bash
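To see why an imphash depends on import order but not on name casing, here is a simplified sketch of the computation using only hashlib. It assumes the imports have already been parsed into (module, function) pairs and skips imports by ordinal, which pefile resolves through lookup tables; the import lists below are hypothetical.

```python
import hashlib


def imphash_from_imports(imports):
    """MD5 over comma-joined, lowercased 'module.function' entries, in import order."""
    entries = []
    for module, function in imports:
        lib = module.lower()
        for ext in (".dll", ".ocx", ".sys"):   # common extensions are dropped
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        entries.append(f"{lib}.{function.lower()}")
    return hashlib.md5(",".join(entries).encode("ascii")).hexdigest()


# Hypothetical import table of a small loader.
imports = [
    ("KERNEL32.dll", "VirtualAlloc"),
    ("KERNEL32.dll", "CreateThread"),
    ("WININET.dll", "InternetOpenA"),
]
print(imphash_from_imports(imports))
print(imphash_from_imports(list(reversed(imports))))  # different: order matters
```

The order sensitivity is what ties the hash to a particular linker and build configuration, and it is also why a builder that shuffles imports defeats it.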

Step 4 — Assemble the rule

import "pe"

rule APT_LoaderFamily_Generic
{
    meta:
        author       = "yunolay"
        date         = "2025-10-22"
        description  = "Detects FooLoader via imphash + decrypt stub + C2 template"
        hash         = "9f86d0...<sha256>"
        tlp          = "WHITE"
    strings:
        $ua    = "Mozilla/5.0 (FooClient 1.3)" ascii fullword
        $mutex = "Global\\foo-mtx-" ascii wide
        $decrypt_stub = { 8A 04 ?? 34 ?? 88 04 ?? [1-3] 75 ?? }
        $c2    = /https?:\/\/[a-z0-9]{8,16}\.(top|xyz)\/gate\.php/ nocase
    condition:
        uint16(0) == 0x5A4D and
        filesize < 2MB and
        (
            pe.imphash() == "0c6803c4e922103c4dca5963aad36ddf"
            or ( $decrypt_stub and 1 of ($ua, $mutex, $c2) )
        )
}
Plaintext

Note the layered logic: either the imphash matches on its own, or the byte-level decryption stub appears together with at least one network or host indicator. The uint16(0) and filesize checks come first so the condition fails fast on files that cannot match.
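The boolean structure of the condition can be restated as plain Python, which makes it easy to reason about edge cases. This is only a model of the logic in the rule above: the inputs are assumed to be precomputed, and the function and parameter names are invented for the example.

```python
KNOWN_IMPHASH = "0c6803c4e922103c4dca5963aad36ddf"
MAX_SIZE = 2 * 1024 * 1024  # YARA's 2MB means 2 * 2**20 bytes


def loader_rule_matches(first_two: bytes, filesize: int, imphash: str,
                        stub_hit: bool, indicator_hits: int) -> bool:
    """Mirror the rule's condition; indicator_hits counts matches among $ua, $mutex, $c2."""
    if first_two != b"MZ":          # uint16(0) == 0x5A4D
        return False
    if filesize >= MAX_SIZE:        # filesize < 2MB
        return False
    if imphash == KNOWN_IMPHASH:    # strong anchor on its own
        return True
    return stub_hit and indicator_hits >= 1


# Repacked variant: imphash changed, but the stub and one indicator survive.
print(loader_rule_matches(b"MZ", 300_000, "ffffffffffffffffffffffffffffffff", True, 1))  # -> True
# The stub alone, with no corroborating indicator, is not enough.
print(loader_rule_matches(b"MZ", 300_000, "ffffffffffffffffffffffffffffffff", True, 0))  # -> False
```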

Step 5 — Compile and scan with yara-cli

Always compile first to catch syntax errors, then scan recursively:

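# Sanity-check syntax without scanning: compile and discard the output
# (-C tells yara to load precompiled rules, so it cannot check a source file)
yarac rules/loader.yar /dev/null || echo "rule error"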

# Recursive scan, print matching strings and offsets
yara -r -s rules/loader.yar /samples/

# Print tags and metadata for each matching rule
yara -r -g -m rules/loader.yar /samples/

# Save compiled rules for fast reuse in a pipeline
yarac rules/loader.yar rules/loader.yarc
yara -C rules/loader.yarc /samples/
Bash

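Useful flags: -s prints matching strings, -g prints tags, -m prints metadata, -e prints each rule's namespace, -d defines an external variable (-d name=value), and -t <tag> restricts output to rules carrying that tag. There is no flag that reports which part of a condition matched, so use -s to see which strings fired. For performance triage, yara -S (--print-stats) prints statistics about the compiled rule set, which can help spot strings that yield weak atoms.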
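In a pipeline you will usually want the scan results as data rather than terminal text. The sketch below parses yara's default output, where each match is a line of the form "RULE_NAME FILE_PATH", and skips the extra offset lines that -s adds. The scan() wrapper assumes a yara binary on PATH, and the sample output is illustrative rather than captured from a real run.

```python
import subprocess


def parse_yara_output(text: str):
    """Parse default yara output into (rule_name, file_path) tuples."""
    matches = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("0x"):   # -s string lines look like 0x1a2b:$ua: ...
            continue
        rule, _, path = line.partition(" ")      # paths may contain spaces
        matches.append((rule, path))
    return matches


def scan(rules_path: str, target: str):
    """Run a recursive scan and return parsed matches (requires yara to be installed)."""
    proc = subprocess.run(["yara", "-r", rules_path, target],
                          capture_output=True, text=True, check=False)
    return parse_yara_output(proc.stdout)


# Illustrative output only.
example = """APT_LoaderFamily_Generic /samples/a.bin
0x1f40:$ua: Mozilla/5.0 (FooClient 1.3)
APT_LoaderFamily_Generic /samples/dir with space/b.bin
"""
print(parse_yara_output(example))
```

For anything beyond simple glue, the yara-python bindings mentioned earlier return structured match objects directly and avoid text parsing altogether.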

[Diagram: iterative YARA rule-development loop]

The diagram shows the iterative loop: extract candidates, anchor on the strongest signal (imphash or a hex stub plus a corroborating indicator), then retro-hunt and tighten until false positives drop to zero before deployment.

Detection & Defense (Blue Team)

YARA is itself a blue-team tool, so "defense" here means operationalizing rules safely and resisting evasion. This deserves as much rigor as rule writing.

  • Test for false positives before deploying. Run every new rule against a large corpus of known-good files (a Windows System32 snapshot, common installers, your golden images). A rule that fires on vcruntime140.dll will drown analysts in noise. Maintain a regression set and re-test on every change. See also building a malware analysis lab.
  • Layer rules into your stack. YARA integrates with many platforms: Velociraptor (the yara() VQL plugin) for endpoint sweeps, THOR / LOKI scanners for compromise assessment, and VirusTotal Hunting for retro-hunts. Suricata does not provide a native yara keyword, but files it extracts from network traffic can be scanned with YARA as a separate step. Rules of this kind help cover MITRE ATT&CK techniques such as T1059 (Command and Scripting Interpreter), T1027 (Obfuscated Files or Information), and T1620 (Reflective Code Loading).
  • Prefer structural over string indicators. Attackers flip strings cheaply but cannot trivially change import tables, section entropy, or code structure. Conditions built on pe.imphash(), pe.number_of_sections, rich-header data, and resource hashes are far more durable than a single mutex string.
  • Use the math module to catch packing. High entropy is a strong packed/encrypted signal: import "math" then math.entropy(0, filesize) >= 7.4 flags compressed payloads. Combine with PE anomalies (e.g., an entry point in the last section) to detect runtime packers.
  • Watch for in-memory-only malware. File-based YARA misses fileless threats. Scan process memory by passing a PID as the scan target (yara rules/loader.yar <pid>; note that -p sets the number of scan threads, not a PID) or via Velociraptor's proc_yara, and pair with EDR telemetry, because reflective loaders may never touch disk.
  • Version-control and peer-review rules. Treat detections as code: store them in Git, require review, and tag with TLP and ATT&CK IDs in meta. Sources such as YARA Forge and Florian Roth's signature-base repository provide vetted baseline rules.
  • Beware adversary anti-YARA tactics. Sophisticated actors test payloads against public rule sets, pad files to exceed filesize limits, and randomize strings per build. Keep some rules private, and anchor on traits the attacker must keep functional (network protocol bytes, decryption constants).
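The entropy threshold mentioned in the list above is easier to calibrate once you have seen the numbers. This sketch computes Shannon entropy in bits per byte, the same 0 to 8 scale that math.entropy reports; it is a standalone illustration and the sample buffers are synthetic.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


uniform = bytes(range(256)) * 4          # every byte value equally likely
text = b"This program cannot be run in DOS mode. " * 50

print(shannon_entropy(uniform))          # -> 8.0
print(shannon_entropy(text) >= 7.4)      # -> False
```

Plain text and ordinary code sit well below the 7.4 mark, while compressed or encrypted data approaches 8, which is why the threshold works as a packing hint but should be combined with other PE anomalies.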

Conclusion

Effective YARA rules come from disciplined signature selection, not exhaustive string dumps. Anchor on the most durable signal available — usually pe.imphash() or a wildcard hex stub of a custom routine — corroborate it with one or two specific indicators, gate everything behind cheap checks like uint16(0) and filesize, and validate against a clean corpus before shipping. Treat rules as living code: retro-hunt, measure false positives, and iterate. For deeper PE internals see PE file format internals.
