Disclaimer: This article is for education and authorized testing only. Analyze malware exclusively in isolated lab environments and only handle samples you are legally permitted to possess. Deploy detection rules only on systems you own or are contracted to defend.
Introduction / Overview
YARA is the de facto standard for describing and classifying malware by pattern. It was created by Victor Alvarez at VirusTotal and is used everywhere from individual incident responders' laptops to large-scale scanning pipelines. A YARA rule pairs human-readable signatures (text strings, byte sequences, regular expressions) with boolean logic in a condition block.
The hard part is not learning the syntax — it is writing rules that are specific enough to avoid false positives but generic enough to survive trivial repacking. This post walks through strings, conditions, hex patterns, the imphash of the PE module, and the yara command-line interface, with a working rule you can test against a benign sample.
How it works / Background
A YARA rule has three sections: meta (documentation), strings (the patterns), and condition (the boolean expression that must evaluate true for a match).
rule Example_Skeleton
{
meta:
author = "yunolay"
description = "Skeleton showing all three sections"
reference = "https://yara.readthedocs.io"
strings:
$text = "malicious_marker" ascii wide nocase
$hex = { 6A 40 68 00 30 00 00 6A 14 8D 91 }
$re = /https?:\/\/[a-z0-9\-]{6,30}\.top\//
condition:
uint16(0) == 0x5A4D and 2 of them
}PlaintextKey mechanics:
- String modifiers refine matching.
ascii(default) andwide(UTF-16LE) control encoding,nocasemakes it case-insensitive,fullwordrequires word boundaries, andxormatches single-byte XOR-encoded copies — invaluable against config strings that malware obfuscates. - Hex patterns match raw bytes. Wildcards (
??), nibble wildcards (9?), jumps ([4-6]for 4 to 6 arbitrary bytes), and alternatives (( 12 34 | 56 78 )) let you express stable code stubs while tolerating compiler variation. - The
conditionis where rules live or die.uint16(0) == 0x5A4Danchors on theMZmagic so the scanner skips non-PE files cheaply. Counting expressions like2 of them,all of ($s*), and offset constraints ($a at 0,$b in (0..1024)) keep rules tight. - The PE module (
import "pe") exposes structured fields, most usefullype.imphash(). The import hash is an MD5 over the names and order of a PE's imported functions; samples from the same toolchain or builder frequently share an imphash even when strings differ.
Prerequisites / Lab setup
Use an isolated VM with no network access to the host. Install YARA from packages or build from source:
# Debian/Ubuntu
sudo apt-get update && sudo apt-get install -y yara
# macOS
brew install yara
# Verify version and built-in modules
yara --versionBashFor Python-driven hunting, the yara-python bindings are the standard:
pip install yara-pythonBashTo compute an imphash for rule development, pefile is the canonical tool:
pip install pefile
python3 -c "import pefile; print(pefile.PE('sample.exe').get_imphash())"BashYou can grab a safe, non-malicious test binary (for example /bin/ls copied into the VM, or the EICAR test string) so you can validate rule logic without handling live malware.
Walkthrough / PoC (step by step, with commands)
Step 1 — Extract candidate signatures
Start by triaging the sample's printable strings. Pull both ASCII and 16-bit (wide) strings, since Windows malware stores many strings as UTF-16LE.
strings -a -n 8 sample.exe | sort -u | head -n 40
strings -a -e l -n 8 sample.exe | sort -u | head -n 40 # wide (little-endian)BashLook for stable artifacts: mutex names, hard-coded user-agents, PDB paths, C2 URL templates, and unique error messages. Avoid common library strings (Microsoft Corporation, KERNEL32.DLL) — they generate false positives.
Step 2 — Identify a stable code stub for a hex pattern
Disassemble a routine that is unlikely to change between builds — a custom decryption loop, for instance. In a disassembler such as Ghidra or radare2, copy the opcode bytes and wildcard the volatile operands (relative addresses, immediates):
# radare2: print 16 bytes as hex at a chosen offset
r2 -q -c 's 0x401200; px 16' sample.exeBashTranslate that into a hex string, masking the bytes that vary:
$decrypt_stub = { 8A 04 ?? 34 ?? 88 04 ?? [1-3] 75 ?? }PlaintextStep 3 — Pin the toolchain with imphash
Compute the import hash and use it as a high-confidence anchor that survives string obfuscation:
python3 -c "import pefile; print(pefile.PE('sample.exe').get_imphash())"
# e.g. 0c6803c4e922103c4dca5963aad36ddfBashStep 4 — Assemble the rule
import "pe"
rule APT_LoaderFamily_Generic
{
meta:
author = "yunolay"
date = "2025-10-22"
description = "Detects FooLoader via imphash + decrypt stub + C2 template"
hash = "9f86d0...<sha256>"
tlp = "WHITE"
strings:
$ua = "Mozilla/5.0 (FooClient 1.3)" ascii fullword
$mutex = "Global\\foo-mtx-" ascii wide
$decrypt_stub = { 8A 04 ?? 34 ?? 88 04 ?? [1-3] 75 ?? }
$c2 = /https?:\/\/[a-z0-9]{8,16}\.(top|xyz)\/gate\.php/ nocase
condition:
uint16(0) == 0x5A4D and
filesize < 2MB and
(
pe.imphash() == "0c6803c4e922103c4dca5963aad36ddf"
or ( $decrypt_stub and 1 of ($ua, $mutex, $c2) )
)
}PlaintextNote the layered logic: the imphash alone is enough for a confident hit, or the byte-level decryption stub combined with any one network/host indicator. The filesize and uint16(0) checks short-circuit non-matching files quickly.
Step 5 — Compile and scan with yara-cli
Always compile first to catch syntax errors, then scan recursively:
# Sanity-check syntax without scanning
yara -w -C rules/loader.yar /dev/null 2>&1 || echo "rule error"
# Recursive scan, print matching strings and offsets
yara -r -s rules/loader.yar /samples/
# Show which conditions matched, with tags and meta
yara -r -m -e rules/loader.yar /samples/
# Save compiled rules for fast reuse in a pipeline
yarac rules/loader.yar rules/loader.yarc
yara -C rules/loader.yarc /samples/BashUseful flags: -s prints matching strings, -m prints metadata, -e reports the matched condition, -d defines an external variable, and -t <tag> filters by tag. For performance triage on slow rules, yara --print-stats highlights expensive atoms.
Mermaid diagram

The diagram shows the iterative loop: extract candidates, anchor on the strongest signal (imphash or a hex stub plus a corroborating indicator), then retro-hunt and tighten until false positives drop to zero before deployment.
Detection & Defense (Blue Team)
YARA is itself a blue-team tool, so "defense" here means operationalizing rules safely and resisting evasion. This deserves as much rigor as rule writing.
- Test for false positives before deploying. Run every new rule against a large corpus of known-good files (a Windows
System32snapshot, common installers, your golden images). A rule that fires onvcruntime140.dllwill drown analysts in noise. Maintain a regression set and re-test on every change. See also building a malware analysis lab. - Layer rules into your stack. YARA integrates with many platforms: Velociraptor (
yara()VQL plugin) for endpoint sweeps, THOR / LOKI scanners for compromise assessment, Suricata'syarakeyword for file-extraction matching, and VirusTotal Hunting for retro-hunts. This maps to MITRE ATT&CK detection across T1059 (Command and Scripting Interpreter), T1027 (Obfuscated/Compressed Files), and T1620 (Reflective Code Loading). - Prefer structural over string indicators. Attackers flip strings cheaply but cannot trivially change import tables, section entropy, or code structure. Conditions built on
pe.imphash(),pe.number_of_sections, rich-header data, and resource hashes are far more durable than a single mutex string. - Use the
mathmodule to catch packing. High entropy is a strong packed/encrypted signal:import "math"thenmath.entropy(0, filesize) >= 7.4flags compressed payloads. Combine with PE anomalies (e.g., an entry point in the last section) to detect runtime packers. - Watch for in-memory-only malware. File-based YARA misses fileless threats. Scan process memory with
yara -p <pid>or via Velociraptor'sproc_yara, and pair with EDR telemetry, because reflective loaders never touch disk. - Version-control and peer-review rules. Treat detections as code: store them in Git, require review, and tag with TLP and ATT&CK IDs in
meta. Sources like the YARA-Forge and the Florian Roth signature-base repositories provide vetted baseline rules. - Beware adversary anti-YARA tactics. Sophisticated actors test payloads against public rule sets, pad files to exceed
filesizelimits, and randomize strings per build. Keep some rules private, and anchor on traits the attacker must keep functional (network protocol bytes, decryption constants).
Conclusion
Effective YARA rules come from disciplined signature selection, not exhaustive string dumps. Anchor on the most durable signal available — usually pe.imphash() or a wildcard hex stub of a custom routine — corroborate it with one or two specific indicators, gate everything behind cheap checks like uint16(0) and filesize, and validate against a clean corpus before shipping. Treat rules as living code: retro-hunt, measure false positives, and iterate. For deeper PE internals see PE file format internals.
References
- YARA documentation: https://yara.readthedocs.io/
- YARA GitHub (VirusTotal): https://github.com/VirusTotal/yara
pefilelibrary: https://github.com/erocarrera/pefile- MITRE ATT&CK — T1027 Obfuscated Files or Information: https://attack.mitre.org/techniques/T1027/
- MITRE ATT&CK — T1620 Reflective Code Loading: https://attack.mitre.org/techniques/T1620/
- Florian Roth, "How to Write Simple but Sound YARA Rules": https://www.nextron-systems.com/2015/02/16/write-simple-sound-yara-rules/
- YARA-Forge curated rule sets: https://yarahq.github.io/
- HackTricks — Malware analysis: https://book.hacktricks.xyz/



Comments