Master YARA rule development on REMnux — from basic string matching to advanced modules (pe, elf, math, cuckoo). Analyze real malware samples, extract distinguishing patterns, write high-confidence detection rules, and integrate them into a threat hunting workflow.
REMnux has YARA pre-installed. Verify and set up the lab workspace:
yara --version
pip3 install yara-python pefile
mkdir ~/yara-lab && cd ~/yara-lab
mkdir samples rules test_clean test_malware reports
# Install yarGen (automated YARA rule generator)
git clone https://github.com/Neo23x0/yarGen.git ~/yarGen
cd ~/yarGen && pip3 install -r requirements.txt
python3 yarGen.py --update    # Downloads goodware DB
# Download YARA rule sources
mkdir ~/yara-lab/rule_sources
# Florian Roth's signature-base rules
git clone https://github.com/Neo23x0/signature-base.git ~/yara-lab/rule_sources/signature-base
cd ~/yara-lab    # return to the lab workspace for the following steps
echo "YARA $(yara --version) ready"
Learn the anatomy of a YARA rule through progressively complex examples:
cat > ~/yara-lab/rules/01_basic.yar << 'EOF'
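import "pe"     // required by Example_Packed_PE (pe.number_of_sections)
import "math"   // required by Example_Packed_PE (math.entropy)
// YARA Rule Structure Reference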
// rule [rule_name] : [tags] {
// meta: // documentation
// strings: // patterns to search for
// condition: // logic combining strings
// }
rule Example_Simple_String {
meta:
description = "Detect simple string match"
author = "CyberSec Pro Academy"
severity = "LOW"
strings:
$s1 = "malware_string_here" // case-sensitive text
$s2 = "MaLwArE" nocase // case-insensitive
$s3 = { 4D 5A 90 00 } // hex pattern (MZ header)
$s4 = /[Mm]alware\d{3}/ // regex
$b1 = "cmd.exe" wide // UTF-16LE (Windows)
$b2 = "powershell" wide nocase
condition:
$s3 at 0 and                  // MZ header at byte 0
(any of ($s1, $s2, $s4) or $b1 or $b2)   // exclude $s3: "any of ($s*)" would always be true here
}
rule Example_PE_File {
meta:
description = "Match PE files (executables)"
condition:
uint16(0) == 0x5A4D and // MZ magic bytes
uint32(uint32(0x3C)) == 0x00004550 // PE signature
}
rule Example_Packed_PE {
meta:
description = "Detect possibly packed PE with high entropy"
strings:
$mz = { 4D 5A } // MZ header
condition:
$mz at 0 and
pe.number_of_sections < 4 and
math.entropy(0, filesize) > 7.0
}
EOF
# Test on a sample binary
yara ~/yara-lab/rules/01_basic.yar /bin/ls
echo "No output is expected: /bin/ls is an ELF binary (magic 7F 45 4C 46), so the MZ/PE rules should not match"
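A quick way to see what the modifiers in `01_basic.yar` mean at the byte level is to re-create them in plain Python. This is a sketch for intuition only: the helper names (`as_wide`, `nocase_pattern`, `hex_at`) are hypothetical, and YARA's real matcher is an Aho-Corasick scanner over atoms, not a series of per-string searches.

```python
# Sketch: byte-level meaning of YARA string modifiers (illustration, not YARA's matcher)
import re

def as_wide(text: str) -> bytes:
    """'wide': the text encoded as UTF-16LE, i.e. each ASCII char followed by 0x00."""
    return text.encode("utf-16-le")

def nocase_pattern(raw: bytes):
    """'nocase': match the byte sequence ignoring ASCII letter case."""
    return re.compile(re.escape(raw), re.IGNORECASE)

def hex_at(data: bytes, pattern_hex: str, offset: int) -> bool:
    """'$s at N': the hex pattern must start exactly at byte offset N."""
    raw = bytes.fromhex(pattern_hex)
    return data[offset:offset + len(raw)] == raw

# Synthetic buffer: MZ header, padding, UTF-16LE "PowerShell", then ASCII "cmd.exe"
sample = b"MZ\x90\x00" + b"\x00" * 60 + as_wide("PowerShell") + b"cmd.exe"

print(as_wide("cmd.exe"))                                          # b'c\x00m\x00d\x00.\x00e\x00x\x00e\x00'
print(hex_at(sample, "4D 5A 90 00", 0))                            # True
print(bool(nocase_pattern(as_wide("powershell")).search(sample)))  # True: "wide nocase"
print(as_wide("cmd.exe") in sample)                                # False: only ASCII cmd.exe present
```

The last line is why `$b1 = "cmd.exe" wide` would miss a plain ASCII `cmd.exe`: use `ascii wide` when a string may appear in either encoding.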
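FLOSS (FLARE Obfuscated String Solver) extracts both static strings and deobfuscated (stack, tight, and decoded) strings from malware. First, obtain something to analyze:
# Start with the harmless EICAR test file (hosted by eicar.org, not MalwareBazaar)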
curl -O https://secure.eicar.org/eicar.com.txt
mv eicar.com.txt ~/yara-lab/samples/eicar_test.txt
# For real malware samples, download from MalwareBazaar
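python3 << 'EOF'
import os
import requests
# Query MalwareBazaar for a sample (e.g., Mirai botnet, well-documented)
# Recent abuse.ch APIs require a free Auth-Key: export it as MB_AUTH_KEY before running
API = 'https://mb-api.abuse.ch/api/v1/'
HEADERS = {'Auth-Key': os.environ.get('MB_AUTH_KEY', '')}
SAMPLE_DIR = os.path.expanduser('~/yara-lab/samples')
resp = requests.post(API, headers=HEADERS, timeout=30,
                     data={'query': 'get_taginfo', 'tag': 'Mirai', 'limit': 1})
data = resp.json()
if data.get('query_status') == 'ok' and data.get('data'):
    sample = data['data'][0]
    sha256 = sample.get('sha256_hash', '')
    print(f"Sample: {sample.get('file_name', '')} | SHA256: {sha256[:16]}...")
    # Download sample (password-protected ZIP, password: infected)
    dl = requests.post(API, headers=HEADERS, timeout=60,
                       data={'query': 'get_file', 'sha256_hash': sha256})
    # On failure the API answers with JSON instead of a ZIP, so check the "PK" magic
    if dl.status_code == 200 and dl.content[:2] == b'PK':
        with open(os.path.join(SAMPLE_DIR, f"{sha256[:8]}.zip"), 'wb') as f:
            f.write(dl.content)
        print("Sample downloaded: unzip with password 'infected'")
    else:
        print(f"Download failed: {dl.text[:200]}")
else:
    print("Using EICAR test file for rule development")
EOF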
# Extract strings from sample
floss ~/yara-lab/samples/eicar_test.txt 2>/dev/null || \
strings ~/yara-lab/samples/eicar_test.txt
# For real samples (after unzipping):
# floss --no-filter ~/yara-lab/samples/malware.exe > ~/yara-lab/samples/strings.txt
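When FLOSS is not available, or to understand what its static pass reports, a minimal extractor for printable ASCII and UTF-16LE runs looks like the sketch below. It assumes a 4-character minimum (the GNU `strings` default) and, unlike FLOSS, recovers no stack, tight, or decoded strings; `extract_strings` is a hypothetical helper.

```python
# Sketch: static string extraction (printable ASCII and UTF-16LE runs), similar in spirit
# to `strings` or FLOSS's static pass. It cannot recover stack or decoded strings.
import re

MIN_LEN = 4  # assumption: the GNU strings default minimum length

ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
WIDE_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)

def extract_strings(data: bytes):
    """Yield (offset, encoding, text) for every printable run of at least MIN_LEN chars."""
    for m in ASCII_RE.finditer(data):
        yield m.start(), "ascii", m.group().decode("ascii")
    for m in WIDE_RE.finditer(data):
        yield m.start(), "utf-16le", m.group().decode("utf-16-le")

blob = b"\x00\x01cmd.exe\x00\xff" + "vssadmin".encode("utf-16-le") + b"\x00ab\x00"
for offset, encoding, text in extract_strings(blob):
    print(f"{offset:#06x} {encoding:9} {text}")
```

To run it on a sample, pass the bytes from `open(path, 'rb').read()` to `extract_strings`; the wide results are the candidates for `wide` strings in a rule.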
cat > ~/yara-lab/rules/02_pe_module.yar << 'EOF'
import "pe"
rule Suspicious_PE_Characteristics {
meta:
description = "PE with suspicious characteristics"
author = "CyberSec Pro Academy - L18"
condition:
pe.is_pe and
// Unsigned, or a certificate that expired before the PE's compile timestamp
// (pe.number_of_signatures == 0 means unsigned)
(pe.number_of_signatures == 0 or
for any i in (0..pe.number_of_signatures-1):
(pe.signatures[i].not_after < pe.timestamp)) and
// Code/data-named section that is both executable and writable (RWX)
(for any i in (0..pe.number_of_sections-1):
(pe.sections[i].name matches /\.text|\.data|code/ and
(pe.sections[i].characteristics & pe.SECTION_MEM_EXECUTE) != 0 and
(pe.sections[i].characteristics & pe.SECTION_MEM_WRITE) != 0))
}
rule Ransomware_API_Imports {
meta:
description = "PE importing APIs commonly used by ransomware"
author = "CyberSec Pro Academy - L18"
mitre = "T1486 - Data Encrypted for Impact"
severity = "HIGH"
strings:
// Shadow copy deletion commands (common ransomware behavior)
$vss1 = "vssadmin" ascii wide nocase
$vss2 = "shadowcopy" ascii wide nocase
condition:
pe.is_pe and
// Cryptography APIs
pe.imports("advapi32.dll", "CryptEncrypt") and
pe.imports("advapi32.dll", "CryptGenKey") and
// File enumeration (to find files to encrypt)
(pe.imports("kernel32.dll", "FindFirstFileW") or
pe.imports("kernel32.dll", "FindFirstFileA")) and
// Shadow copy deletion (common ransomware behavior)
any of ($vss*)
}
rule Cobalt_Strike_Beacon {
meta:
description = "Cobalt Strike Beacon characteristics"
author = "CyberSec Pro Academy - L18"
reference = "https://attack.mitre.org/software/S0154/"
strings:
$config1 = { 00 01 00 01 00 02 ?? ?? 00 02 00 01 00 02 }   // decoded Beacon config: settings 1 and 2 as type/length/value entries
$config_xor = { 69 68 69 68 }           // config header 00 01 00 01 XOR-encoded with 0x69 (Beacon 3.x)
$pipe_named = "\\\\.\\pipe\\" wide
$post_ex = "powershell -nop -exec bypass -EncodedCommand" wide nocase
condition:
uint16(0) == 0x5A4D and // MZ header
(2 of ($config1, $config_xor, $pipe_named, $post_ex))
}
EOF
# Test PE module rules (needs PE files in samples/; no output means no matches)
yara -r ~/yara-lab/rules/02_pe_module.yar ~/yara-lab/samples/ || \
echo "yara exited with an error: check the rule syntax and that samples/ exists (the pe module is built into YARA by default)"
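The `pe` module does the header parsing for you, but walking the structures by hand makes the fields used by `Example_PE_File` and `Suspicious_PE_Characteristics` concrete. This is a stdlib-only sketch: `parse_pe`, `rwx_sections`, and the synthetic `build_demo_pe` image are hypothetical helpers, and real samples with malformed headers are better handled by `pefile` or YARA itself.

```python
# Sketch: read the PE fields that Example_PE_File and Suspicious_PE_Characteristics use.
# Stdlib only; truncated or malformed files may raise struct.error.
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000   # same value as pe.SECTION_MEM_EXECUTE
IMAGE_SCN_MEM_WRITE = 0x80000000     # same value as pe.SECTION_MEM_WRITE

def parse_pe(data: bytes):
    """Return basic header info, or None if the MZ/PE checks fail."""
    if len(data) < 0x40 or data[:2] != b"MZ":                # uint16(0) == 0x5A4D
        return None
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":         # uint32(uint32(0x3C)) == 0x00004550
        return None
    coff = e_lfanew + 4                                      # 20-byte COFF file header
    _machine, nsec, timestamp, _symtab, _nsyms, opt_size, _flags = struct.unpack_from("<HHIIIHH", data, coff)
    table = coff + 20 + opt_size                             # section headers follow the optional header
    sections = []
    for i in range(nsec):
        off = table + 40 * i                                 # each section header is 40 bytes
        raw_size, raw_ptr = struct.unpack_from("<II", data, off + 16)
        (characteristics,) = struct.unpack_from("<I", data, off + 36)
        sections.append({
            "name": data[off:off + 8].rstrip(b"\x00").decode("ascii", "replace"),
            "raw_data_offset": raw_ptr,
            "raw_data_size": raw_size,
            "characteristics": characteristics,
        })
    return {"number_of_sections": nsec, "timestamp": timestamp, "sections": sections}

def rwx_sections(pe: dict) -> list:
    """Names of sections that are both executable and writable."""
    rwx = IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_WRITE
    return [s["name"] for s in pe["sections"] if (s["characteristics"] & rwx) == rwx]

def build_demo_pe() -> bytes:
    """Synthetic headers-only image with an RWX .text and a read-only .rdata section."""
    img = bytearray(0x200)
    img[0:2] = b"MZ"
    struct.pack_into("<I", img, 0x3C, 0x80)                  # e_lfanew -> 0x80
    img[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<HHIIIHH", img, 0x84, 0x14C, 2, 0x5F000000, 0, 0, 0xE0, 0x102)
    table = 0x84 + 20 + 0xE0
    for i, (name, chars) in enumerate([(b".text", 0xE0000020), (b".rdata", 0x40000040)]):
        off = table + 40 * i
        img[off:off + 8] = name.ljust(8, b"\x00")
        struct.pack_into("<II", img, off + 16, 0x200, 0x400 + 0x200 * i)
        struct.pack_into("<I", img, off + 36, chars)
    return bytes(img)

pe = parse_pe(build_demo_pe())
print(pe["number_of_sections"], [s["name"] for s in pe["sections"]], rwx_sections(pe))
```

To try it on a real file, pass its bytes to `parse_pe`; a `None` result means the file failed the same MZ/PE checks as `Example_PE_File`.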
cat > ~/yara-lab/rules/03_entropy.yar << 'EOF'
import "pe"     // required by High_Entropy_PE_Section (pe.sections)
import "math"
rule High_Entropy_File_Likely_Packed {
meta:
description = "File with high entropy suggesting packing or encryption"
author = "CyberSec Pro Academy - L18"
note = "High entropy alone is not malicious — many legitimate archives/certs qualify"
condition:
filesize > 10KB and filesize < 10MB and    // cheap size check first
math.entropy(0, filesize) > 7.2            // expensive full-file entropy last
}
rule High_Entropy_PE_Section {
meta:
description = "PE with a high-entropy section (packed code)"
mitre = "T1027.002 - Software Packing"
condition:
uint16(0) == 0x5A4D and // PE
for any i in (0..pe.number_of_sections-1):
(math.entropy(pe.sections[i].raw_data_offset,
pe.sections[i].raw_data_size) > 7.5)
}
rule XOR_Encoded_Payload {
meta:
description = "Detect XOR encoding patterns in binary"
strings:
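// XOR loop patterns in assembly (very generic: expect false positives)
// Hex strings may not begin or end with a jump, so each pattern ends on a byte
$xor_loop_x86 = { 8A [1-4] 30 [1-4] 88 }
$xor_loop_x64 = { 0F B6 [2-5] 30 [2-5] 88 }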
condition:
uint16(0) == 0x5A4D and
any of ($xor_loop*)
}
EOF
# Calculate entropy of known files for reference
python3 << 'EOF'
import collections, glob, math, os
def file_entropy(filepath, max_bytes=16 * 1024 * 1024):
    # Read at most max_bytes: /dev/urandom never reaches EOF, so a bare f.read() would hang
    with open(filepath, 'rb') as f:
        data = f.read(max_bytes)
    if not data:
        return 0
    counter = collections.Counter(data)
    total = len(data)
    return -sum((count / total) * math.log2(count / total)
                for count in counter.values())
for path in ['/bin/ls', '/bin/sh', '/dev/urandom'] + \
        glob.glob(os.path.expanduser('~/yara-lab/samples/*.txt'))[:3]:
    if os.path.exists(path):
        try:
            e = file_entropy(path)
            print(f"{os.path.basename(path):30} entropy={e:.4f} {'[HIGH]' if e > 7 else ''}")
        except OSError:
            pass
EOF
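Hex strings with jumps are easier to reason about once translated into a regular expression. The sketch below converts a simple hex string (bytes, `??`, `[n]`, `[n-m]`, `[n-]`) into a Python bytes regex and tests the x86 XOR-loop pattern against a short instruction sequence. The `hex_to_regex` helper is hypothetical; it does not handle alternatives `( A | B )` or nibble wildcards such as `4?`.

```python
# Sketch: convert a simple YARA hex string into an equivalent Python bytes regex.
# Hypothetical helper: supports bytes, ?? and [n] / [n-m] / [n-] jumps only
# (no alternatives like ( A | B ) and no nibble wildcards like 4?).
import re

def hex_to_regex(hex_string: str):
    parts = []
    for tok in hex_string.strip("{} ").split():
        if tok == "??":
            parts.append(b".")                                   # any single byte
        elif tok.startswith("["):
            lo, sep, hi = tok.strip("[]").partition("-")
            if sep and not hi:
                parts.append(b".{%d,}" % int(lo or 0))            # [n-]: n or more bytes
            else:
                parts.append(b".{%d,%d}" % (int(lo or 0), int(hi or lo)))  # [n] or [n-m]
        else:
            parts.append(re.escape(bytes.fromhex(tok)))          # literal byte(s)
    return re.compile(b"".join(parts), re.DOTALL)                # DOTALL: "." also matches 0x0A

xor_x86 = hex_to_regex("{ 8A [1-4] 30 [1-4] 88 }")
# mov al,[esi+ecx] / xor al,bl / mov [edi+ecx],al : a real byte-wise XOR loop body
loop_body = bytes.fromhex("8A 04 0E 30 D8 88 04 0F")
# Arbitrary bytes that merely happen to contain 8A .. 30 .. 88
noise = bytes.fromhex("8A 00 30 00 88")

print(bool(xor_x86.search(loop_body)))  # True
print(bool(xor_x86.search(noise)))      # True as well: the pattern is very loose
```

The second check shows why short, jump-heavy patterns need extra conditions: any `8A`, `30`, `88` trio within a few bytes of each other qualifies, whether or not it is really a loop.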
cat > ~/yara-lab/rules/04_ransomware.yar << 'EOF'
import "pe"
rule Ransomware_Note_Patterns {
meta:
description = "Common ransom note strings across ransomware families"
author = "CyberSec Pro Academy - L18"
severity = "CRITICAL"
strings:
// Generic ransom demand language
$r1 = "Your files have been encrypted" nocase
$r2 = "All your files are encrypted" nocase
$r3 = "pay the ransom" nocase ascii wide
$r4 = "bitcoin" nocase
$r5 = ".onion" nocase
$r6 = "decrypt" nocase
$r7 = "Your personal ID" nocase
// Specific families
$wannacry = "Wanna Decryptor" nocase
$lockbit = "LockBit" nocase
$blackcat = "ALPHV" nocase
$ryuk = "RyukReadMe" nocase
condition:
(3 of ($r*)) or any of ($wannacry, $lockbit, $blackcat, $ryuk)
}
rule Ransomware_Shadow_Delete {
meta:
description = "Commands to delete shadow copies, a common ransomware precursor"
mitre = "T1490 - Inhibit System Recovery"
severity = "HIGH"
strings:
$vss1 = "vssadmin delete shadows" nocase ascii wide   // "wide" alone would match only UTF-16LE
$vss2 = "wmic shadowcopy delete" nocase ascii wide
$vss3 = "bcdedit /set {default} recoveryenabled No" nocase ascii wide
$vss4 = "wbadmin delete catalog" nocase ascii wide
$vss5 = "diskshadow /s" nocase ascii wide
condition:
any of them
}
rule Ransomware_WannaCry {
meta:
description = "WannaCry/WannaCrypt ransomware"
author = "CyberSec Pro Academy - L18"
hash_md5 = "db349b97c37d22f5ea1d1841e3c89eb4"
reference = "https://attack.mitre.org/software/S0366/"
strings:
$kill_switch = "www.iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea.com" ascii
$ransom_msg = "Wanna Decryptor" ascii wide
$mssecsvc = "mssecsvc.exe" ascii
$wncry_ext = ".WNCRY" ascii wide
$taskdl = "taskdl.exe" ascii
condition:
uint16(0) == 0x5A4D and
(($kill_switch and $ransom_msg) or
($wncry_ext and 2 of ($mssecsvc, $taskdl, $ransom_msg)))
}
EOF
yara ~/yara-lab/rules/04_ransomware.yar ~/yara-lab/samples/ 2>/dev/null
echo "Ransomware rules created"
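YARA evaluates each string to a true/false "found anywhere" flag, and `3 of ($r*)` simply counts the true flags. The hypothetical sketch below mirrors `Ransomware_Note_Patterns` in Python so you can see how easily generic words combine into a hit. For simplicity it treats every string as `nocase ascii wide`, which is broader than the rule as written.

```python
# Sketch: evaluate a Ransomware_Note_Patterns-style condition by hand.
# Hypothetical re-implementation; every string is treated as "nocase ascii wide".
import re

R_STRINGS = [
    "Your files have been encrypted", "All your files are encrypted", "pay the ransom",
    "bitcoin", ".onion", "decrypt", "Your personal ID",
]
FAMILY_STRINGS = ["Wanna Decryptor", "LockBit", "ALPHV", "RyukReadMe"]

def found(data: bytes, text: str) -> bool:
    """True if text occurs anywhere in data as ASCII or UTF-16LE, ignoring case."""
    forms = (text.encode("ascii"), text.encode("utf-16-le"))
    return any(re.search(re.escape(f), data, re.IGNORECASE) for f in forms)

def ransom_note_condition(data: bytes) -> bool:
    r_hits = sum(found(data, s) for s in R_STRINGS)          # "3 of ($r*)" counts true flags
    return r_hits >= 3 or any(found(data, s) for s in FAMILY_STRINGS)

note = b"ALL YOUR FILES ARE ENCRYPTED! Send bitcoin, then visit our .onion site to decrypt."
report = b"Analysts observed LockBit affiliates deleting shadow copies."

print(ransom_note_condition(note))    # True: $r2, $r4, $r5 and $r6 match
print(ransom_note_condition(report))  # True: $lockbit alone is enough
```

The second sample shows the trade-off: a single family name such as `LockBit` is enough to fire the rule, so threat reports and blog posts placed in `test_clean` are a good way to measure false positives.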
yarGen automatically generates YARA rules by finding strings that appear in your samples but not in its goodware database:
cd ~/yarGen
# Generate rules from your malware samples directory
python3 yarGen.py \
  --excludegood \
  -m ~/yara-lab/samples/ \
  -o ~/yara-lab/rules/auto_generated.yar \
  -z 60    # minimum string score (--score only adds the scores as rule comments)
echo "Auto-generated rules:"
head -n 80 ~/yara-lab/rules/auto_generated.yar
Review and refine the auto-generated rules — yarGen gives you a starting point, not a finished product.
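The core idea behind yarGen can be sketched in a few lines: collect printable strings from the samples, drop anything that also appears in goodware, and turn the survivors into a candidate rule. This toy version (hypothetical helpers `printable_strings`, `candidate_strings`, `render_rule`) keeps only strings shared by all samples and ranks them by length; yarGen's real scoring is far more elaborate, so treat any output as a draft to review.

```python
# Toy illustration of the yarGen idea (NOT yarGen's algorithm): keep strings found in
# every malware sample but in no goodware file, then draft a rule for manual review.
import re

def printable_strings(data: bytes, min_len: int = 6) -> set:
    return {m.group().decode("ascii")
            for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data)}

def candidate_strings(malware: list, goodware: list, top: int = 5) -> list:
    good = set().union(*(printable_strings(g) for g in goodware))
    shared = set.intersection(*(printable_strings(m) for m in malware)) - good
    # Crude ranking: longer strings tend to be more specific
    return sorted(shared, key=lambda s: (-len(s), s))[:top]

def render_rule(name: str, strings: list) -> str:
    if not strings:
        raise ValueError("no candidate strings left after goodware filtering")
    lines = [f"rule {name} {{", "strings:"]
    for i, s in enumerate(strings, 1):
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'$s{i} = "{escaped}" ascii')
    need = max(1, len(strings) // 2)
    lines += ["condition:", f"uint16(0) == 0x5A4D and {need} of them", "}"]
    return "\n".join(lines)

malware = [
    b"MZ\x00evil_mutex_4711\x00GetProcAddress\x00http://bad.example/gate.php\x00",
    b"MZ\x00http://bad.example/gate.php\x00evil_mutex_4711\x00LoadLibraryA\x00",
]
goodware = [b"MZ\x00GetProcAddress\x00LoadLibraryA\x00"]

print(render_rule("Toy_Candidate", candidate_strings(malware, goodware)))
```

The goodware filter is what removes common API names such as `GetProcAddress`; without it, most candidate strings would also match clean binaries.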
cat > ~/yara-lab/test_rules.py << 'EOF'
import glob, os
import yara
def test_rules(rule_file, clean_dir, malware_dir=None):
    print(f"\nTesting: {rule_file}")
    try:
        rules = yara.compile(filepath=rule_file)
    except yara.SyntaxError as e:
        print(f"  [SYNTAX ERROR] {e}")
        return
    # Test against clean files: every match is a false positive
    fp = 0
    clean_total = 0
    for filepath in glob.glob(f"{clean_dir}/**/*", recursive=True):
        if not os.path.isfile(filepath):
            continue
        clean_total += 1
        try:
            matches = rules.match(filepath)
        except yara.Error:
            continue
        if matches:
            fp += 1
            print(f"  [FALSE POSITIVE] {os.path.basename(filepath)}: {[m.rule for m in matches]}")
    print(f"  Clean files: {clean_total} tested, {fp} false positives ({fp/max(clean_total,1)*100:.1f}% FPR)")
    # Test against malware (if provided): every match is a true positive
    if malware_dir:
        tp = 0
        mal_total = 0
        for filepath in glob.glob(f"{malware_dir}/**/*", recursive=True):
            if not os.path.isfile(filepath):
                continue
            mal_total += 1
            try:
                if rules.match(filepath):
                    tp += 1
            except yara.Error:
                pass
        if mal_total > 0:
            print(f"  Malware: {tp}/{mal_total} detected ({tp/mal_total*100:.1f}% detection rate)")
# Run tests
for rule_file in sorted(glob.glob('rules/*.yar')):
    test_rules(rule_file, 'test_clean', 'test_malware')
EOF
# Populate test_clean with Linux system binaries
cp /bin/ls /usr/bin/python3 ~/yara-lab/test_clean/ 2>/dev/null
find /usr/share/doc -name "*.txt" -size -100k | head -20 | \
xargs -I{} cp {} ~/yara-lab/test_clean/ 2>/dev/null
cd ~/yara-lab && python3 test_rules.py
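The percentages printed by `test_rules.py` feed directly into the results table at the end of this lab. A small helper, assuming every file in `test_clean` is benign and every file in `test_malware` is malicious; `RuleTestCounts` is a hypothetical name and the counts in the example are made up.

```python
# Sketch: metrics for the results table from test_rules.py-style counts.
# Assumes test_clean holds only benign files and test_malware only malicious ones.
from dataclasses import dataclass

@dataclass
class RuleTestCounts:
    clean_total: int
    false_positives: int
    malware_total: int
    true_positives: int

    @property
    def fpr(self) -> float:
        """False positive rate: share of clean files that matched."""
        return self.false_positives / self.clean_total if self.clean_total else 0.0

    @property
    def detection_rate(self) -> float:
        """True positive rate (recall): share of malware files that matched."""
        return self.true_positives / self.malware_total if self.malware_total else 0.0

    @property
    def precision(self) -> float:
        """Share of all matches that were real malware."""
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

# Made-up counts for illustration
counts = RuleTestCounts(clean_total=40, false_positives=2, malware_total=10, true_positives=8)
print(f"FPR={counts.fpr:.1%} detection={counts.detection_rate:.1%} precision={counts.precision:.1%}")
# FPR=5.0% detection=80.0% precision=80.0%
```

Precision depends on how many clean versus malicious files you happen to test, so report FPR and detection rate alongside it rather than precision alone.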
# Compile all rules into a single scan
cat > ~/yara-lab/scan_system.sh << 'BASH'
#!/bin/bash
# YARA system scan with all rules
RULE_DIR=~/yara-lab/rules
SCAN_TARGET=${1:-/tmp}
LOG_FILE=~/yara-lab/reports/scan_$(date +%Y%m%d_%H%M%S).txt
echo "Starting YARA scan of $SCAN_TARGET" | tee "$LOG_FILE"
echo "Rules: $RULE_DIR" | tee -a "$LOG_FILE"
echo "Time: $(date -u)" | tee -a "$LOG_FILE"
echo "---" | tee -a "$LOG_FILE"
# Collect every rule source file (YARA 4.x accepts several rule files in one invocation)
RULES=("$RULE_DIR"/*.yar)
# Scan recursively (-r), print rule metadata (-m), keep the match lines for counting
MATCHES=$(yara -r -m "${RULES[@]}" "$SCAN_TARGET" 2>/dev/null)
[ -n "$MATCHES" ] && echo "$MATCHES" | tee -a "$LOG_FILE"
echo "---" | tee -a "$LOG_FILE"
echo "Scan complete: $(printf '%s' "$MATCHES" | grep -c .) matches" | tee -a "$LOG_FILE"
BASH
chmod +x ~/yara-lab/scan_system.sh
# Run a test scan on /tmp
~/yara-lab/scan_system.sh /tmp
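Each scan log holds one line per match. Assuming YARA's `-m` output format of `RuleName [key="value",...] /path/to/file`, the sketch below tallies matches per rule and per `severity` meta value. The parser (`parse_line`, `summarize`) is hypothetical and ignores edge cases such as `]` inside metadata values.

```python
# Sketch: summarise a scan log, assuming yara -m lines look like
#   RuleName [key="value",key2=123] /path/to/file
# Hypothetical parser; metadata values containing "]" are not handled.
import re
from collections import Counter

LINE_RE = re.compile(r'^(?P<rule>\w+)(?: \[(?P<meta>.*?)\])? (?P<path>/.*)$')
META_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]+)')

def parse_line(line: str):
    m = LINE_RE.match(line.strip())
    if not m:
        return None   # header/footer lines ("Starting YARA scan...", "---") and blanks
    meta = {k: v.strip('"') for k, v in META_RE.findall(m.group("meta") or "")}
    return m.group("rule"), meta, m.group("path")

def summarize(lines):
    per_rule, per_severity = Counter(), Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            rule, meta, _path = parsed
            per_rule[rule] += 1
            per_severity[meta.get("severity", "UNSET")] += 1
    return per_rule, per_severity

sample_log = [
    "Starting YARA scan of /tmp",
    "---",
    'Ransomware_Shadow_Delete [description="Commands to delete shadow copies",severity="HIGH"] /tmp/a.bat',
    'High_Entropy_File_Likely_Packed [description="File with high entropy"] /tmp/blob.bin',
    'Ransomware_Shadow_Delete [severity="HIGH"] /tmp/b.ps1',
    "Scan complete: 3 matches",
]
per_rule, per_severity = summarize(sample_log)
print(per_rule.most_common())
print(per_severity)
```

For a real report, feed it `open(log_path).read().splitlines()`; the per-severity counts are a quick way to decide which matches to triage first.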
# Python script to YARA-scan files from Splunk alerts
cat > ~/yara-lab/splunk_yara_hook.py << 'EOF'
import glob, json, os, sys
import yara
# Load all rules: compile every .yar file into its own namespace (no combined file needed)
RULE_DIR = os.path.expanduser('~/yara-lab/rules')
rule_files = {os.path.splitext(os.path.basename(p))[0]: p
              for p in glob.glob(os.path.join(RULE_DIR, '*.yar'))}
rules = yara.compile(filepaths=rule_files) if rule_files else None
def scan_file(filepath):
    """Scan a file path from a Splunk alert and return YARA findings."""
    if not os.path.isfile(filepath):
        return {'error': 'File not found', 'path': filepath}
    if not rules:
        return {'error': 'No YARA rules loaded'}
    try:
        matches = rules.match(filepath)
        return {
            'path': filepath,
            'matches': [{
                'rule': m.rule,
                'tags': m.tags,
                'meta': m.meta,
                # yara-python >= 4.3: each string match exposes .instances with offsets
                'strings': [(s.identifier, hex(s.instances[0].offset) if s.instances else '')
                            for s in m.strings]
            } for m in matches],
            'hit_count': len(matches)
        }
    except yara.Error as e:
        return {'error': str(e), 'path': filepath}
if __name__ == '__main__':
    if len(sys.argv) < 2:
        sys.exit(f"usage: {sys.argv[0]} <file-to-scan>")
    print(json.dumps(scan_file(sys.argv[1]), indent=2))
EOF
# To call this from Splunk, package it in a Splunk app as a custom search command or
# alert action; as written it prints JSON and does not implement Splunk's command protocol
echo "Splunk integration script ready"
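Inside an alert workflow it is often enough to reduce the JSON from `scan_file()` to one verdict. A minimal sketch, assuming the `severity` meta values used by the rules in this lab (LOW, HIGH, CRITICAL); the `triage` helper, the MEDIUM level, and the UNSET fallback are illustrative additions, not part of any Splunk or YARA API.

```python
# Sketch: reduce a scan_file() result to one alert verdict.
# Assumption: rules carry a "severity" meta field; MEDIUM and UNSET are illustrative extras.
SEVERITY_ORDER = ["UNSET", "LOW", "MEDIUM", "HIGH", "CRITICAL"]

def triage(result: dict) -> dict:
    if "error" in result:
        return {"path": result.get("path"), "verdict": "ERROR", "detail": result["error"]}
    levels = []
    for m in result["matches"]:
        level = str(m["meta"].get("severity", "UNSET")).upper()
        levels.append(level if level in SEVERITY_ORDER else "UNSET")
    top = max(levels, key=SEVERITY_ORDER.index, default=None)
    return {
        "path": result["path"],
        "verdict": top or "CLEAN",          # no matches at all -> CLEAN
        "rules": sorted(m["rule"] for m in result["matches"]),
    }

# Hand-written example in the shape scan_file() returns
example = {
    "path": "/tmp/dropper.exe",
    "matches": [
        {"rule": "Ransomware_Shadow_Delete", "tags": [], "meta": {"severity": "HIGH"}, "strings": []},
        {"rule": "Ransomware_Note_Patterns", "tags": [], "meta": {"severity": "CRITICAL"}, "strings": []},
    ],
    "hit_count": 2,
}
print(triage(example))
```

Keeping the verdict to a single field makes it easy to route alerts by severity while the full match list stays available for the analyst.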
Key YARA performance and accuracy best practices:
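// Performance: put cheap checks (file size, magic bytes) before expensive ones
// (math.entropy, loops over sections); YARA stops evaluating "and" at the first false term.
// Strings are still searched in every file, so ordering mainly saves condition work.
// BAD (string alone: matches any file type, more false positives)
rule Slow_Rule {
strings: $s = "malware_string"
condition: $s
}
// GOOD (cheap header and size checks restrict matches to PE files under 10 MB)
rule Fast_Rule {
strings: $s = "malware_string"
condition: uint16(0) == 0x5A4D and filesize < 10MB and $s
}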
# Print rule-set statistics (-S); per-rule timing needs a YARA build configured with --enable-profiling
yara -S ~/yara-lab/rules/04_ransomware.yar /tmp 2>/dev/null | head -20
# Measure rule scan time (pass the rule files, not the rules directory)
time yara -r ~/yara-lab/rules/*.yar /usr/bin/ 2>/dev/null | wc -l
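Python's `and` short-circuits the same way a YARA condition does, so the cost of term ordering can be measured directly. The sketch counts how often a full-file entropy calculation (the same Shannon formula as the entropy script earlier) runs under each ordering; the predicates `slow_order` and `fast_order` are hypothetical stand-ins for YARA conditions and the three-file `corpus` is synthetic.

```python
# Sketch: cost of condition ordering, using Python's short-circuiting "and".
# The corpus is synthetic and the predicates are hypothetical stand-ins for YARA conditions.
import math
from collections import Counter

calls = {"entropy": 0}

def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; counts how often it is called."""
    calls["entropy"] += 1
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())

def slow_order(data: bytes) -> bool:   # expensive term first
    return entropy(data) > 7.0 and data[:2] == b"MZ"

def fast_order(data: bytes) -> bool:   # cheap magic-byte and size checks first
    return data[:2] == b"MZ" and len(data) < 10 * 1024 * 1024 and entropy(data) > 7.0

corpus = [
    b"\x7fELF" + bytes(range(256)) * 4,   # high entropy, not PE
    b"%PDF-1.7" + bytes(100),             # low entropy, not PE
    b"MZ" + bytes(range(256)) * 4,        # high-entropy PE-like blob
]

def count_entropy_calls(predicate):
    calls["entropy"] = 0
    results = [predicate(d) for d in corpus]
    return results, calls["entropy"]

print(count_entropy_calls(slow_order))  # ([False, False, True], 3)
print(count_entropy_calls(fast_order))  # ([False, False, True], 1)
```

Both orderings give identical verdicts; only the amount of expensive work differs, which is exactly the effect of placing `uint16(0) == 0x5A4D and filesize < ...` ahead of `math.entropy`.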
| Metric | Value |
|---|---|
| Rules written (manual) | |
| Rules auto-generated (yarGen) | |
| False positive rate (clean test) | |
| Malware detection rate | |
| Families covered | |
| MITRE techniques covered | |