AI→Sec

Module 3: Malware Analysis

Malware Analysis with Deep Learning & Transformers

Static/dynamic signals → embeddings, families, campaigns

Outcome: Classify, cluster, and attribute malware using learned representations of static and dynamic behavior.

Topic Map — Deep Dives (3)

Static signals
- Headers/imports/sections; byte n-grams; entropy maps
- Raw bytes → CNN/Transformer vs. engineered-feature models; intermediate representation (IR) path when needed
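
To make the static signals above concrete, here is a minimal sketch of two of them: a sliding-window byte entropy map and byte n-gram counts. The function names, the 256-byte window, and the 128-byte stride are illustrative assumptions, not values from this module. Header/import parsing and the learned models are out of scope for the sketch.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_map(data: bytes, window: int = 256, stride: int = 128) -> list:
    """Entropy of each sliding window over the file.

    Window and stride are assumed defaults; tune them per corpus.
    """
    if len(data) <= window:
        return [shannon_entropy(data)]
    return [
        shannon_entropy(data[i:i + window])
        for i in range(0, len(data) - window + 1, stride)
    ]


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Counts of overlapping byte n-grams (keys are bytes objects of length n)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


if __name__ == "__main__":
    # Synthetic stand-in for a binary: a zero-padded region, then a region
    # in which every byte value appears once (maximal entropy).
    sample = b"\x00" * 512 + bytes(range(256))
    print(entropy_map(sample, window=256, stride=256))
    print(byte_ngrams(b"abab", 2).most_common(1))
```

Regions with entropy near 8 bits per byte are often associated with packed or encrypted content, but that is a heuristic rather than a verdict. The resulting vectors can feed either a feature model or serve as a side channel next to a raw-byte model.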
Dynamic behavior
- Traces via ETW/Sysmon/eBPF; PID trees; rare event handling
- Temporal motifs (beacons); feature joins (file/registry/net)
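
One simple way to surface the beacon motif mentioned above is to measure how regular the gaps between a host's outbound events are. The sketch below scores a list of timestamps by the coefficient of variation of its inter-arrival gaps. The names `beacon_score` and `looks_like_beacon`, the `min_events` floor, and the 0.1 cutoff are all assumptions for illustration. A production pipeline would read timestamps from collected traces and would likely use a more robust periodicity test.

```python
from statistics import mean, pstdev


def beacon_score(timestamps, min_events=5):
    """Coefficient of variation of inter-arrival gaps (lower = more regular).

    Returns None when there are too few events to judge, which is one
    simple way to avoid scoring rare events.
    """
    ts = sorted(timestamps)
    if len(ts) < min_events:
        return None
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg == 0:
        return None
    return pstdev(gaps) / avg


def looks_like_beacon(timestamps, max_cv=0.1):
    """Flag near-periodic activity; max_cv is an assumed, untuned threshold."""
    cv = beacon_score(timestamps)
    return cv is not None and cv <= max_cv


if __name__ == "__main__":
    jittered = [0, 61, 119, 181, 240, 299]   # roughly every 60 s
    bursty = [0, 5, 200, 210, 900, 905]      # irregular activity
    print(looks_like_beacon(jittered), looks_like_beacon(bursty))
```

Malware that adds heavy jitter to its callback interval will evade a fixed threshold like this, so the score is better treated as one feature among the joined file, registry, and network features than as a detector on its own.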
Linkage
- Approximate nearest neighbor (ANN) search (FAISS/ScaNN); graph builds; community detection; centroids → IOCs
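
The linkage steps above can be sketched end to end on toy data. To stay dependency-free, this sketch swaps in simpler stand-ins: an exact all-pairs cosine search in place of an ANN index such as FAISS or ScaNN, and connected components in place of a real community detection algorithm. The function names, the 0.9 similarity threshold, and the 2-dimensional embeddings are assumptions for illustration only.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_edges(embeddings, threshold=0.9):
    """Exact O(n^2) neighbor search; an ANN index would replace this at scale."""
    n = len(embeddings)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if cosine(embeddings[i], embeddings[j]) >= threshold
    ]


def connected_components(n, edges):
    """Union-find grouping; a simple stand-in for community detection."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


def centroid(embeddings, members):
    """Mean embedding of a cluster, usable as its representative."""
    dim = len(embeddings[0])
    return [sum(embeddings[m][d] for m in members) / len(members) for d in range(dim)]


if __name__ == "__main__":
    # Toy embeddings: two samples near each axis.
    embs = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.99]]
    clusters = connected_components(len(embs), similarity_edges(embs))
    for members in clusters:
        print(members, centroid(embs, members))
```

Going from a centroid to indicators of compromise is not shown here. One plausible route is to take the samples nearest each centroid and extract the artifacts they share, but that step depends on the feature joins available in a given pipeline.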

Key Shifts Powered by AI (3)

From Engineered Features to Learned Representations
- Large-scale learned embeddings generalize better than handcrafted features. ^[ember] ^[sorel] ^[malconv]
- Why it matters: Improved transfer across campaigns; scalable triage.
Behavioral Modeling
- Sequence models over dynamic traces capture tactics that are hard to encode with rules. ^[usenix23_humans]
- Why it matters: Earlier detection and family attribution.
Robustness Matters
- Adversarial and packed samples stress detectors; evaluation must include adaptive attackers. ^[provninja] ^[wolf24]
- Why it matters: Hardened pipelines and realistic reporting.

Still Hard (4)