AI→Sec
Module 1: Software Security
Software Security: Program Understanding, Vulnerability Discovery, and LLM-Guided Fuzzing
From semantic lifting to LLM-assisted decompilation and automated bug finding
Outcome: Build semantic visibility into code, automate vulnerability discovery, and prioritize findings.
Learning Objectives (3)
- Lift binaries to IR and leverage cross-arch semantic similarity
- Use LLMs to refine decompilation (naming, comments, structure) safely
- Combine taint + risk scoring + LLM-guided fuzzing to reach real bugs
Topic Map (4)
- Cross-arch function similarity
- LLM-assisted decompilation
- Taint (static/dynamic/hybrid)
- LLM-guided fuzzing (seeds/grammars, NL oracles)
Topic Map — Deep Dives (4)
- Cross-arch function similarity
- IR/graph embeddings; control/data-flow features
- Use-cases: clone search, patch diffing, backport triage
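To make the clone-search idea concrete, here is a minimal sketch. It assumes a toy bag-of-tokens "embedding" over normalized IR features (`VOCAB`, `embed`, and the token lists are illustrative inventions); real systems learn control/data-flow graph embeddings with neural models, but the comparison step is the same cosine similarity.

```python
import math

# Hypothetical vocabulary of normalized IR-level tokens.
VOCAB = ["load", "store", "cmp", "branch", "call", "xor", "mul", "ret"]

def embed(tokens):
    """Toy bag-of-tokens embedding: count each vocab token, L2-normalize.
    Stand-in for a learned graph embedding over control/data flow."""
    v = [float(tokens.count(t)) for t in VOCAB]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

def cosine(a, b):
    # Vectors are unit-length, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# The "same" function lifted from x86 and ARM builds: mnemonics differ,
# but the IR-level token histograms coincide, so similarity is maximal.
x86_fn = embed(["load", "cmp", "branch", "call", "ret"])
arm_fn = embed(["load", "cmp", "branch", "call", "ret"])
other  = embed(["xor", "mul", "store", "store", "ret"])
```

A clone-search index would rank `arm_fn` above `other` for query `x86_fn`, which is exactly how findings get ported across architectures.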
- Semantic lifting & LLM-assisted decompilation
- Bytes → IR → SSA; recover types/ABI contracts
- LLMs to refine decompiler output (naming, comments, explanations)
- End-to-end vs. refinement workflows (LLM4Decompile-style)
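A sketch of the refinement workflow's gating step, under stated assumptions: the decompiled snippet, the reply string, and the helper names (`build_prompt`, `apply_renames`) are all hypothetical. The point is that the model is constrained to a strict JSON rename map, and hallucinated identifiers are rejected before any edit touches the code.

```python
import json
import re

DECOMPILED = """
int sub_401200(char *a1, int a2) {
    int v1 = 0;
    while (v1 < a2) { a1[v1] = a1[v1] ^ 0x5A; v1++; }
    return v1;
}
"""

def build_prompt(code):
    # Structured prompt: request ONLY a JSON rename map, no free-form prose.
    return ("Suggest descriptive names for the identifiers in this C code. "
            "Respond with ONLY a JSON object mapping old names to new names.\n\n"
            + code)

def apply_renames(code, reply):
    """Gate the model's reply: parse strict JSON and rename only identifiers
    that actually occur in the code, dropping hallucinated ones."""
    renames = json.loads(reply)
    idents = set(re.findall(r"\b[A-Za-z_]\w*\b", code))
    for old, new in renames.items():
        if old in idents and re.fullmatch(r"[A-Za-z_]\w*", new):
            code = re.sub(rf"\b{re.escape(old)}\b", new, code)
    return code

# A reply a model might produce (hypothetical), including one hallucinated
# identifier "v99" that the gate silently discards.
reply = ('{"sub_401200": "xor_buffer", "a1": "buf", "a2": "len", '
         '"v1": "i", "v99": "junk"}')
cleaned = apply_renames(DECOMPILED, reply)
```

Behavior is unchanged; only readability improves, and the human reviewer sees the renamed code rather than raw model output.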
- Taint (static/dynamic/hybrid)
- Sources/sinks catalogs; interproc taint; aliasing pain points
- Hybrid: selective concolic on hot edges; sanitization gates
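The sources/sinks/sanitizers pipeline above can be sketched as a minimal intraprocedural taint pass over a toy three-address IR. The statement format, the catalog entries (`read_input`, `escape`, `exec_sql`), and the program are all invented for illustration; real engines add interprocedural summaries and alias analysis.

```python
# Hypothetical catalogs; real tools ship large source/sink/sanitizer lists.
SOURCES = {"read_input"}
SANITIZERS = {"escape"}
SINKS = {"exec_sql"}

def find_flows(stmts):
    """Each statement is (dst, op, srcs). Taint flows source -> sink
    unless a sanitization gate intervenes."""
    tainted = set()
    findings = []
    for dst, op, srcs in stmts:
        if op in SOURCES:
            tainted.add(dst)
        elif op in SANITIZERS:
            tainted.discard(dst)          # sanitization gate kills taint
        elif op in SINKS:
            if any(s in tainted for s in srcs):
                findings.append((op, srcs))
        elif any(s in tainted for s in srcs):
            tainted.add(dst)              # plain assignment: propagate
    return findings

prog = [
    ("a", "read_input", []),      # a = attacker-controlled source
    ("b", "concat", ["a", "q"]),  # b inherits taint from a
    ("c", "escape", ["b"]),       # c is sanitized
    (None, "exec_sql", ["b"]),    # tainted value reaches sink -> finding
    (None, "exec_sql", ["c"]),    # sanitized value -> no finding
]
flows = find_flows(prog)
```

The two sink calls show why sanitizer modeling matters: without the `escape` gate, both would be flagged and the second would be a false positive.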
- LLM-guided fuzzing
- Grammar inference from samples/spec; high-value seed proposals
- NL oracles with structured verdicts; minimize hallucinations
Key Shifts Powered by AI (3)
- Semantics over Syntax: Neural IR/graph embeddings align functions across compilers and architectures, enabling robust clone search and smarter triage. Why it matters: port findings across forks/platforms; prioritize semantically similar code paths.
- LLMs as Co-Pilots, Not Pilots: LLMs excel at decompilation cleanup and grammar/seed hints but must be gated by structured prompts and human oversight. [llm4decompile] [degpt] Why it matters: cleaner code understanding without unsafe automation; reproducible improvements.
- Data-Driven Fuzzing: Models infer grammars/states and propose high-value seeds; LLMs act as natural-language oracles. [chatafl] Why it matters: more unique crashes with fewer executions; faster triage.
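The natural-language-oracle idea only works if the model's answer is machine-checkable. A minimal sketch of that gate, assuming a verdict schema of our own invention (`verdict` plus `reason`): anything that is not strict JSON with an allowed label is downgraded to `unknown` rather than trusted.

```python
import json

ALLOWED = {"crash", "hang", "benign"}   # hypothetical verdict labels

def parse_verdict(reply):
    """Accept only a strict JSON verdict with an allowed label and a
    string rationale; anything else (free-form prose, hallucinated
    labels) is treated as 'unknown' instead of being trusted."""
    try:
        v = json.loads(reply)
        if (isinstance(v, dict)
                and v.get("verdict") in ALLOWED
                and isinstance(v.get("reason"), str)):
            return v["verdict"]
    except (json.JSONDecodeError, TypeError):
        pass
    return "unknown"

# A structured reply is accepted; a chatty one is not.
assert parse_verdict('{"verdict": "crash", "reason": "SIGSEGV in parse()"}') == "crash"
assert parse_verdict("I think this is probably a crash!") == "unknown"
```

Routing `unknown` back to a concrete re-execution (rather than counting it as a crash) is one way to keep oracle hallucinations out of the triage queue.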
Still Hard (4)
- Ground-truthing semantic similarity (compiler tricks, opaque predicates)
- Precise taint under concurrency/JIT/self-modifying code
- LLM oracle reliability & reproducibility
- Hybrid loop false positives; solver budget control
References
- Sheng et al. “All You Need Is A Fuzzing Brain: An LLM-Powered System for Automated Vulnerability Detection and Patching.” AIxCC technical report, 2025.
- Tan et al. “LLM4Decompile: Decompiling Binary Code with Large Language Models.” EMNLP 2024.
- Meng et al. “Large Language Model guided Protocol Fuzzing (ChatAFL).” NDSS 2024.
- Hu et al. “DeGPT: Optimizing Decompiler Output with LLM.” NDSS 2024.
- Xia et al. “Fuzz4All: Universal Fuzzing with Large Language Models.” ICSE 2024.
- Chen et al. “ELFuzz: Efficient Input Generation via LLM-driven Synthesis Over Fuzzer Space.” USENIX Security 2025.
- Fang et al. “Large Language Models for Code Analysis: Do LLMs Really Do Their Job?” USENIX Security 2024.