AI→Sec

Module 1: Software Security

Software Security: Program Understanding, Vulnerability Discovery, and LLM-Guided Fuzzing

From semantic lifting to LLM-assisted decompilation and automated bug finding

Outcome: Build semantic visibility into code, automate vulnerability discovery, and prioritize findings.

Learning Objectives (3)
  • Lift binaries to IR and leverage cross-arch semantic similarity
  • Use LLMs to refine decompilation (naming, comments, structure) safely
  • Combine taint + risk scoring + LLM-guided fuzzing to reach real bugs
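The third objective, combining taint with risk scoring to prioritize findings, can be sketched as a toy ranking pass. The `Finding` fields and the weights below are illustrative assumptions, not a calibrated model:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    name: str
    taint_reaches_sink: bool   # static taint says attacker data reaches a sink
    crash_observed: bool       # fuzzer reproduced a crash
    sink_severity: int         # 1 (info leak) .. 3 (memory corruption)

def risk_score(f: Finding) -> float:
    # Weighted sum; weights are made up for illustration, not calibrated.
    score = f.sink_severity * 1.0
    if f.taint_reaches_sink:
        score += 2.0
    if f.crash_observed:
        score += 3.0
    return score

findings = [
    Finding("fmt-string in logger", True, False, 2),
    Finding("heap overflow in parser", True, True, 3),
    Finding("unchecked return value", False, False, 1),
]
ranked = sorted(findings, key=risk_score, reverse=True)
# The heap overflow (taint + crash + severity 3) ranks first.
```

In practice the crash signal would come from the fuzzer and the taint bit from the static pass; the point is that neither signal alone orders the queue well.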
Topic Map (4)
  • Cross-arch function similarity
  • LLM-assisted decompilation
  • Taint (static/dynamic/hybrid)
  • LLM-guided fuzzing (seeds/grammars, NL oracles)
Topic Map — Deep Dives (4)
  • Cross-arch function similarity
    • IR/graph embeddings; control/data-flow features
    • Use-cases: clone search, patch diffing, backport triage
  • Semantic lifting & LLM-assisted decompilation
    • Bytes → IR → SSA; recover types/ABI contracts
    • LLMs to refine decompiler output (naming, comments, explanations)
    • End-to-end vs. refinement workflows (LLM4Decompile-style)
  • Taint (static/dynamic/hybrid)
    • Sources/sinks catalogs; interproc taint; aliasing pain points
    • Hybrid: selective concolic on hot edges; sanitization gates
  • LLM-guided fuzzing
    • Grammar inference from samples/spec; high-value seed proposals
    • NL oracles with structured verdicts; minimize hallucinations
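The source/sink/sanitizer mechanics from the taint deep dive can be shown with a minimal intraprocedural pass over straight-line code. The pseudo-IR statement forms here are invented for illustration; real tools work on SSA and must handle aliasing and interprocedural flow:

```python
# Toy taint propagation over straight-line pseudo-IR.
# Statement forms (illustrative only):
#   ("source", dst)          - dst receives attacker-controlled data
#   ("assign", dst, src)     - plain copy; taint flows dst <- src
#   ("sanitize", dst, _src)  - sanitization gate clears taint on dst
#   ("sink", var)            - var reaches a dangerous sink

def find_taint_violations(stmts):
    tainted = set()
    violations = []
    for i, stmt in enumerate(stmts):
        op = stmt[0]
        if op == "source":
            tainted.add(stmt[1])
        elif op == "assign":
            _, dst, src = stmt
            if src in tainted:
                tainted.add(dst)      # taint flows through copies
            else:
                tainted.discard(dst)  # strong update: overwrite clears taint
        elif op == "sanitize":
            tainted.discard(stmt[1])  # gate: value is now trusted
        elif op == "sink" and stmt[1] in tainted:
            violations.append((i, stmt[1]))
    return violations

prog = [
    ("source", "buf"),
    ("assign", "cmd", "buf"),
    ("sink", "cmd"),             # violation: unsanitized flow
    ("sanitize", "cmd", "cmd"),
    ("sink", "cmd"),             # safe after the sanitization gate
]
# find_taint_violations(prog) -> [(2, "cmd")]
```

The "aliasing pain points" bullet is exactly what this sketch dodges: with pointers, the strong update on `assign` is no longer sound.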
Key Shifts Powered by AI (3)
  • Semantics over Syntax: Neural IR/graph embeddings align functions across compilers and architectures, enabling robust clone search and smarter triage.
    Why it matters: Port findings across forks/platforms; prioritize semantically similar code paths.
  • LLMs as Co-Pilots, Not Pilots: LLMs excel at decompilation cleanup and grammar/seed hints, but must be gated by structured prompts and human oversight. [llm4decompile]
    Why it matters: Cleaner code understanding without unsafe automation; reproducible improvements.
  • Data-Driven Fuzzing: Models infer grammars/states and propose high-value seeds; LLMs act as natural-language oracles. [chatafl] [degpt]
    Why it matters: More unique crashes with fewer executions; faster triage.
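Operationally, the "Semantics over Syntax" shift reduces to nearest-neighbor search over function embeddings. A minimal sketch with hand-made vectors; real embeddings would come from an IR/graph encoder, and the names and threshold are illustrative:

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense function embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical embeddings: the same function compiled for x86 and ARM
# should land near each other; an unrelated function should not.
corpus = {
    "memcpy_x86": [0.90, 0.10, 0.30],
    "memcpy_arm": [0.85, 0.15, 0.28],
    "sha256_x86": [0.10, 0.90, 0.40],
}

def clone_search(query_vec, corpus, threshold=0.95):
    # Return corpus functions whose similarity clears the threshold,
    # best match first -- the core loop behind clone search and patch diffing.
    hits = [(name, cosine(query_vec, vec)) for name, vec in corpus.items()]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)
```

Querying with the x86 memcpy vector returns both memcpy variants but not sha256, which is the cross-architecture clone-search behavior the bullet describes.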
Still Hard (4)
  • Ground-truthing semantic similarity (compiler tricks, opaque predicates)
  • Precise taint under concurrency/JIT/self-modifying code
  • LLM oracle reliability & reproducibility
  • Hybrid loop false positives; solver budget control
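One common mitigation for the "LLM oracle reliability" problem above is to demand structured verdicts and reject anything off-schema. A sketch, assuming the oracle has been prompted to reply with a JSON object containing `verdict` and `evidence` fields (the schema is an assumption for illustration):

```python
import json

ALLOWED_VERDICTS = {"bug", "no-bug", "unknown"}

def parse_oracle_verdict(raw: str) -> dict:
    """Validate an LLM oracle reply against a strict schema.

    Anything malformed collapses to 'unknown', so a hallucinated
    free-text answer can never be mistaken for a confirmed bug.
    """
    try:
        obj = json.loads(raw)
        verdict = obj.get("verdict")
        evidence = obj.get("evidence", "")
        if verdict in ALLOWED_VERDICTS and isinstance(evidence, str):
            return {"verdict": verdict, "evidence": evidence}
    except (json.JSONDecodeError, AttributeError):
        pass  # not JSON, or not a JSON object
    return {"verdict": "unknown", "evidence": ""}

# parse_oracle_verdict('{"verdict": "bug", "evidence": "OOB write"}')
#   -> accepted as-is
# parse_oracle_verdict("I think there might be a bug")
#   -> {"verdict": "unknown", "evidence": ""}
```

Treating `unknown` as "re-ask or escalate to a human" keeps the oracle's failure mode conservative, which also helps reproducibility across runs.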

References

  1. Sheng et al. “All You Need Is A Fuzzing Brain: An LLM-Powered System for Automated Vulnerability Detection and Patching.” AIxCC technical report, 2025.
  2. Tan et al. “LLM4Decompile: Decompiling Binary Code with Large Language Models.” EMNLP 2024.
  3. Meng et al. “Large Language Model guided Protocol Fuzzing (ChatAFL).” NDSS 2024.
  4. Hu et al. “DeGPT: Optimizing Decompiler Output with LLM.” NDSS 2024.
  5. Xia et al. “Fuzz4All: Universal Fuzzing with Large Language Models.” ICSE 2024.
  6. Chen et al. “ELFuzz: Efficient Input Generation via LLM-driven Synthesis Over Fuzzer Space.” USENIX Security 2025.
  7. Fang et al. “Large Language Models for Code Analysis: Do LLMs Really Do Their Job?” USENIX Security 2024.