AI→Sec
Module 1: Software Security
Software Security: Program Understanding, Vulnerability Discovery, and LLM-Guided Fuzzing
From semantic lifting to LLM-assisted decompilation and automated bug finding
Outcome: Build semantic visibility into code, automate vulnerability discovery, and prioritize findings.
Learning Objectives (3)
- Lift binaries to IR and leverage cross-arch semantic similarity
- Use LLMs to refine decompilation (naming, comments, structure) safely
- Combine taint + risk scoring + LLM-guided fuzzing to reach real bugs
Topic Map (4)
- Cross-arch function similarity
- LLM-assisted decompilation
- Taint (static/dynamic/hybrid)
- LLM-guided fuzzing (seeds/grammars, NL oracles)
Topic Map — Deep Dives (4)
- Cross-arch function similarity
- IR/graph embeddings; control/data-flow features
- Use-cases: clone search, patch diffing, backport triage
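To make the clone-search idea concrete, here is a minimal sketch. It assumes a toy bag-of-tokens "embedding" over normalized IR features (`VOCAB`, `embed`, and the token lists are illustrative inventions); real systems learn control/data-flow graph embeddings with neural models, but the comparison step is the same cosine similarity.

```python
import math

# Hypothetical vocabulary of normalized IR-level tokens.
VOCAB = ["load", "store", "cmp", "branch", "call", "xor", "mul", "ret"]

def embed(tokens):
    """Toy bag-of-tokens embedding: count each vocab token, L2-normalize.
    Stand-in for a learned graph embedding over control/data flow."""
    v = [float(tokens.count(t)) for t in VOCAB]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

def cosine(a, b):
    # Vectors are unit-length, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# The "same" function lifted from x86 and ARM builds: mnemonics differ,
# but the IR-level token histograms coincide, so similarity is maximal.
x86_fn = embed(["load", "cmp", "branch", "call", "ret"])
arm_fn = embed(["load", "cmp", "branch", "call", "ret"])
other  = embed(["xor", "mul", "store", "store", "ret"])
```

A clone-search index would rank `arm_fn` above `other` for query `x86_fn`, which is exactly how findings get ported across architectures.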
- Semantic lifting & LLM-assisted decompilation
- Bytes → IR → SSA; recover types/ABI contracts
- LLMs to refine decompiler output (naming, comments, explanations)
- End-to-end vs. refinement workflows (LLM4Decompile-style)
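A sketch of the refinement workflow's gating step, under stated assumptions: the decompiled snippet, the reply string, and the helper names (`build_prompt`, `apply_renames`) are all hypothetical. The point is that the model is constrained to a strict JSON rename map, and hallucinated identifiers are rejected before any edit touches the code.

```python
import json
import re

DECOMPILED = """
int sub_401200(char *a1, int a2) {
    int v1 = 0;
    while (v1 < a2) { a1[v1] = a1[v1] ^ 0x5A; v1++; }
    return v1;
}
"""

def build_prompt(code):
    # Structured prompt: request ONLY a JSON rename map, no free-form prose.
    return ("Suggest descriptive names for the identifiers in this C code. "
            "Respond with ONLY a JSON object mapping old names to new names.\n\n"
            + code)

def apply_renames(code, reply):
    """Gate the model's reply: parse strict JSON and rename only identifiers
    that actually occur in the code, dropping hallucinated ones."""
    renames = json.loads(reply)
    idents = set(re.findall(r"\b[A-Za-z_]\w*\b", code))
    for old, new in renames.items():
        if old in idents and re.fullmatch(r"[A-Za-z_]\w*", new):
            code = re.sub(rf"\b{re.escape(old)}\b", new, code)
    return code

# A reply a model might produce (hypothetical), including one hallucinated
# identifier "v99" that the gate silently discards.
reply = ('{"sub_401200": "xor_buffer", "a1": "buf", "a2": "len", '
         '"v1": "i", "v99": "junk"}')
cleaned = apply_renames(DECOMPILED, reply)
```

Behavior is unchanged; only readability improves, and the human reviewer sees the renamed code rather than raw model output.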
- Taint (static/dynamic/hybrid)
- Sources/sinks catalogs; interproc taint; aliasing pain points
- Hybrid: selective concolic on hot edges; sanitization gates
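The sources/sinks/sanitizers pipeline above can be sketched as a minimal intraprocedural taint pass over a toy three-address IR. The statement format, the catalog entries (`read_input`, `escape`, `exec_sql`), and the program are all invented for illustration; real engines add interprocedural summaries and alias analysis.

```python
# Hypothetical catalogs; real tools ship large source/sink/sanitizer lists.
SOURCES = {"read_input"}
SANITIZERS = {"escape"}
SINKS = {"exec_sql"}

def find_flows(stmts):
    """Each statement is (dst, op, srcs). Taint flows source -> sink
    unless a sanitization gate intervenes."""
    tainted = set()
    findings = []
    for dst, op, srcs in stmts:
        if op in SOURCES:
            tainted.add(dst)
        elif op in SANITIZERS:
            tainted.discard(dst)          # sanitization gate kills taint
        elif op in SINKS:
            if any(s in tainted for s in srcs):
                findings.append((op, srcs))
        elif any(s in tainted for s in srcs):
            tainted.add(dst)              # plain assignment: propagate
    return findings

prog = [
    ("a", "read_input", []),      # a = attacker-controlled source
    ("b", "concat", ["a", "q"]),  # b inherits taint from a
    ("c", "escape", ["b"]),       # c is sanitized
    (None, "exec_sql", ["b"]),    # tainted value reaches sink -> finding
    (None, "exec_sql", ["c"]),    # sanitized value -> no finding
]
flows = find_flows(prog)
```

The two sink calls show why sanitizer modeling matters: without the `escape` gate, both would be flagged and the second would be a false positive.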
- LLM-guided fuzzing
- Grammar inference from samples/spec; high-value seed proposals
- NL oracles with structured verdicts; minimize hallucinations
Key Shifts Powered by AI (3)
- Semantics over Syntax: Neural IR/graph embeddings align functions across compilers and architectures, enabling robust clone search and smarter triage. Why it matters: port findings across forks/platforms; prioritize semantically similar code paths.
- LLMs as Co-Pilots, Not Pilots: LLMs excel at decompilation cleanup and grammar/seed hints but must be gated by structured prompts and human oversight. [llm4decompile] [degpt] Why it matters: cleaner code understanding without unsafe automation; reproducible improvements.
- Data-Driven Fuzzing: Models infer grammars/states and propose high-value seeds; LLMs act as natural-language oracles. [chatafl] Why it matters: more unique crashes with fewer executions; faster triage.
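The natural-language-oracle idea only works if the model's answer is machine-checkable. A minimal sketch of that gate, assuming a verdict schema of our own invention (`verdict` plus `reason`): anything that is not strict JSON with an allowed label is downgraded to `unknown` rather than trusted.

```python
import json

ALLOWED = {"crash", "hang", "benign"}   # hypothetical verdict labels

def parse_verdict(reply):
    """Accept only a strict JSON verdict with an allowed label and a
    string rationale; anything else (free-form prose, hallucinated
    labels) is treated as 'unknown' instead of being trusted."""
    try:
        v = json.loads(reply)
        if (isinstance(v, dict)
                and v.get("verdict") in ALLOWED
                and isinstance(v.get("reason"), str)):
            return v["verdict"]
    except (json.JSONDecodeError, TypeError):
        pass
    return "unknown"

# A structured reply is accepted; a chatty one is not.
assert parse_verdict('{"verdict": "crash", "reason": "SIGSEGV in parse()"}') == "crash"
assert parse_verdict("I think this is probably a crash!") == "unknown"
```

Routing `unknown` back to a concrete re-execution (rather than counting it as a crash) is one way to keep oracle hallucinations out of the triage queue.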
Still Hard (4)
- Ground-truthing semantic similarity (compiler tricks, opaque predicates)
- Precise taint under concurrency/JIT/self-modifying code
- LLM oracle reliability & reproducibility
- Hybrid loop false positives; solver budget control
References
- Sheng et al. “All You Need Is A Fuzzing Brain: An LLM-Powered System for Automated Vulnerability Detection and Patching.” AIxCC technical report, 2025.
- Tan et al. “LLM4Decompile: Decompiling Binary Code with Large Language Models.” EMNLP 2024.
- Meng et al. “Large Language Model guided Protocol Fuzzing (ChatAFL).” NDSS 2024.
- Hu et al. “DeGPT: Optimizing Decompiler Output with LLM.” NDSS 2024.
- Xia et al. “Fuzz4All: Universal Fuzzing with Large Language Models.” ICSE 2024.
- Chen et al. “ELFuzz: Efficient Input Generation via LLM-driven Synthesis Over Fuzzer Space.” USENIX Security 2025.
- Fang et al. “Large Language Models for Code Analysis: Do LLMs Really Do Their Job?” USENIX Security 2024.