Static analysis tools like Bandit, Semgrep, and CodeQL have become standard in modern CI/CD pipelines. They're fast, deterministic, and catch common vulnerability patterns. But as they are typically run, they share a fundamental limitation: their rules mostly match syntactic patterns in local code rather than reasoning about what the system as a whole does.
The Pattern-Matching Gap
Consider a simple example. Bandit will flag subprocess.call(cmd, shell=True) as a potential command injection. That's useful. But it won't catch:
- A command string built across three functions and passed through a dictionary before reaching subprocess
- An os.system() call where the argument comes from a database query that was populated by user input two HTTP requests earlier
- A pickle.loads() on data that arrives via a message queue from an untrusted service
These require understanding data flow across function boundaries, module boundaries, and even service boundaries. Pattern matchers don't do that.
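To make the gap concrete, here is a deliberately naive checker built on Python's ast module. It is a toy written for illustration, not how Bandit or any other real tool is implemented, and the names flags_shell_true, DIRECT, and INDIRECT are hypothetical. It flags only calls that are written literally as subprocess.<function>(..., shell=True), so it sees the direct call but stays silent when the same sink is reached indirectly.

```python
import ast

# Sample 1: the sink is written out literally, so a syntactic rule can match it.
DIRECT = '''
import subprocess
def run(cmd):
    subprocess.call(cmd, shell=True)
'''

# Sample 2: same behaviour, but the command travels through a dictionary and
# the call is resolved indirectly, so the literal pattern never appears.
INDIRECT = '''
import subprocess
def build(user_arg):
    return {"cmd": "ls " + user_arg}
def dispatch(job):
    runner = getattr(subprocess, "call")
    runner(job["cmd"], **{"shell": True})
'''


def flags_shell_true(source: str) -> list[int]:
    """Return line numbers of calls spelled subprocess.<fn>(..., shell=True)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Matches only the literal attribute access "subprocess.<something>".
        is_subprocess_attr = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
        )
        # Matches only an explicit keyword argument shell=True.
        has_shell_true = any(
            kw.arg == "shell"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if is_subprocess_attr and has_shell_true:
            hits.append(node.lineno)
    return hits


if __name__ == "__main__":
    print("direct:", flags_shell_true(DIRECT))
    print("indirect:", flags_shell_true(INDIRECT))
```

The checker never executes the sample code; it only parses it. Production tools are considerably more thorough than this sketch, but the failure mode has the same shape: a rule keyed to how code is spelled has nothing to say once the spelling changes.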
What Deeper Analysis Finds
When we scan popular open-source projects, we consistently find categories of issues that traditional tools miss entirely:
Cross-module data flow vulnerabilities. User input enters at an API boundary, gets stored in a database, retrieved by a background worker, and used in an unsafe operation. No single file contains the full vulnerability chain.
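One way to picture such a chain is as reachability in a data-flow graph. The sketch below is a minimal illustration under stated assumptions: the edge list FLOWS, the node names, and the function find_tainted_paths are all hypothetical, and in practice extracting those edges from real code is the hard part. Given the edges, a breadth-first search from untrusted sources to unsafe sinks recovers the full chain that no single file shows.

```python
from collections import deque

# Hypothetical data-flow edges: "value at A can reach B".
# Each node is labelled "<module>.<function>:<value>" or a database column.
FLOWS = {
    "api.create_job:request.body": ["db.jobs.command"],
    "db.jobs.command": ["worker.load_job:job.command"],
    "worker.load_job:job.command": ["worker.execute:os.system"],
    "config.load:settings.path": ["worker.execute:open"],
}
SOURCES = {"api.create_job:request.body"}   # assumed untrusted entry points
SINKS = {"worker.execute:os.system"}        # assumed unsafe operations


def find_tainted_paths(flows, sources, sinks):
    """Breadth-first search from each source; return every path ending at a sink."""
    paths = []
    for source in sorted(sources):
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                paths.append(path)
                continue
            for nxt in flows.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    return paths


if __name__ == "__main__":
    for path in find_tainted_paths(FLOWS, SOURCES, SINKS):
        print(" -> ".join(path))
```

In this toy graph the only reported path runs from the API request body, through the database column and the background worker, to the os.system() call. The config-to-open edge is ignored because it does not start at an untrusted source.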
Architectural weaknesses. Bottleneck modules where a single failure cascades across the entire system. These aren't "bugs" in the traditional sense — they're structural risks that only appear when you analyze the dependency graph.
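A rough way to surface bottleneck modules is to count how many other modules depend on each one, directly or transitively. The sketch below assumes a hypothetical dependency map, DEPENDS_ON, and a hypothetical helper, transitive_dependents. A real analysis would derive the graph from import statements or build metadata, and would likely weigh more than raw counts.

```python
# Hypothetical module dependency map: module -> modules it imports.
DEPENDS_ON = {
    "api": ["auth", "db"],
    "worker": ["db", "queue"],
    "billing": ["db", "auth"],
    "auth": ["db"],
    "db": [],
    "queue": [],
}


def transitive_dependents(depends_on):
    """Map each module to the set of modules that break if it fails."""
    # Invert the graph: module -> modules that import it directly.
    direct = {mod: set() for mod in depends_on}
    for mod, deps in depends_on.items():
        for dep in deps:
            direct.setdefault(dep, set()).add(mod)

    result = {}
    for mod in direct:
        seen = set()
        stack = list(direct[mod])
        # Depth-first walk up the inverted graph to collect indirect dependents.
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(direct.get(current, ()))
        result[mod] = seen
    return result


if __name__ == "__main__":
    impact = transitive_dependents(DEPENDS_ON)
    for mod, affected in sorted(impact.items(), key=lambda kv: -len(kv[1])):
        print(f"{mod}: {len(affected)} dependent module(s)")
```

In this example db ranks first because every other module except queue depends on it, which is the kind of structural risk that never shows up as a flagged line of code.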
Temporal patterns. Code that was safe when written but became vulnerable after a later refactor changed assumptions. Git history analysis reveals these evolutionary risks.
The Numbers
Across our scans of major open-source projects, we consistently find that traditional tools detect less than 5% of total findings. The remaining 95%+ require deeper analysis techniques:
- Data flow analysis across module boundaries
- Dependency graph analysis for architectural risks
- Git history mining for evolutionary vulnerabilities
- Semantic understanding of code intent vs. implementation
What This Means for Your Codebase
Running Bandit and Semgrep is a good start. But if you're relying solely on pattern-matching tools, you're seeing a fraction of the picture. The vulnerabilities that matter most — the ones that lead to actual breaches — tend to be the complex, multi-step chains that simple pattern matching can't detect.
Deep code analysis isn't a replacement for your existing tools. It's the layer that catches what they can't.