100% detection rate
We tested audit.new against known smart contract vulnerabilities from the SWC Registry, Solodit, and real-world exploits. Every known vulnerability was detected.
Deep plan · March 26, 2026
Detection by severity
Per-contract results
| Contract | Vulnerability | Severity | Detected |
|---|---|---|---|
| Damn Vulnerable DeFi | Flash loan griefing via direct transfer breaks invariant | High | ✓ |
| SWC-107 Reentrancy | State update after external call in withdraw() | Critical | ✓ |
| Integer Overflow | Integer overflow in increaseLockTime bypasses timelock | Critical | ✓ |
| Spot Price Oracle | AMM spot price manipulation via flash loan | Critical | ✓ |
| SWC-104 Unchecked Return | ERC20 transfer return value not checked | High | ✓ |
| Signature Replay | Missing nonce and chain ID in signed hash | Critical | ✓ |
| Front-running Auction | Transaction ordering dependence: bids visible in mempool | Medium | ✓ |
| Unsafe Delegatecall Proxy | Unprotected implementation upgrade via setImplementation | Critical | ✓ |
| tx.origin Authentication | tx.origin phishing, exploitable via malicious contract | High | ✓ |
| King of the Ether | DoS via revert in external call blocks contract | High | ✓ |
Ground truth sourced from SWC Registry, Solodit/Cyfrin, and real-world post-mortems. “Extra” = additional findings not in ground truth.
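The reentrancy case (SWC-107) is the classic state-update-after-external-call bug. A minimal Python model (a toy ledger, not Solidity) shows why the ordering matters: the external call hands control to the caller while the balance is still unspent, so a malicious receiver can re-enter `withdraw` and drain more than it deposited.

```python
class Vault:
    """Toy ledger mimicking a Solidity withdraw() that sends
    funds via an external call *before* zeroing the balance."""

    def __init__(self):
        self.balances = {}
        self.total = 0

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.total += amount

    def withdraw(self, who, receive_callback):
        amount = self.balances.get(who, 0)
        if amount == 0:
            return
        receive_callback(amount)   # external call first (the bug)
        self.balances[who] = 0     # state update after: too late
        self.total -= amount


vault = Vault()
vault.deposit("attacker", 1)
vault.deposit("victim", 9)

drained = []

def attacker_receive(amount):
    drained.append(amount)
    # Re-enter once: the attacker's balance has not been zeroed yet.
    if len(drained) < 2:
        vault.withdraw("attacker", attacker_receive)

vault.withdraw("attacker", attacker_receive)
print(sum(drained))  # 2 units drained from a 1-unit deposit
```

The fix mirrors the checks-effects-interactions pattern: zero the balance before making the external call, and the re-entrant withdrawal sees a zero balance and stops.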
How it compares
audit.new vs traditional static analysis tools and manual audits.
| Tool | Detection | Time |
|---|---|---|
| audit.new (Deep) | 100% | ~7 min |
| Slither (static) | ~40-60% | < 1 min |
| Mythril (symbolic) | ~50-70% | 5-30 min |
| Manual Audit (firm) | ~85-95% | 2-4 weeks |
Slither/Mythril estimates from published research. Manual audit ranges based on public firm pricing. audit.new results from this benchmark.
Methodology
Curate known-vulnerable contracts
We sourced contracts with confirmed vulnerabilities from the SWC Registry, Solodit/Cyfrin, Code4rena, and real-world exploits like The DAO and Euler Finance.
Run deep AI audits
Each contract was submitted to audit.new using the Deep plan (Claude Opus 4, 100 turns, max effort). No hints about the expected vulnerability were given.
Match findings against ground truth
We compared the AI's findings against known vulnerabilities using keyword matching. A finding counts as detected if the AI identified the same vulnerability class.
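A keyword-based matcher of this kind can be sketched as follows. The keyword sets and function names here are illustrative assumptions, not the exact lists used in the benchmark:

```python
# Hypothetical keyword sets per vulnerability class (illustrative only).
GROUND_TRUTH_KEYWORDS = {
    "reentrancy": {"reentrancy", "reentrant", "external call before state"},
    "integer-overflow": {"overflow", "underflow", "wraparound"},
    "unchecked-return": {"unchecked", "return value"},
}

def detected(vuln_class: str, findings: list[str]) -> bool:
    """A known vulnerability counts as detected if any reported
    finding mentions a keyword for that vulnerability class."""
    keywords = GROUND_TRUTH_KEYWORDS[vuln_class]
    return any(
        kw in finding.lower()
        for finding in findings
        for kw in keywords
    )

report = [
    "Reentrancy in withdraw(): state updated after external call",
    "Missing zero-address check in constructor",  # an "extra" finding
]
print(detected("reentrancy", report))       # True
print(detected("integer-overflow", report)) # False
```

Findings that match no ground-truth class, like the zero-address check above, would be counted as "extra" rather than as false positives against the benchmark.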
What this benchmark does and doesn't show
It shows:
- Reliable detection of well-known vulnerability classes
- Sub-10-minute turnaround on deep audits
- Ability to find additional issues beyond the known ones
Limitations:
- Tested on 10 contracts (including 1 full GitHub repo); the benchmark is growing
- Most contracts are small to medium with well-documented vulnerabilities
- Does not replace manual review for production deployments
- Business logic bugs may require domain-specific context
Try it on your contracts.
Submit an Etherscan link, GitHub repo, or paste your Solidity code. Get a detailed security report in minutes.