100% detection rate
We tested audit.new against known smart contract vulnerabilities from the SWC Registry, Solodit, and real-world exploits. Every known vulnerability was detected.
Deep plan · March 26, 2026
Detection by severity
Per-contract results
| Contract | Vulnerability | Severity | Detected |
|---|---|---|---|
| Damn Vulnerable DeFi | Flash loan griefing via direct transfer breaks invariant | High | ✓ |
| SWC-107 Reentrancy | State update after external call in withdraw() | Critical | ✓ |
| Integer Overflow | Integer overflow in increaseLockTime bypasses timelock | Critical | ✓ |
| Spot Price Oracle | AMM spot price manipulation via flash loan | Critical | ✓ |
| SWC-104 Unchecked Return | ERC20 transfer return value not checked | High | ✓ |
| Signature Replay | Missing nonce and chain ID in signed hash | Critical | ✓ |
| Front-running Auction | Transaction ordering dependence: bids visible in mempool | Medium | ✓ |
| Unsafe Delegatecall Proxy | Unprotected implementation upgrade via setImplementation | Critical | ✓ |
| tx.origin Authentication | tx.origin phishing, exploitable via malicious contract | High | ✓ |
| King of the Ether | DoS via revert in external call blocks contract | High | ✓ |
Ground truth sourced from SWC Registry, Solodit/Cyfrin, and real-world post-mortems. “Extra” = additional findings not in ground truth.
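The reentrancy case (SWC-107) is the classic state-update-after-external-call bug. A minimal Python model (a toy ledger, not Solidity) shows why the ordering matters: the external call hands control to the caller while the balance is still unspent, so a malicious receiver can re-enter `withdraw` and drain more than it deposited.

```python
class Vault:
    """Toy ledger mimicking a Solidity withdraw() that sends
    funds via an external call *before* zeroing the balance."""

    def __init__(self):
        self.balances = {}
        self.total = 0

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.total += amount

    def withdraw(self, who, receive_callback):
        amount = self.balances.get(who, 0)
        if amount == 0:
            return
        receive_callback(amount)   # external call first (the bug)
        self.balances[who] = 0     # state update after: too late
        self.total -= amount


vault = Vault()
vault.deposit("attacker", 1)
vault.deposit("victim", 9)

drained = []

def attacker_receive(amount):
    drained.append(amount)
    # Re-enter once: the attacker's balance has not been zeroed yet.
    if len(drained) < 2:
        vault.withdraw("attacker", attacker_receive)

vault.withdraw("attacker", attacker_receive)
print(sum(drained))  # 2 units drained from a 1-unit deposit
```

The fix mirrors the checks-effects-interactions pattern: zero the balance before making the external call, and the re-entrant withdrawal sees a zero balance and stops.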
How it compares
audit.new vs traditional static analysis tools and manual audits.
| Tool | Detection | Time |
|---|---|---|
| audit.new (Deep) | 100% | ~7 min |
| Slither (static) | ~40-60% | < 1 min |
| Mythril (symbolic) | ~50-70% | 5-30 min |
| Manual Audit (firm) | ~85-95% | 2-4 weeks |
Slither/Mythril estimates from published research. Manual audit ranges based on public firm pricing. audit.new results from this benchmark.
Methodology
Curate known-vulnerable contracts
We sourced contracts with confirmed vulnerabilities from the SWC Registry, Solodit/Cyfrin, Code4rena, and real-world exploits like The DAO and Euler Finance.
Run deep AI audits
Each contract was submitted to audit.new using the Deep plan (Claude Opus 4, 100 turns, max effort). No hints about the expected vulnerability were given.
Match findings against ground truth
We compared the AI's findings against known vulnerabilities using keyword matching. A finding counts as detected if the AI identified the same vulnerability class.
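A keyword-based matcher of this kind can be sketched as follows. The keyword sets and function names here are illustrative assumptions, not the exact lists used in the benchmark:

```python
# Hypothetical keyword sets per vulnerability class (illustrative only).
GROUND_TRUTH_KEYWORDS = {
    "reentrancy": {"reentrancy", "reentrant", "external call before state"},
    "integer-overflow": {"overflow", "underflow", "wraparound"},
    "unchecked-return": {"unchecked", "return value"},
}

def detected(vuln_class: str, findings: list[str]) -> bool:
    """A known vulnerability counts as detected if any reported
    finding mentions a keyword for that vulnerability class."""
    keywords = GROUND_TRUTH_KEYWORDS[vuln_class]
    return any(
        kw in finding.lower()
        for finding in findings
        for kw in keywords
    )

report = [
    "Reentrancy in withdraw(): state updated after external call",
    "Missing zero-address check in constructor",  # an "extra" finding
]
print(detected("reentrancy", report))       # True
print(detected("integer-overflow", report)) # False
```

Findings that match no ground-truth class, like the zero-address check above, would be counted as "extra" rather than as false positives against the benchmark.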
What this benchmark does and doesn't show
It shows:
- Reliable detection of well-known vulnerability classes
- Sub-10-minute turnaround on deep audits
- Ability to find additional issues beyond the known ones
Limitations:
- Tested on 10 contracts (including 1 full GitHub repo); the benchmark is growing
- Most contracts are small to medium with well-documented vulnerabilities
- Does not replace manual review for production deployments
- Business logic bugs may require domain-specific context
Try it on your contracts.
Submit an Etherscan link, GitHub repo, or paste your Solidity code. Get a detailed security report in minutes.