§ BUILD-OFF · ROUND 005
TIER I
SAMPLE · IN-REPO FALLBACK

3D rotating planet

Same prompt. Every tool. Cost in dollars, time in seconds, output in receipts. We're not trying to win — we're showing what each tool is best at, then routing your job to it.

§ SAMPLE DATA — NOT YET A LIVE VERIFIED RUN

Numbers below are plausible estimates from public pricing and reported behavior, shown to demonstrate the methodology. The first verified Build-Off — same prompt run live through every tool, with screen captures and token receipts — drops shortly. Get notified at the bottom of the page.


§ THE PROMPT
BRIEF

Tier I visual proficiency challenge. One HTML file, Three.js from CDN, no build step. Tests whether each tool can produce something genuinely impressive under tight, portable constraints — the lowest common denominator that every participant can hit.

DATE · 2026-05-10
PROMPT (VERBATIM)

Produce a single index.html with no build step. Load Three.js r165 from the jsDelivr CDN. Render a 3D sphere with a realistic surface: use MeshStandardMaterial with a roughness map procedurally generated from canvas noise — no external texture files. Add a soft atmospheric rim-glow using an additive backface sphere. Rotate the planet on its axis continuously. Support click-and-drag orbit. Background: deep space with at least 300 randomly placed stars rendered as Points. The result must look impressive at 600×400. No frameworks, no bundlers, no npm.


§ VISUAL SHOWCASE

What each tool built.

Same prompt. Same constraints. Submissions appear as tools complete — rankings unlock once everyone is in.

Tier I — Single file

One index.html · CDN deps only · no build step

1 submitted · 1 withdrew · 1 not entered · 4 pending
Soupy Together · harness-served
Lovable Pro · Not entered this tier
Bolt · Awaiting submission
v0 by Vercel · Awaiting submission
Cursor · Withdrew · Declined this round
Replit Agent · Awaiting submission
Claude Code · Awaiting submission

Rankings unlock when the round launches. Operators can launch early if a tool won't submit — missing tools appear as "did not submit."


§ LEADERBOARD

Composite ranking

Each measure normalized 0–100 within this round, then weighted. See methodology below.
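
The page does not spell out the normalization formula. A minimal sketch, assuming plain min-max scaling within the round with the direction flipped for lower-is-better measures; that assumption reproduces the sample normalized scores shown on this page. The function name `normalize_round` is hypothetical.

```python
def normalize_round(raw, lower_is_better=False):
    """Min-max scale one measure to 0-100 across the tools in a round."""
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    scores = {}
    for tool, value in raw.items():
        if span == 0:
            # Assumption: if every tool ties, all get full marks.
            scores[tool] = 100
            continue
        if lower_is_better:
            frac = (hi - value) / span  # cheapest / fastest / smallest -> 100
        else:
            frac = (value - lo) / span  # highest raw score -> 100
        scores[tool] = round(frac * 100)
    return scores


# Sample cost per output (USD) from this round; lower is better.
cost = {
    "Soupy Together": 0.00,
    "Claude Code": 0.38,
    "Lovable Pro": 0.28,
    "Cursor": 0.45,
    "v0 by Vercel": 0.22,
    "Bolt": 0.32,
    "Replit Agent": 0.55,
}
cost_scores = normalize_round(cost, lower_is_better=True)
```

Because scores are relative to the round's own minimum and maximum, a normalized 0 means "lowest in this round", not "failed".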

§ HOW TO READ THIS

Soupy Together is the daily driver, not the top scorer. v0 will out-design us on visuals. Cursor and Claude Code will out-correct us on hairy refactors. That's fine — when your job needs one of those, we route to it and you pay for that call. The rest of the time, the cheapest tool that can finish the job finishes the job. Look at the Cost column, then look at Composite.

#   Tool            Composite  Cost   TTFP  Correct  Honest
01  Soupy Together  68         $0.00  8s    100      60
02  Claude Code     56         $0.38  38s   93       86
03  Lovable Pro     49         $0.28  22s   84       70
04  Cursor          46         $0.45  54s   92       82
05  v0 by Vercel    42         $0.22  14s   76       66
06  Bolt            34         $0.32  19s   78       62
07  Replit Agent    19         $0.55  48s   80       65

TTFP = time-to-first-preview (wall-clock, prompt → reachable preview). TTFT = time-to-first-token from the model API, shown when the tool exposes it; not weighted in composite. Time→Green = wall-clock until preview reachable AND tests green. TTFT and Time→Green are populated only by harness-driven runs.
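
One way a harness might time TTFP is to start a monotonic clock at prompt submission and poll until the preview answers. This is a sketch only, not the Build-Off harness: `measure_ttfp`, `submit_prompt`, `preview_reachable`, and the timeout and polling defaults are all hypothetical names and values.

```python
import time


def measure_ttfp(submit_prompt, preview_reachable, timeout_s=120.0, poll_s=0.5,
                 clock=time.monotonic, sleep=time.sleep):
    """Seconds from prompt submission to the first reachable preview.

    submit_prompt:      callable that sends the prompt to the tool (hypothetical).
    preview_reachable:  callable returning True once the preview responds (hypothetical).
    Returns None if the preview never becomes reachable within timeout_s.
    """
    start = clock()
    submit_prompt()
    while clock() - start < timeout_s:
        if preview_reachable():
            return clock() - start
        sleep(poll_s)
    return None
```

Passing `clock` and `sleep` as parameters keeps the timer testable without real waiting. The measured value is only as fine-grained as `poll_s`.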


§ VISUALS

The shape of the round.

Three views of the same data: composite ranking, every normalized score in one matrix, and the cost-vs-correctness frontier.

FIG. 01 — COMPOSITE SCORE, ALL TOOLS
Bar chart, scale 0–100: Soupy Together 68 · Claude Code 56 · Lovable Pro 49 · Cursor 46 · v0 by Vercel 42 · Bolt 34 · Replit Agent 19
FIG. 02 — NORMALIZED SCORE MATRIX (0–100, PER MEASURE)
Tool            COST (w 22%)  TIME (w 12%)  VISUAL (w 16%)  CODE (w 20%)  REFACTOR (w 14%)  HONESTY (w 10%)  BUNDLE (w 6%)
Soupy Together  100           100           0               100           60                0                100
Claude Code     31            35            44              71            100               100              0
Lovable Pro     49            70            88              33            35                38               0
Cursor          18            0             49              67            85                85               0
v0 by Vercel    60            87            100             0             0                 23               0
Bolt            42            76            73              8             10                8                0
Replit Agent    0             13            59              17            20                19               0
FIG. 03 — COST vs. CORRECTNESS · BUBBLE = COMPOSITE
Scatter plot. X axis: cost per output (USD), $0.00 to $0.55. Y axis: correctness (0–100). One bubble per tool, sized by composite: Soupy Together, Claude Code, Lovable Pro, Cursor, v0 by Vercel, Bolt, Replit Agent.
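
The page does not define "frontier". Reading it as the Pareto frontier (an assumption), a tool is on it when no other tool is at least as cheap and at least as correct while being strictly better on one of the two. A minimal sketch using the sample cost and correctness figures from this round; `pareto_frontier` is a hypothetical helper name.

```python
def pareto_frontier(points):
    """Return the tools that no other tool beats on both cost and correctness.

    points maps tool name -> (cost_usd, correctness); lower cost and
    higher correctness are better.
    """
    frontier = []
    for name, (cost, correct) in points.items():
        dominated = any(
            other != name
            and o_cost <= cost
            and o_correct >= correct
            and (o_cost < cost or o_correct > correct)
            for other, (o_cost, o_correct) in points.items()
        )
        if not dominated:
            frontier.append(name)
    return frontier


sample = {
    "Soupy Together": (0.00, 100),
    "Claude Code": (0.38, 93),
    "Lovable Pro": (0.28, 84),
    "Cursor": (0.45, 92),
    "v0 by Vercel": (0.22, 76),
    "Bolt": (0.32, 78),
    "Replit Agent": (0.55, 80),
}
frontier_all = pareto_frontier(sample)
paid_only = {k: v for k, v in sample.items() if k != "Soupy Together"}
frontier_paid = pareto_frontier(paid_only)
```

With the sample figures, Soupy Together alone forms the frontier because it is both the cheapest and the highest on correctness. Leaving it out, the frontier is v0 by Vercel, Lovable Pro, and Claude Code.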

§ FULL RESULTS

Per-tool breakdown

RANK 01

Soupy Together

Could have a little more depth.

COMPOSITE · 68
Cost per output · $0.00 · 100/100
Time to first preview · 8s · 100/100
Visual fidelity · 50 · 0/100
Code correctness · 100 · 100/100
Refactor reliability · 40 · 60/100
Honesty under uncertainty · 60 · 0/100
Bundle size · 6 kB · 100/100

RANK 02

Claude Code

Error-free, structured geometry code. Visuals minimal — no rim glow, basic star field.

COMPOSITE · 56
Cost per output · $0.38 · 31/100
Time to first preview · 38s · 35/100
Visual fidelity · 68 · 44/100
Code correctness · 93 · 71/100
Refactor reliability · 48 · 100/100
Honesty under uncertainty · 86 · 100/100
Bundle size · 12 kB · 0/100

RANK 03

Lovable Pro

Strong visual output. Attempted to scaffold React — needed a prompt nudge to stay single-file.

COMPOSITE · 49
Cost per output · $0.28 · 49/100
Time to first preview · 22s · 70/100
Visual fidelity · 86 · 88/100
Code correctness · 84 · 33/100
Refactor reliability · 35 · 35/100
Honesty under uncertainty · 70 · 38/100
Bundle size · 12 kB · 0/100

RANK 04

Cursor

Cleanest Three.js code. Visually plain — prioritized correctness over impressiveness.

COMPOSITE · 46
Cost per output · $0.45 · 18/100
Time to first preview · 54s · 0/100
Visual fidelity · 70 · 49/100
Code correctness · 92 · 67/100
Refactor reliability · 45 · 85/100
Honesty under uncertainty · 82 · 85/100
Bundle size · 12 kB · 0/100

RANK 05

v0 by Vercel

Best-looking sphere. Surface texture richest of the set. Drag orbit slightly jumpy.

COMPOSITE · 42
Cost per output · $0.22 · 60/100
Time to first preview · 14s · 87/100
Visual fidelity · 91 · 100/100
Code correctness · 76 · 0/100
Refactor reliability · 28 · 0/100
Honesty under uncertainty · 66 · 23/100
Bundle size · 12 kB · 0/100

RANK 06

Bolt

Fast. Surface detail thinner, glow present.

COMPOSITE · 34
Cost per output · $0.32 · 42/100
Time to first preview · 19s · 76/100
Visual fidelity · 80 · 73/100
Code correctness · 78 · 8/100
Refactor reliability · 30 · 10/100
Honesty under uncertainty · 62 · 8/100
Bundle size · 12 kB · 0/100

RANK 07

Replit Agent

Ran end-to-end. Stars sparse, atmosphere thin.

COMPOSITE · 19
Cost per output · $0.55 · 0/100
Time to first preview · 48s · 13/100
Visual fidelity · 74 · 59/100
Code correctness · 80 · 17/100
Refactor reliability · 32 · 20/100
Honesty under uncertainty · 65 · 19/100
Bundle size · 12 kB · 0/100

§ METHODOLOGY

How we measure.

Cost per output · WEIGHT 22%

Dollars to ship the same feature, end to end.

Unit: USD · Lower is better

Time to first preview · WEIGHT 12%

Seconds from prompt submitted to a running preview.

Unit: seconds · Lower is better

Visual fidelity · WEIGHT 16%

Design quality of the generated UI vs. the spec brief.

Unit: score · Higher is better

Code correctness · WEIGHT 20%

Runtime errors, type safety, test outcomes.

Unit: score · Higher is better

Refactor reliability · WEIGHT 14%

Preservation of correctness across follow-up changes.

Unit: score · Higher is better

Honesty under uncertainty · WEIGHT 10%

Scored against the rate of confabulation on ambiguous input: the less a tool confabulates, the higher its score.

Unit: score · Higher is better

Bundle size · WEIGHT 6%

Bytes shipped per equivalent output, gzipped.

Unit: kB · Lower is better

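
The composite can be read as a weighted sum of the normalized scores. A minimal sketch using the weights listed above; rounding to the nearest integer is an assumption, though it matches the sample composites on this page. `WEIGHTS` and `composite` are hypothetical names.

```python
# Weights from the methodology section, as fractions of 1.
WEIGHTS = {
    "cost": 0.22,
    "time": 0.12,
    "visual": 0.16,
    "code": 0.20,
    "refactor": 0.14,
    "honesty": 0.10,
    "bundle": 0.06,
}


def composite(normalized):
    """Weighted sum of 0-100 normalized scores, rounded to an integer."""
    return round(sum(WEIGHTS[m] * normalized[m] for m in WEIGHTS))


# Normalized sample scores for two tools in this round.
soupy = {"cost": 100, "time": 100, "visual": 0, "code": 100,
         "refactor": 60, "honesty": 0, "bundle": 100}
v0 = {"cost": 60, "time": 87, "visual": 100, "code": 0,
      "refactor": 0, "honesty": 23, "bundle": 0}

soupy_composite = composite(soupy)  # 22 + 12 + 0 + 20 + 8.4 + 0 + 6 = 68.4
v0_composite = composite(v0)        # 13.2 + 10.44 + 16 + 0 + 0 + 2.3 + 0 = 41.94
```

Cost and code correctness together carry 42% of the weight, which is why a tool that scores 0 on normalized visual fidelity can still rank first.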
§ EDITORIAL NEUTRALITY

No tool can pay for placement. Soupy Together is itself in every Build-Off and competes on the same terms. We publish the prompts, the raw outputs, the screen captures, and the receipts. If we lose, we publish that too.