Soupy Together
Same prompt. Every tool. Cost in dollars, time in seconds, output in receipts. We're not trying to win — we're showing what each tool is best at, then routing your job to it.
Numbers below are plausible estimates from public pricing and reported behavior, shown to demonstrate the methodology. The first verified Build-Off — same prompt run live through every tool, with screen captures and token receipts — drops shortly. Get notified at the bottom of the page.
Tier I visual proficiency challenge. One HTML file, Three.js from CDN, no build step. Tests whether each tool can produce something genuinely impressive under tight, portable constraints — the lowest common denominator that every participant can hit.
Produce a single index.html with no build step. Load Three.js r165 from the jsDelivr CDN. Render a 3D sphere with a realistic surface: use MeshStandardMaterial with a roughness map procedurally generated from canvas noise — no external texture files. Add a soft atmospheric rim-glow using an additive backface sphere. Rotate the planet on its axis continuously. Support click-and-drag orbit. Background: deep space with at least 300 randomly placed stars rendered as Points. The result must look impressive at 600×400. No frameworks, no bundlers, no npm.
Same prompt. Same constraints. Submissions appear as tools complete — rankings unlock once everyone is in.
One index.html · CDN deps only · no build step
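As a rough illustration of how the "CDN deps only" constraint could be checked mechanically, here is a minimal sketch. It is not the Build-Off harness; the names `ExternalRefCollector`, `constraint_violations`, and `ALLOWED_HOST` are hypothetical, and the sketch assumes jsDelivr (`cdn.jsdelivr.net`) is the only permitted host, as the prompt suggests.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: jsDelivr is the only CDN host a submission may load from.
ALLOWED_HOST = "cdn.jsdelivr.net"


class ExternalRefCollector(HTMLParser):
    """Collect every URL a page pulls in through <script src> or <link href>."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "script" and attr_map.get("src"):
            self.refs.append(attr_map["src"])
        elif tag == "link" and attr_map.get("href"):
            self.refs.append(attr_map["href"])


def constraint_violations(html_text):
    """Return references that are not URLs on the allowed CDN host.

    Relative paths such as ./bundle.js count as violations, because they
    imply a second file next to index.html.
    """
    collector = ExternalRefCollector()
    collector.feed(html_text)
    return [ref for ref in collector.refs if urlparse(ref).hostname != ALLOWED_HOST]
```

An empty list means no violation was found by this check. The sketch only inspects `src` and `href` attributes; it does not look inside inline scripts, so an `import` statement or import map pointing elsewhere would need a separate check.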
Rankings unlock when the round launches. Operators can launch early if a tool won't submit — missing tools appear as "did not submit."
Each measure is normalized to 0–100 within this round, then weighted. See methodology below.
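One way to picture "normalized within this round, then weighted" is min-max scaling followed by a weighted mean. The sketch below rests on assumptions: the page does not state the normalization formula or the weights, so `normalize`, `composite`, and any weights passed in are hypothetical and are not expected to reproduce the Composite column.

```python
def normalize(values, higher_is_better=True):
    """Min-max scale one measure's raw values to 0-100 within a round."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: if every tool ties on a measure, all get full marks.
        return [100.0 for _ in values]
    scaled = [100.0 * (v - lo) / (hi - lo) for v in values]
    # For measures like cost or seconds, the smallest raw value should score 100.
    return scaled if higher_is_better else [100.0 - s for s in scaled]


def composite(normalized_by_measure, weights):
    """Weighted mean of normalized scores; returns one value per tool."""
    total = sum(weights.values())
    n_tools = len(next(iter(normalized_by_measure.values())))
    return [
        sum(weights[m] * normalized_by_measure[m][i] for m in weights) / total
        for i in range(n_tools)
    ]
```

For example, `normalize([8, 38, 22, 54, 14, 19, 48], higher_is_better=False)` would place the 8-second run at 100 and the 54-second run at 0, with the rest spread linearly between them.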
Soupy Together is the daily driver, not the top scorer. v0 will out-design us on visuals. Cursor and Claude Code will out-correct us on hairy refactors. That's fine — when your job needs one of those, we route to it and you pay for that call. The rest of the time, the cheapest tool that can finish the job finishes the job. Look at the Cost column, then look at Composite.
| # | Tool | Composite | Cost | TTFP | Correct | Honest |
|---|---|---|---|---|---|---|
| 01 | Soupy Together | 68 | $0.00 | 8s | 100 | 60 |
| 02 | Claude Code | 56 | $0.38 | 38s | 93 | 86 |
| 03 | Lovable Pro | 49 | $0.28 | 22s | 84 | 70 |
| 04 | Cursor | 46 | $0.45 | 54s | 92 | 82 |
| 05 | v0 by Vercel | 42 | $0.22 | 14s | 76 | 66 |
| 06 | Bolt | 34 | $0.32 | 19s | 78 | 62 |
| 07 | Replit Agent | 19 | $0.55 | 48s | 80 | 65 |
TTFP = time-to-first-preview (wall-clock, prompt → reachable preview). TTFT = time-to-first-token from the model API, shown when the tool exposes it; not weighted in composite. Time→Green = wall-clock until preview reachable AND tests green. TTFT and Time→Green are populated only by harness-driven runs.
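To make the TTFP definition concrete, here is a minimal sketch of a wall-clock measurement: start the timer when the prompt is submitted, then poll until the preview is reachable. The function name, the `is_reachable` callback, and the timeout and polling defaults are all hypothetical; the actual harness is not described on this page.

```python
import time


def time_to_first_preview(is_reachable, timeout_s=120.0, poll_interval_s=0.5,
                          clock=time.monotonic, sleep=time.sleep):
    """Return seconds from this call until is_reachable() first returns True.

    Call this immediately after submitting the prompt. Returns None if the
    preview never becomes reachable within timeout_s.
    """
    start = clock()
    while True:
        if is_reachable():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_interval_s)
```

`is_reachable` could be any zero-argument function, for instance one that requests the preview URL and returns True on a successful response. The result is only as precise as the polling interval, and a monotonic clock is used so that system clock adjustments do not skew the reading.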
Three views of the same data: composite ranking, every normalized score in one matrix, and the cost-vs-correctness frontier.
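The cost-vs-correctness frontier can be read as a Pareto frontier: the tools that no other tool beats on both cost and correctness at once. The sketch below is one possible way to compute it, using the Cost and Correct columns from the table above (which are estimates, not verified runs); `pareto_frontier` and `ROUND` are hypothetical names.

```python
def pareto_frontier(tools):
    """Return names of tools not dominated on (lower cost, higher correctness).

    A tool is dominated when another tool is at least as good on both axes
    and strictly better on at least one.
    """
    frontier = []
    for name, cost, correct in tools:
        dominated = any(
            c2 <= cost and k2 >= correct and (c2 < cost or k2 > correct)
            for _, c2, k2 in tools
        )
        if not dominated:
            frontier.append(name)
    return frontier


# (tool, cost in dollars, normalized correctness) copied from the table.
ROUND = [
    ("Soupy Together", 0.00, 100),
    ("Claude Code", 0.38, 93),
    ("Lovable Pro", 0.28, 84),
    ("Cursor", 0.45, 92),
    ("v0 by Vercel", 0.22, 76),
    ("Bolt", 0.32, 78),
    ("Replit Agent", 0.55, 80),
]

print(pareto_frontier(ROUND))      # ['Soupy Together']
print(pareto_frontier(ROUND[1:]))  # ['Claude Code', 'Lovable Pro', 'v0 by Vercel']
```

With these estimated figures, the $0.00 entry with a correctness of 100 dominates every other row, so the frontier collapses to a single point. Leaving that row out shows the trade-off among the paid tools.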
Could have a little more depth.
Error-free, structured geometry code. Visuals minimal — no rim glow, basic star field.
Strong visual output. Attempted to scaffold React — needed a prompt nudge to stay single-file.
Cleanest Three.js code. Visually plain — prioritized correctness over impressiveness.
Best-looking sphere. Surface texture richest of the set. Drag orbit slightly jumpy.
Fast. Surface detail thinner, glow present.
Ran end-to-end. Stars sparse, atmosphere thin.
Dollars to ship the same feature, end to end.
Seconds from prompt submitted to a running preview.
Design quality of the generated UI vs. the spec brief.
Runtime errors, type safety, test outcomes.
Preservation of correctness across follow-up changes.
Rate of confabulation on ambiguous input. Higher = more honest.
Bytes shipped per equivalent output, gzipped.
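For the last measure, the gzipped size of a shipped file can be sketched with the Python standard library. This assumes the default compression level; how sizes are made comparable "per equivalent output" is not specified here, so the sketch covers only the raw measurement, and `gzipped_size` is a hypothetical helper name.

```python
import gzip


def gzipped_size(data: bytes) -> int:
    """Number of bytes after gzip compression at the default level."""
    return len(gzip.compress(data))
```

For a submission, this could be called as `gzipped_size(open("index.html", "rb").read())`. A real CDN or web server may use a different compression level, so numbers measured this way are indicative rather than exact.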
No tool can pay for placement. Soupy Together is itself in every Build-Off and competes on the same terms. We publish the prompts, the raw outputs, the screen captures, and the receipts. If we lose, we publish that too.