Flamehaven.space
Now Is the “Early, Messy Renaissance” of AI — Reading 2025 from a Florentine Workshop

Is AI a bubble or a renaissance? A practical playbook for founders and AI teams to move beyond hype with metrics, reproducibility, and real results.

🧠 tl;dr: Don’t just ask if AI is a bubble; quietly choose the renaissance. Let your skills compound faster than the hype.

Dawn in the bottega, dawn at the hackathon — Showroom vs. Factory

Florence, c. 1420. Before sunrise, the city smelled of lime, oil, and rain. Wet stone exhaled; shutters leaked a blade of light into a bottega where apprentices ground pigment until muscle turned memory. Copying, borrowing — even forgery — wasn’t scandal. It was practice, like scales on a violin: repetition that makes the hand honest.
Cut to 2025. Monitor blue replaces candle flame; coffee and warm plastic replace chalk and glue. At one table a prompt is massaged for “a cleaner sample.” Across the room, similar apps blink behind different logos. Skins change; back-ends rhyme. Outwardly everything hums “AI innovation,” yet underneath the split is the same: most teams run a showroom; a few run a factory.
Most polish glass; few pour concrete.
I’ve been guilty too: once I shipped a beautiful demo that went viral for a weekend — and evaporated by Tuesday. The retention plot looked like a cliff. Logs, not likes, told the truth.
Applause is a moment; reproduction is a system. If your best slide and your best log disagree, the log is right.

The Medici ledger vs. the hyperscaler meter — Patronage, Heat, and Pace

Florence’s pulse lived in a book — the Medici ledger: “Keep this workshop alive.” “Let the dome rise.” Patronage warmed the kiln — and sometimes overheated prices through vanity and rivalry. Money didn’t make the art, but it set the room temperature.
Today the pulse hums behind glass and steel. Racks stack; LEDs blink in orderly trains; the air smells of ozone and metal. A quarterly line reads: “Power usage equivalent to a small city.” That sentence is capital breathing fast. Announcements add heat; heat adds attention; attention adds… deadlines you didn’t choose.
🔺Truth: Patronage grows the fire.
🔺Risk: If money’s speed outruns craft’s ripening, the era’s name changes — from renaissance to bubble. It feels productive to sprint to announcements, but the bill arrives in reliability, debt, and trust.
Align your roadmap to capital’s heartbeat and you’ll sprint on paper while crawling in delivery. Align it to capability growth, and the pace looks slow until — suddenly — it doesn’t.

Two pens — Headlines vs. Workshop Logs

History writes in stereo.
One pen is thick: the headline (risk, froth, correction). It’s loud by design.
The other is thin: the workshop log (response time down, resolution up, newcomers hitting mid-tier output, tasks closing faster than last month). It’s quiet by design.
A bubble moves at the speed of belief; a renaissance moves at the speed of results.
Start now with one visible line of proof — a north-star metric your team can repeat weekly. Not ten metrics; one that actually changes behavior.
Examples you can steal (pick one):
  • Support/Ops: “First-reply resolution ↑ week-over-week.”
  • Engineering: “p95 latency ≤ 1.2× baseline across 3 reruns.”
  • Team Effectiveness: “Tier-2 engineers clearing X% of tickets with AI assist.”
  • Product: “New-flow retention Day1 → Day7 ↑ by X% after onboarding tweak.”
  • Marketing/Content: “Conversion rate ↑ for AI-assisted copy vs human-only baseline on the same audience (A/B, equal spend).”
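To make a line like the Support/Ops example repeatable, it helps to compute it the same way every week. What follows is a minimal sketch, not a prescribed implementation: the `Ticket` record, its `resolved_on_first_reply` field, and the sample data are all hypothetical and stand in for whatever your ticketing system exports.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Ticket:
    # Hypothetical record; the field names are assumptions for illustration.
    opened: date
    resolved_on_first_reply: bool


def first_reply_resolution_by_week(tickets):
    """Return {(iso_year, iso_week): share of tickets resolved on first reply}."""
    totals = {}
    for t in tickets:
        iso = t.opened.isocalendar()
        key = (iso[0], iso[1])
        resolved, count = totals.get(key, (0, 0))
        totals[key] = (resolved + int(t.resolved_on_first_reply), count + 1)
    return {week: resolved / count for week, (resolved, count) in sorted(totals.items())}


def week_over_week(rates):
    """Return the change in rate between each pair of consecutive recorded weeks."""
    weeks = sorted(rates)
    return {cur: rates[cur] - rates[prev] for prev, cur in zip(weeks, weeks[1:])}


# Made-up sample: two tickets in ISO week 2 of 2025, four in ISO week 3.
sample = [
    Ticket(date(2025, 1, 6), True),
    Ticket(date(2025, 1, 7), False),
    Ticket(date(2025, 1, 13), True),
    Ticket(date(2025, 1, 14), True),
    Ticket(date(2025, 1, 15), True),
    Ticket(date(2025, 1, 16), False),
]
rates = first_reply_resolution_by_week(sample)   # week 2: 1/2, week 3: 3/4
deltas = week_over_week(rates)                   # week 3 vs week 2: +0.25
print(rates, deltas)
```

The point is less the arithmetic than the habit: one function, one definition of the metric, rerun unchanged each week.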

The metal smell of the press, the click of the model API — Replication at (Almost) Zero

Before Gutenberg, knowledge moved at the pace of monks — slow ink, slow reach. The press changed the slope. Europe flooded with pamphlets and nonsense, yes — but also astronomy, anatomy, and argument.
Flood + filter = progress.
Today our press is a model API. Text, images, code — printed on call at near-zero cost. When replication becomes the default, twins arrive: flood and flatness. Templates multiply; moats thin; screenshots start to look the same.
Depth becomes the new scarcity. Instant noodles are fine; a slow-simmered broth is different. Depth isn’t mystique; it’s ingredients and process:
  • Owned data (collected, cleaned, and permissioned)
  • Live feedback loops (from real usage, not lab daydreams)
  • A clear evaluation frame (so “better” means something stable)
When the flood rises, the teams that win filter, reinforce, and thicken.
Replication is the tide; differentiation is your keel.

Brunelleschi’s mirror, today’s benchmark — From “Looks Right” to “Follows the Rule”

On the piazza, Brunelleschi holds a panel and a mirror. The geometry clicks — linear perspective steps out of hiding. Praise shifts from “it looks right” to “it follows the rule.” Awe becomes method; wonder becomes workflow.
Our mirror is less romantic but just as decisive:
  • Benchmarks with targets that matter to users
  • Reproducible experiments (seeds change; results stay within bounds)
  • Transparent data quality (how it’s labeled, checked, and versioned)
  • Learning curves that show slope, not just a lucky point
The day you move from “wow, it works” to “we can make it work again”, the view snaps into focus. Showrooms fade; factories stand.
Perspective split the bottega’s skill tree; benchmarks sort the family tree of AI teams.
One caution: avoid “benchmark theater.” A rule clarifies reality; it shouldn’t replace it. Keep one foot in logs, one foot in lives.
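Two of the mirror's checks, "results stay within bounds" and "slope, not just a lucky point", can be written down in a few lines. The sketch below is illustrative only: the tolerance, the scores, and the function names are assumptions, and the slope is a plain ordinary least-squares fit rather than any particular team's method.

```python
def within_bounds(scores, tolerance):
    """True if the best and worst seed differ by no more than `tolerance`."""
    return max(scores) - min(scores) <= tolerance


def curve_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (needs at least two distinct xs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Made-up numbers: accuracy from three seeds at the same checkpoint.
seed_scores = [0.81, 0.83, 0.82]
stable = within_bounds(seed_scores, tolerance=0.03)

# Made-up learning curve: mean accuracy after 1, 2, 3, 4 thousand training examples.
examples_k = [1, 2, 3, 4]
accuracy = [0.70, 0.75, 0.80, 0.85]
slope = curve_slope(examples_k, accuracy)  # accuracy gained per thousand examples
print(stable, round(slope, 3))
```

A single high score says little; a spread that stays inside the tolerance and a slope that stays positive are the parts worth reporting.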

A map for action — Seven Moves (with concrete examples)

1) Fix one line of truth.
Speak in units, not adjectives. One sentence, one number.
Sample: “Agent first-reply resolution 41% → 53% (4 weeks).”
🔺How to start in 30 min: choose the metric, create a shared doc, write last week’s value.
2) Make experiments reproducible by default.
Log runs, snapshot data, publish curves.
Sample: “p95 latency ≤ 1.2× baseline across 3 reruns with different seeds.”
🔺30-min start: add a “rerun” script and a results table others can read.
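A "rerun" script for the sample above can be very small. The following sketch assumes latencies have already been collected as lists of milliseconds, one list per rerun; the function names, the nearest-rank percentile definition, and the numbers are assumptions for illustration, not a standard harness.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, done without floating point.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def passes_latency_gate(baseline_p95, reruns, max_ratio=1.2):
    """True only if every rerun's p95 stays within max_ratio times the baseline."""
    return all(p95(run) <= max_ratio * baseline_p95 for run in reruns)


# Made-up latencies in milliseconds: 20 samples per rerun, one rerun per seed.
baseline_p95_ms = 100.0
reruns = [
    [80] * 18 + [110, 150],
    [85] * 18 + [115, 140],
    [90] * 18 + [118, 160],
]
ok = passes_latency_gate(baseline_p95_ms, reruns)
print([p95(run) for run in reruns], ok)
```

Requiring every rerun to pass, rather than the average, is a deliberate choice here: one lucky run should not be able to hide two slow ones.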
3) Clean the data line.
Quality, bias checks, validity — weekly.
Sample: “PII leakage 0 on random 500 audit; domain drift <5%.”
🔺30-min start: schedule a recurring audit and a “fix or explain” rule.
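The recurring audit can start as a seeded random sample and a crude check. This is a hedged sketch: the email-like regular expression is a deliberately simple stand-in for a real PII detector, the records are invented, and the drift part of the sample is not covered because the text does not define how drift is measured.

```python
import random
import re

# Crude stand-in for a real PII detector: matches email-like strings only.
EMAIL_LIKE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def audit_sample(records, sample_size=500, seed=0):
    """Draw a reproducible random sample and return the records that look like they leak an email."""
    rng = random.Random(seed)  # fixed seed so the same audit can be rerun
    k = min(sample_size, len(records))
    sample = rng.sample(records, k)
    return [r for r in sample if EMAIL_LIKE.search(r)]


# Made-up corpus: 1,000 clean rows, then a small set with one leaking row.
clean = [f"order {i} shipped" for i in range(1000)]
leaks_clean = audit_sample(clean)
leaks_dirty = audit_sample(clean[:10] + ["contact jane@example.com"])
print(len(leaks_clean), leaks_dirty)
```

Returning the offending records, not just a count, supports the "fix or explain" rule: each hit is either repaired or written up.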
4) Rebalance the ‘7–9 : 1–3’ (wrapper : engine).
If ~90% of effort sits in wrapping, rescue 10–30% for the engine: data, training, evaluation.
Showrooms borrow light; factories build heat.
🔺30-min start: freeze one UI task; move that slot to eval or data repair.
5) Don’t pace your team by headlines.
Use big-spend news as motivation, not a speedometer.
Speed can be rented by money; direction must be earned by thinking.
🔺30-min start: replace “market update” in stand-up with “metric update.”
6) Differentiate on top of replication.
APIs are sand; evaluation turns them into concrete.
🔺Scenario A (factory win): internal search adds eval-driven reranking → time-to-answer −18%, ticket deflection +12%; retention follows.
🔺Scenario B (showroom trap): glossy chatbot peaks week 1, then DAU −62% by week 3; no data pipeline, no eval harness, no loop.
7) Publish the lab notebook.
Honest logs are a currency of trust; fake perfection accrues debugging debt.
Sample: “Missed KPI due to annotation drift; added dual-review and retrained. KPI recovered in 2 sprints.”
🔺30-min start: open a changelog; add one “what failed / what changed” line per week.
One ask for this week: pick one metric and repeat the same test next week. Consistency beats charisma.

Same structure, different smell (and one clear ask)

Florence smelled of stone and rain; our cities smell of silicon and static. Brushes became keyboards; chisels became GPUs. The arrangement repeats: a few chase the hidden law; the many live from it. Patrons inject heat; replication stretches reach; then someone names the rule — in prose, in geometry, in code — and the era gets its title.
Copy makes foam. Evaluation and law make a renaissance.
Do this now: write down one number, one case, one moment that proves improvement. Share it with your team. Then run the same proof next week. That’s how your team joins the renaissance side — on paper, not just in pitch decks.
🔺Monday-morning starter:
  • Add the metric to your stand-up.
  • Book a 30-minute “re-run the experiment” slot.
  • Post the before/after plot.
Three steps. One week. A habit that survives headlines.
 
