The Meeting Nobody Could Follow: The format of AI output is a design decision. We made it wrong for three years.

How our engineering team stopped sending 200-line Markdown files that nobody read — and what a nine-word post from an Anthropic engineer taught us about AI output format as a design decision. Includes token cost analysis, real prompt templates, and the HTML render layer approach used in production.

Our team runs fast. Everyone uses AI — for code review, architecture decisions, issue triage, sprint planning. The individual work is solid. The outputs are good.
The problem shows up in the meeting.
Someone opens a PR and shares the AI-generated action plan in the standup. It’s a .md file, 200+ lines, logically structured, accurate. The engineer who ran it knows exactly what's in it.
The two people looking at it for the first time are scrolling, skimming, trying to locate what matters while the conversation moves on.
Our lead eventually just asked: “Can you highlight what’s actually blocking us right now?”
That wasn’t a knowledge gap. Everyone in the room was technical. It wasn’t a preparation gap either — the work had been done well. The gap was between the person who’d been living in that context and everyone else trying to enter it in 90 seconds from a flat text file.
That’s a format problem. And it compounds every time AI-generated work crosses from one person’s context into a shared one.

The Post That Reframed It

On May 8, 2026, Thariq Shihipar, an engineer on the Claude Code team at Anthropic, posted nine words on X: (1)
The post linked to a companion site: 20 self-contained .html files, each one an agent-generated artifact covering a different category of engineering work. No build step. No framework. Just a file you open in a browser.
The line that stopped us was from Thariq’s framing of the Code Review category:
“Diffs and call-graphs are spatial information; markdown flattens them.”
That was the exact problem we’d been circling for weeks without naming it. Our action plans weren’t hard to read because they were long.
They were hard to read because the information inside them was spatial — priority relationships, status changes, size deltas, dependency chains — and we were delivering it as a flat sequence of text.
Simon Willison, whose writing on developer tooling is widely followed, read the piece the same day and wrote that it caused him to reconsider his three-year default of asking for everything in Markdown. (3)
His note: Markdown won because of constraints — the 8,192-token GPT-4 era, where every character counted. Those constraints are largely gone. The reasoning that locked in the default hasn’t been re-examined.
That evening, we ran our own action plan through Claude with one change:
“Output this as a standalone HTML file with priority-coded sections, status badges, and a visual summary header.”
Thirty minutes of iteration. The document we’d been failing to share effectively for weeks was suddenly something a new set of eyes could navigate in under a minute.

Why This Works: Source for Machines, Interface for People

Before going further, the actual rule is worth stating clearly, because the headline “HTML is the new markdown” gets misread.
This isn’t about replacing Markdown everywhere.
README files, commit notes, audit logs, agent-to-agent context passing — Markdown stays. It’s compact, diffable, searchable, and parseable without a browser. Those are real advantages that don’t disappear.
The shift is narrower: when AI-generated work reaches a human who needs to review, navigate, and act on it, Markdown hands the translation cost to the reader. HTML absorbs it into the document.
Karpathy replied to Thariq’s post with a practical note: (4)
“This works really well btw — at the end of your query ask your LLM to ‘structure your response as HTML’, then view the generated file in your browser.”
He added the underlying reason: roughly a third of the human brain is dedicated to visual processing — the 10-lane superhighway of information into the brain. Audio is the preferred input to AI. Vision is the preferred output from it.
That is why the format question matters beyond aesthetics. If AI output is increasingly something people have to review, navigate, and act on quickly, then the container is no longer a neutral wrapper.
A February 2026 Harvard Business Review study tracked 200 employees at a U.S. tech company over eight months and found that AI adoption intensified work rather than reducing it — 83% reported it increased their workload. (5)
The study doesn’t prove a format problem by itself. What it describes is the broader environment in which format problems become expensive: workers moving faster, taking on broader scope, with less time per handoff.
In that environment, a document format that requires translation before action isn’t a minor inconvenience. It’s a recurring tax.

The 20 Examples: A Map Worth Keeping

So what does a better container look like in practice?
Thariq’s companion site at thariqs.github.io/html-effectiveness is worth opening in a separate tab. It’s not a gallery; it’s a structured argument across 9 categories, each one showing where Markdown flattens information that HTML can spatialize. (2)
Here’s the full map for reference. Skim it — it’s a map, not a reading list. The three categories that actually changed our workflow are below the table.
[Table image: the 20 examples grouped into their 9 categories]
Three categories directly changed how our team works.
  • Code Review (02). The annotated PR demo made the problem undeniable. Our action plans were spatial data — P0 vs P1 vs P2, open vs fixed, size deltas — delivered as a flat sequence. Moving to HTML turned priority into visual hierarchy, status into badges, and growth from 163 to 212 lines into an amber callout. The reviewer’s eye went to the right place without being guided there by more text.
  • Decks (06). A handful of <section> tags and a little JavaScript becomes a slide deck you can arrow-key through in a meeting. No Keynote. No export. The charts can stay live, the regional breakdown can be filterable, and the presenter stops defending the format and starts defending the idea.
  • Custom Editors (09). This is the category most people miss. It isn’t about pretty reports. It’s about asking for a throwaway interface for one specific decision — triaging tickets, toggling feature flags, tuning a prompt template — then exporting the result as structured Markdown for the next agent call. The loop gets tighter.
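To make the Custom Editors loop concrete, here is a minimal Python sketch of the export step: decisions made in a throwaway interface are serialized back into structured Markdown for the next agent call. The `TriageDecision` shape, the field names, and the output layout are assumptions made for illustration, not anything taken from the companion site.

```python
from dataclasses import dataclass


@dataclass
class TriageDecision:
    """One decision captured in a throwaway triage interface (hypothetical shape)."""
    ticket_id: str
    title: str
    priority: str   # e.g. "P0", "P1", "P2"
    status: str     # e.g. "OPEN", "FIXED", "PARTIAL"
    note: str = ""


def export_markdown(decisions: list[TriageDecision]) -> str:
    """Serialize decisions as a Markdown list grouped by priority."""
    lines = ["# Triage result", ""]
    # Sorting the priority labels puts P0 before P1, and so on.
    for priority in sorted({d.priority for d in decisions}):
        lines.append(f"## {priority}")
        for d in decisions:
            if d.priority != priority:
                continue
            entry = f"- [{d.status}] {d.ticket_id}: {d.title}"
            if d.note:
                entry += f" ({d.note})"
            lines.append(entry)
        lines.append("")
    # Exactly one trailing newline keeps diffs of the exported file clean.
    return "\n".join(lines).rstrip() + "\n"
```

The point of the design is that the interface is disposable while the Markdown is durable: the HTML exists for one decision, and only the compact export re-enters the pipeline.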

How We Applied It in Practice

The next step was turning the rule into a workflow.
We didn’t replace Markdown. We added an HTML render layer on top of it.
Every agent run now produces two files. The raw .md stays for version control, diffs, and agent context — passing HTML between agents adds token cost without value. The HTML is what the team opens.
The status bar at the top shows Open / Fixed / Partial counts at a glance. P0 items have a red left border and a pulse. Fixed items are muted. A function that grew from 163 to 212 lines shows the delta in an amber callout — visible without reading the note. The filter bar lets anyone drill to just the P0 items in one click.
Below is the representative example we use. Content has been generalized — no internal identifiers — but the structure, priority system, and status logic are exactly what’s in production.
Same information. Different container. One standup dropped from 25 minutes to 12. The questions changed from “wait, which ones are actually blocking?” to “who’s taking P0–1a?”
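For readers who want to see the two-file idea in code, here is a rough Python sketch. It is not the production template described in this section: the `Item` shape, the CSS, and the class names are illustrative assumptions, and the pulse animation, delta callouts, and filter bar are left out to keep it short.

```python
from collections import Counter
from dataclasses import dataclass
from html import escape


@dataclass
class Item:
    """One action-plan entry (hypothetical shape, not the production schema)."""
    key: str        # e.g. "P0-1a"
    priority: str   # "P0", "P1", or "P2"
    status: str     # "OPEN", "FIXED", or "PARTIAL"
    summary: str


# Inline CSS only, so the page stays a single self-contained file.
CSS = """
body { font-family: system-ui, sans-serif; max-width: 48rem; margin: 2rem auto; }
.item { border-left: 4px solid #bbb; padding: .5rem 1rem; margin: .5rem 0; }
.item.P0 { border-left-color: #c62828; }
.item.FIXED { opacity: .55; }
.badge { font-size: .75rem; font-weight: 700; margin-right: .5rem; }
"""


def render_html(title: str, items: list[Item]) -> str:
    """Render a standalone page: status counts first, then one block per item."""
    counts = Counter(item.status for item in items)
    header = " / ".join(
        f"{label.title()}: {counts[label]}" for label in ("OPEN", "FIXED", "PARTIAL")
    )
    blocks = []
    # Highest priority first, so the reader's eye lands on P0 items.
    for item in sorted(items, key=lambda i: i.priority):
        blocks.append(
            f'<div class="item {escape(item.priority)} {escape(item.status)}">'
            f'<span class="badge">{escape(item.priority)}</span>'
            f'<span class="badge">{escape(item.status)}</span>'
            f"{escape(item.key)}: {escape(item.summary)}</div>"
        )
    return (
        "<!doctype html>\n"
        f'<html lang="en"><head><meta charset="utf-8"><title>{escape(title)}</title>'
        f"<style>{CSS}</style></head>\n"
        f"<body><h1>{escape(title)}</h1><p>{header}</p>\n"
        + "\n".join(blocks)
        + "\n</body></html>\n"
    )


def render_markdown(title: str, items: list[Item]) -> str:
    """Render the compact source that stays in version control."""
    lines = [f"# {title}", ""]
    lines += [f"- [{i.status}] {i.priority} {i.key}: {i.summary}" for i in items]
    return "\n".join(lines) + "\n"
```

Writing both outputs is then two calls, for example `Path("plan.md").write_text(render_markdown(...))` and `Path("plan.html").write_text(render_html(...))` with `pathlib.Path`. Badges carry text labels as well as color, which matters for the accessibility point raised later.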

Beyond the PR: Every Document That Has to Survive a Room

The dev PR use case is where we started. It’s not where this ends.
Think about the last time you sat through a presentation built in PowerPoint. Someone had spent hours on slides, exported a PDF, shared it over email. Half the room opened it on their phones and couldn’t read the charts. The presenter spent the first three minutes explaining the color coding. Someone asked about a specific number and the answer was “I’ll have to check the spreadsheet.”
That’s the same problem. Spatial information delivered as a fixed object, forcing everyone in the room to translate before they can respond.
A quarterly review as a single HTML file looks different. The slides are still there — arrow-key navigation, no build step. But the charts are live. Each person hovers over the number they care about.
The regional breakdown is filterable. A response field at the bottom lets stakeholders flag a concern or submit a priority vote before leaving the page. That input comes back as structured data the next agent call can act on. The meeting doesn’t end with action items in someone’s notebook. It ends with a file.
This is what “sending a document” becomes when the format is HTML. A PDF is a fixed object. An HTML file is an environment. The person receiving it navigates it on their own terms — without matching your reading pace, your zoom level, or your familiarity with the data. And if you’ve built in the export button, what they do inside it comes back to you.
The pattern holds wherever AI-generated work has to cross a context boundary. A business proposal needing sign-off from people who weren’t in the original conversation. A research summary that has to land with someone from a different discipline. A vendor comparison that three stakeholders need to filter differently. In each case, the question is the same: does the format help the next person enter the work, or does it make them translate it first?

The Tradeoff, Honestly

Every format shift moves cost somewhere. HTML is no exception.
HTML costs more to generate. For a document the size of our action plan, the HTML version runs approximately 4,200 tokens. The equivalent Markdown is around 1,150. That's a multiplier of roughly 3.65× on output tokens.
At Claude Sonnet pricing as of May 2026, the delta of about 3,050 tokens is roughly $0.009 per document. At 100 documents per day, that's about $0.90 extra per day: real, but not significant for most teams. The more useful frame: one missed P0 finding costs more than a month of that overhead. The token cost is the price of a format that gets reviewed properly instead of skimmed on the hope that nothing was missed.
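The arithmetic is easy to rerun with your own numbers. The sketch below uses the token counts quoted above; the rate of 3.0 USD per million tokens is an assumption, chosen only because it reproduces the per-document figure in the text. Check your provider's current output-token price before relying on it.

```python
def html_overhead(
    html_tokens: int,
    markdown_tokens: int,
    usd_per_million_output_tokens: float,
    documents_per_day: int,
) -> dict[str, float]:
    """Return the output-token multiplier and the extra cost of the HTML version."""
    extra_tokens = html_tokens - markdown_tokens
    per_document = extra_tokens * usd_per_million_output_tokens / 1_000_000
    return {
        "multiplier": html_tokens / markdown_tokens,
        "extra_usd_per_document": per_document,
        "extra_usd_per_day": per_document * documents_per_day,
    }


# Token counts come from the text; the 3.0 USD rate is an assumed placeholder.
result = html_overhead(4_200, 1_150, 3.0, 100)
```

With these inputs the extra cost works out to about $0.009 per document and about $0.92 per day. A higher per-token rate scales both figures linearly, so the conclusion is sensitive to the price you plug in.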
But token cost is only the first tradeoff.
Three rules keep the format honest.
  • Rule 1: HTML is for human review, not machine handoff. HTML is not a good input for agent-to-agent pipelines. Keep Markdown for source, diffs, archives, and machine context. HTML earns its cost only at the human review surface.
  • Rule 2: visual authority is not factual authority. A red P0 badge doesn’t mean the agent got the priority right. A well-structured HTML artifact can make a wrong AI output look more credible than a flat Markdown file would. Faster comprehension helps, but it doesn’t replace judgment. The render layer should make review faster, not make the model more trusted.
  • Rule 3: interactivity needs a safety contract. A useful HTML artifact should be self-contained, with no remote scripts, so it works offline and doesn’t phone home. It should use text labels alongside color and hover states, so it still works for color-blind readers and keyboard-only users. And if it re-enters an agent loop, it should run sandboxed, without external network access.
None of these are hard to ask for in the prompt. They’re worth adding to your template.
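Asking for a self-contained file in the prompt is one half; checking that you got one is the other. Here is a minimal sketch of such a check using Python's standard `html.parser`. The tag list and the function names are assumptions for illustration, and the check is deliberately narrow.

```python
from html.parser import HTMLParser

REMOTE_PREFIXES = ("http://", "https://", "//")
# Attributes that make the browser fetch a resource when the page loads.
# Anchor hrefs are deliberately not checked: a plain link loads nothing by itself.
FETCHING_ATTRS = {
    "script": "src",
    "link": "href",
    "img": "src",
    "iframe": "src",
}


class RemoteResourceFinder(HTMLParser):
    """Collect tags that would load something from the network."""

    def __init__(self) -> None:
        super().__init__()
        self.remote: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        wanted = FETCHING_ATTRS.get(tag)
        for name, value in attrs:
            if (
                name == wanted
                and value
                and value.strip().lower().startswith(REMOTE_PREFIXES)
            ):
                self.remote.append((tag, value))


def find_remote_resources(html_text: str) -> list[tuple[str, str]]:
    """Return (tag, url) pairs for every remote resource the page references."""
    finder = RemoteResourceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.remote
```

An empty result is a necessary condition, not a guarantee: this sketch does not inspect inline JavaScript for `fetch` calls or CSS for `url(...)` references, so sandboxing still matters when the artifact re-enters an agent loop.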

The Prompt Change

You don’t need new tooling. One line at the end of your existing prompt.
Instead of: “Generate a PR action plan covering all open issues by priority.”
Try: “Generate a PR action plan covering all open issues by priority. Output as a standalone HTML file: color-coded priority badges (P0=red, P1=amber, P2=yellow), status indicators (OPEN/FIXED/PARTIAL), delta callouts for size changes, and a summary header with open and fixed counts.”
For code review, Thariq’s own prompt is worth borrowing directly:
“Help me review this PR by creating an HTML artifact that describes it. Render the actual diff with inline margin annotations, color-code findings by severity and whatever else might be needed to convey the concept well.”
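If your prompts live in code rather than in a chat window, the same change is a small helper. This is a sketch under assumptions: the function and constant names are hypothetical, the format suffix reuses the action-plan wording from above, and the safety suffix is our paraphrase of Rule 3 rather than a tested prompt.

```python
HTML_OUTPUT_SUFFIX = (
    "Output as a standalone HTML file: color-coded priority badges "
    "(P0=red, P1=amber, P2=yellow), status indicators (OPEN/FIXED/PARTIAL), "
    "delta callouts for size changes, and a summary header with open and fixed counts."
)

# Paraphrase of the safety contract; adjust the wording to your own template.
SAFETY_SUFFIX = (
    "Keep the file self-contained with no remote scripts, "
    "and pair every color or hover state with a text label."
)


def with_html_output(prompt: str, include_safety: bool = True) -> str:
    """Append the HTML format request (and optionally the safety contract)."""
    parts = [prompt.strip(), HTML_OUTPUT_SUFFIX]
    if include_safety:
        parts.append(SAFETY_SUFFIX)
    return " ".join(parts)
```

Keeping the suffix in one place means the Markdown prompt stays the default for agent-to-agent calls, and the HTML request is added only where a person will read the result.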
The harder shift isn’t the prompt. It’s recognizing that the format of AI output is a design decision. We used Markdown for three years because that’s what the GPT-4 era trained us to expect. The models have moved. The context windows have moved. The use cases have moved.
The next bottleneck isn’t what the agent can generate. It’s whether the next person can enter it fast enough to act.

References

  1. Thariq Shihipar, “The Unreasonable Effectiveness of HTML”, original X post, May 8, 2026
  2. Companion examples site (thariqs.github.io/html-effectiveness), 20 self-contained HTML artifacts across 9 categories
  3. Simon Willison, link post and notes, May 8, 2026
  4. Andrej Karpathy, reply to Thariq’s post (visual processing argument)
  5. Ranganathan & Ye, “AI Doesn’t Reduce Work, It Intensifies It”, HBR, February 2026
 
