Prompt, Pray & Push: Why Your AI Agent Keeps Failing You

The one concept that turns expensive spaghetti into great agentic engineering.


On the term: In physics, resonance occurs when two systems synchronize frequencies and energy transfer is maximized. That’s the metaphor — and the mechanism. “Contextual Resonance” doesn’t exist in the academic literature yet. 🔺“Context Engineering” describes a technique. 🔺“Contextual Intelligence” is data-architecture speak. 🔺“Intent Alignment” lives in AI safety. None capture what I mean: the active, ongoing act of synchronizing your mental model with an agent’s working model — not as a one-time prompt, but as a persistent, calibrated relationship. If a better term emerges, use it. The concept is what matters.

1. The Shift No One Saw Coming

It is genuinely hard to communicate how much programming changed in the second half of 2025 — and specifically around December. Not gradually. The inflection was sharp.
By late 2025, coding agents had crossed a practical threshold.
Claude Code, GitHub Copilot’s agent mode, Cursor — these were no longer autocomplete tools.
They were systems that could hold coherent intent across long tasks, hit obstacles, research their way through them, and keep going.
The shift had been building across multiple rails: model quality, long-context coherence, and the Anthropic Model Context Protocol (Nov 25, 2024), which gave agents a standardized interface for tools and external systems. MCP wasn’t the cause of the jump — just one of the enabling rails that made tool-use standard and repeatable.
Ben Shoemaker captured the practical result in his February 12, 2026 essay “In Defense of Not Reading the Code” — a piece that generated enough debate to still be cited in Hacker News threads three weeks later.
Here is what the new capability looks like in practice.
You type something like:
“Here is the server access (via sandboxed credential flow). Log in, set up SSH keys, install vLLM, download and benchmark a vision model, build a web dashboard for video inference, test everything, configure systemd, write me a report.”
You come back an hour later. The agent has hit three dependency conflicts, researched fixes, resolved them, written the code, debugged it, set up the services, and handed you a report.
That used to be a weekend project.
But here is where the story turns — because most people stop at “wow” and miss the harder lesson.

2. Prompt, Pray & Push

A new religion has quietly taken over software development.
Prompt, Pray & Push.
You type something vague.
Close your eyes.
git push.
Move on.
And when the result is broken — when the code compiles but misses the point, when the PR passes tests but breaks the unstated requirement, when the architecture has quietly rotted — you blame the model.
“The AI just isn’t good enough yet.”
The models are good enough. That is no longer the bottleneck.
We are becoming AI-lazy — not in the sense of resting, but in believing that input → magic → output is a complete workflow. We outsource execution without transferring intent. The result: codebases where AI-slop compounds silently.
Today’s vague prompt becomes tomorrow’s legacy system — one your next agent can’t reason about either. The cost doesn’t disappear. It accumulates.
This is not an AI capability problem. It is a communication problem. And the solution has existed in every functional organization since before computers were invented.

3. You Wouldn’t Do This to a New Hire

Imagine your first week at a new company.
Your manager drops a sticky note on your desk: “Build the dashboard.”
Then walks away.
No context.
No prior conversation.
No understanding of why this matters.
What do you build? Something.
Probably the wrong something.
Now imagine the same scenario — but first your manager sits with you for an hour. They walk you through the strategy — the why. They explain the specific users and constraints — the what. They describe the tech decisions that can’t be undone — the context.
Same person.
Dramatically different output.
Agents behave like high-agency contributors — capable, fast, and completely dependent on you to give them the why before they can execute the how. Without it, they interpolate. And their interpolation is technically coherent and contextually wrong.
This is the gap at the center of every AI-generated spaghetti codebase.

4. Contextual Resonance

Contextual Resonance is the active, ongoing process of making your intent structurally legible to an agent — before, during, and after task execution.
One sentence: It’s what separates an agent that builds what you imagined from one that builds what you described.
🔺Why “Resonance” and not the existing alternatives: see the note on the term at the top of this piece.
🔺The Contextual Resonance Stack — where each layer lives:
The failure zone is at the boundary between Why and What.
Most engineers hand off at the How level — and wonder why the output drifts.
🔺Does Contextual Resonance actually move the numbers?
The research, Scalable MatMul-free Language Modeling, maps to a consistent pattern across codebases:
A 12% cost increase for a 46-point accuracy jump. The ROI is not close.
The word is deliberate. Resonance, not instruction.
You are not telling the AI what to do.
You are engineering a state in which the agent’s working model of your goal synchronizes with your actual goal.
Closely enough that its autonomous decisions trend correct — rather than merely plausible. And this isn’t just a useful mental model. The research backs it up.

5. The Research Agrees — With a Catch

Three recent papers. Three findings that make the case better than any argument I could construct.
  • You can dramatically improve accuracy without changing the model. Same model, same task — the only variable was context. Specifically, providing full repository structure rather than just the open file. The results shifted dramatically. The bottleneck was never the AI. (On the Impacts of Contexts on Repository-Level Code Generation)
  • Your agent doesn’t know what it doesn’t know — and neither do you. An analysis of 2,303 real-world context files found that only 14.5% adequately specified security and performance requirements. Which means roughly 85% of agents running in production today have never been told what “safe” or “fast enough” actually means for their codebase. That code is already in your repo. (An Empirical Study of Context Files for Agentic Coding)
  • But more context isn’t the answer either. This is the part most “just write better prompts” advice skips. Context files that were too long or poorly structured actually hurt — a 19% cost increase for a 4% improvement in success rate. Volume is not signal. Precision is. (Are Repository-Level Context Files Helpful for Coding Agents?)
Context is not a dump. It is a filter.
Throwing your entire repo at an agent isn’t Contextual Resonance. It’s noise with a better name.
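The filter idea can be made concrete. Here is a minimal sketch in Python: rank candidate files by crude keyword overlap with the task, and hand the agent only the top hits. The file paths, scoring rule, and task string are all illustrative; a real setup would use smarter retrieval, but the principle is the same: select, don't dump.

```python
# Naive relevance filter for repository context (illustrative only).
# Paths and the task description below are hypothetical examples.

def score_relevance(path: str, task_keywords: set[str]) -> int:
    """Count how many task keywords appear in a file path."""
    lowered = path.lower()
    return sum(1 for kw in task_keywords if kw in lowered)

def filter_context(paths: list[str], task: str, limit: int = 3) -> list[str]:
    """Keep only the files most plausibly relevant to the task."""
    keywords = {w for w in task.lower().split() if len(w) > 3}
    ranked = sorted(paths, key=lambda p: score_relevance(p, keywords), reverse=True)
    return [p for p in ranked[:limit] if score_relevance(p, keywords) > 0]

paths = [
    "src/auth/session.py",
    "src/billing/invoice.py",
    "tests/test_auth.py",
    "docs/changelog.md",
]
# Only the auth-related files survive the filter.
print(filter_context(paths, "harden auth session handling"))
```

Four files in, two files out. That asymmetry is the whole point: the agent's context window carries signal, not the repository.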

6. What This Looks Like in Practice

Most writing on prompting stays safely abstract. Here’s the same task, twice.
❌ Prompt, Pray & Push
“Add user authentication to the app.”
The agent builds auth. JWT, because that’s what tutorials use. Tokens in localStorage, because that’s the path of least resistance. No refresh token rotation, because you didn’t mention it. The code passes every test you wrote. It ships.
Six months later, a security audit flags the localStorage vulnerability. The fix takes two weeks. The agent did exactly what you asked.
That was the problem.

⭕The Contextual Resonance version:
“We’re adding user authentication to a fintech app that handles PII. Our threat model prioritizes session hijacking prevention over UX convenience.
Required: httpOnly cookies, refresh token rotation, forced re-auth on sensitive operations. Hard line: no credentials or tokens in localStorage or sessionStorage — ever. We use [auth library X].
If you hit a trade-off between security and latency, stop and flag it. Don’t decide that yourself.”
Same agent. Same model. The agent chooses httpOnly cookies without being told why they matter. It implements rotation unprompted. It flags one edge case for human review — correctly.
The difference isn’t intelligence. It’s resonance. You transferred the why before the how, and the agent’s autonomous decisions tracked yours.
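The hard line from that prompt fits in a few lines of standard-library Python. This is a sketch of the mechanism, not the app's implementation; the cookie name and token value are made up, and a real app would use its auth library's cookie helpers.

```python
# Sketch: session tokens travel in httpOnly cookies, never in localStorage.
# Cookie name and token value are hypothetical.
from http.cookies import SimpleCookie

def session_cookie_header(token: str) -> str:
    """Build a Set-Cookie value the browser will hide from JavaScript."""
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["httponly"] = True    # not readable via document.cookie
    cookie["session"]["secure"] = True      # HTTPS only
    cookie["session"]["samesite"] = "Strict"
    return cookie["session"].OutputString()

print(session_cookie_header("opaque-token-123"))
```

Because the flag lives in the Set-Cookie header, a script injected into the page cannot exfiltrate the token, which is exactly the session-hijacking scenario the prompt's threat model named.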

7. The Harness — Where Sympathy Becomes System

Contextual Resonance is a mindset. It needs infrastructure to scale. This is where Harness Engineering enters.
Shoemaker’s definition is precise: “By ‘I don’t read code,’ I mean I don’t do line-by-line review as my primary verification method.”
His alternative is systematic: 85%+ test coverage, progressive integration tests, benchmarking, production signals. The harness is the environment that catches what context alone cannot.
But here is the key insight, and it deserves emphasis:
A perfect harness around the wrong intent produces perfectly validated garbage.
Think of it this way: a high-speed rail line with flawless engineering that leads to a brick wall. The code is 100% covered, zero linter errors, zero vulnerabilities — and it implements a feature your users never asked for.
The harness catches execution errors.
Only Contextual Resonance catches intent errors.
You need both.

🔺The Minimal CONTEXT.md Template
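A minimal sketch of what such a file might contain, following the Why → What → How structure described earlier. Every section name and entry below is illustrative, not a prescribed standard:

```markdown
# CONTEXT.md

## Why (strategy)
- The problem this project solves, for whom, and what failure looks like.

## What (constraints)
- Specific users, hard requirements, and explicit non-goals.
- Security and performance bars: what "safe" and "fast enough" mean here.

## How (irreversible decisions)
- Tech choices that cannot be undone, and why they were made.
- Hard lines the agent must never cross (e.g. no tokens in localStorage).

## Escalation
- Trade-offs the agent must flag for human review instead of deciding.
```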

🔺Context Anti-Patterns (What Breaks the Harness)

8. What You Can Do Differently, Starting Now

Five habits, in order of impact:
1. Write the Why first. Before the task prompt, write ten sentences about why it matters, who it affects, and what failure looks like. This is not overhead — it’s the actual work.
2. Structure as Why → What → How. Most prompts start with how. The agent can figure out the how. Only you know the why.
3. Build a living CONTEXT.md. Treat it as shared memory. Update it after every significant agent interaction. It should outlast any single session.
4. Define your harness before you deploy. Write the tests. Set the acceptance criteria. Let the agent run against them — not in hope, but in certainty that failure is detectable.
5. Treat failure as a context gap. When the agent produces something wrong, resist re-prompting with more instructions. Ask: what context was missing? Fix CONTEXT.md first.
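Habit 4 can be made concrete in a few lines. Here is a sketch of a pre-deployment harness in Python: acceptance criteria written as data, checked before any agent output is accepted. The `redact_email` function is a hypothetical stand-in for agent-written code.

```python
# Acceptance criteria defined BEFORE the agent runs, so failure is
# detectable rather than hoped against. `redact_email` is a simplified
# stand-in for agent-written code.
import re

def redact_email(text: str) -> str:
    """Agent-written implementation under test (stand-in)."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[redacted]", text)

# Input/expected pairs written by the human, not the agent.
ACCEPTANCE = [
    ("Contact bob@example.com today", "Contact [redacted] today"),
    ("No addresses here", "No addresses here"),
]

def run_harness() -> bool:
    """Return True only if every acceptance case passes."""
    return all(redact_email(src) == want for src, want in ACCEPTANCE)

print("harness passed" if run_harness() else "harness FAILED")
```

The agent can iterate against `run_harness` as many times as it likes; what it cannot do is redefine success, because the criteria were fixed before it started.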

9. The Thing That Has Always Been True

Here is what is strange about this moment.
We are marveling at agents that can independently research, code, debug, and deploy. And the lesson we keep learning — over and over — is not about the technology.
It is about communication.
No matter how capable the agent.
No matter how sophisticated the harness.
If the intent was never clearly transferred, the output will drift.
The gap between what you imagined and what shipped will widen,
one pull request at a time.
The best team in the world, with the best tools, in the best environment — will still fail if they never had a real conversation about what they were actually trying to accomplish.
AI is not an exception to this rule.
It is the most literal demonstration of it.
The bottleneck has never been intelligence — human or artificial. It has always been the quality of the conversation that precedes the work.
Contextual Resonance is not a technique for working with AI. It is a technique for working with anyone who cannot read your mind.
Which, it turns out, is everyone.
The agents have been ready for weeks. The only remaining question is: are you ready to lead them with Contextual Resonance?
