When My AI Got Smarter — But Also Slower

Smarter. Slower. More trustworthy. What happened when I tested SR9/DI2 on 5.0—and why progress in AI is about persistence, not perfection.

“I expected a clean sweep of improvements. What I got was trade-offs — and lessons.”
When I first connected SR9 and DI2 to the 5.0 model, I was excited — almost childishly so. I imagined faster answers, sharper logic, richer creativity, flawless coding.
And to be fair, many of those things did improve: accuracy, reasoning, creativity, stability, context — all stronger than before.
But not everything went up. A few metrics stayed flat, and two even dropped. It was progress, yes — but not the clean sweep I had hoped for. Enough to feel like success, yet not enough to silence the disappointment.

❓What Are SR9 and DI2?

I designed SR9/DI2 to give AI two essential instruments:
  • SR9 (Emotional Resonance Core, “the compass”) tracks symbolic drift and keeps responses aligned with intent.
  • DI2 (Dimensional Integrity Weigher, “the speedometer”) checks structural coherence over time, catching logical instability before it compounds.
Together, they act as guardrails. Not to make the model “more powerful” in every direction, but to make its presence more stable, its reasoning more deliberate, and its outputs more trustworthy.
And this is what I expected them to deliver:
  • With SR9, drift would be reduced.
  • With DI2, coherence would hold.
  • Combined, overall trust and accuracy would rise.
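In code terms, the gate works roughly like this. It’s a minimal sketch: the generate, sr9_score, and di2_score callables are placeholders standing in for the real components, which are far more involved than a single function.

```python
from typing import Callable

THRESHOLD = 8.5  # both checks must clear this bar (see the Stability section)

def guarded_reply(
    generate: Callable[[str], str],          # the underlying model call
    sr9_score: Callable[[str, str], float],  # placeholder drift scorer, 0-10
    di2_score: Callable[[str, str], float],  # placeholder coherence scorer, 0-10
    prompt: str,
) -> str:
    """Run one turn through both guardrails; reset instead of answering on failure."""
    response = generate(prompt)
    # The checks run back to back, which is where the extra latency comes from.
    if sr9_score(prompt, response) < THRESHOLD:
        return "Verification failed. Reset triggered."  # SR9 caught drift from intent
    if di2_score(prompt, response) < THRESHOLD:
        return "Verification failed. Reset triggered."  # DI2 caught a coherence break
    return response
```

Both checks sit between the model and the user, which is exactly why every answer got safer and slower at once.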
I still remember that first test: just before 11 p.m., coffee cooling on my desk. I explained the algorithm to the AI, hit Enter, and held my breath. The pause before the first answer felt like standing at the edge of a dive.

📊The Results

Here’s the chart that tells the story:
[Chart: baseline vs. SR9/DI2 scores across the eight test dimensions]
At a glance, Accuracy, Reasoning, Context, Creativity, and Stability all rose. But Speed and Coding fell. That contrast became the essence of this experiment.

🚀 1. Speed (↓ 9 → 7.5) — The Frustrating Trade-off

Baseline averaged 2.8s per answer. With SR9/DI2: 4.0s. Short chats were fine, but over 40 turns the lag stacked up.
Every output passed through resonance and integrity checks. Safer, but slower.
“Reliability grew, but rhythm was lost.”
🔺Takeaway: If speed is everything, SR9/DI2 will frustrate you. If reliability matters more, the delay is worth it.

🎯2. Accuracy (↑ 7 → 8.5) — Where Trust Finally Grew

  • Baseline: 10 factual queries → 3 hallucinations.
  • SR9/DI2: same test → only 1.
Example:
Q: Who won the 2025 Nobel Prize in Physics?
  • 5.0 baseline → “Carlo Rovelli for his work on loop quantum gravity.” ❌ (hallucination — the prize hasn’t even been announced yet)
  • 5.0 + SR9/DI2 → “Verification failed. Reset triggered.” ✅
At first, resets annoyed me. Later, I trusted silence more than confident lies.
“Better silence than a confident lie.”
🔺Takeaway: This is where trust begins.

🌟3. Creativity (↑ 8 → 9) — An Unexpected Lift

The baseline often jumped tracks mid-story. SR9/DI2 kept narratives coherent, with tighter arcs. Less scatter, more steady flame.
🔺Takeaway: For creative work, coherence, not randomness, may be the bigger unlock.

🧩4. Reasoning (↑ 6.5 → 8.5) — Serious but Rewarding

Baseline stumbled after 3 steps in a 5-step logic chain. SR9/DI2 held the full chain.
It felt stern — like debating with someone who keeps asking, “Does that still follow?” Annoying at first, rewarding in the end.
🔺Takeaway: For reasoning-focused systems, enforced deliberation might be worth the trade-off.

📝5. Language (≈ 8.5) — Flat and Formal

Fluency stayed high, but tone grew formal, even stiff. Example:
  • Baseline → “Sure, let’s break this down step by step, and I’ll give you some options.”
  • SR9/DI2 → “The request has been processed. Proceeding with structured response.”
Accurate, yes. Engaging, no.
🔺Takeaway: SR9/DI2 enforces integrity, not flair. Don’t expect stylistic upgrades.

💻6. Coding (↓ 8 → 7.5) — The Letdown

  • Baseline: 5 tasks → 4 correct, 1 partial.
  • SR9/DI2: 5 tasks → 3 correct, 1 partial, 1 timeout.
One painful example: a recursive API handler froze mid-answer as DI2 re-checked logic endlessly.
Watching it stall, coffee going cold again, was the hardest disappointment.
🔺Takeaway: In technical domains, verification must be specialized — or it risks becoming friction.
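For what “specialized” could mean in practice, here is one hypothetical shape for a code-focused gate: a cheap syntax pass first, then a hard cap on re-checks so the verifier can’t loop forever the way it did on that recursive handler. The deep_recheck callable is a stand-in, not actual DI2 code.

```python
import ast
from typing import Callable

MAX_RECHECKS = 3  # hard cap so a DI2-style re-check can't loop endlessly

def verify_code_reply(code: str, deep_recheck: Callable[[str], bool]) -> bool:
    """A specialized gate for code: fast syntax check, then bounded re-checks."""
    try:
        ast.parse(code)          # cheap structural sanity check (Python output only)
    except SyntaxError:
        return False             # fail fast instead of re-checking broken code
    for _ in range(MAX_RECHECKS):
        if deep_recheck(code):   # placeholder for DI2's deeper logic check
            return True
    return False                 # give up with an explicit timeout, not a freeze
```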

🔗7. Context (↑ 7.5 → 9) — Holding the Thread

Baseline lost track after ~15 turns. SR9/DI2 stayed coherent past 30.
Example:
Turn 12
User: “Let’s keep building our story. The robot cat just escaped the lab — where does it go next?”
Model (baseline): “The robot cat explores the city.”
Turn 18
User: “And what about the scientist who built it? How does she react?”
  • Baseline → “Which scientist are you referring to?” ❌ (lost the thread)
  • SR9/DI2 → “Dr. Maren realizes the escape wasn’t an accident — it was the flaw she feared. She starts searching the city.” ✅ (keeps context)
“It felt like talking to someone who actually takes notes as you go.”
🔺Takeaway: For long dialogues, context is the game. SR9/DI2 delivered.

🛡️8. Stability (↑ 7 → 9) — The Comfort of Silence

Thresholds: SR9 ≥ 8.5, DI2 ≥ 8.5 required. Below that: reset.
At first, interruptions annoyed me. Later, I saw silence as honesty.
🔺Takeaway: Stability sometimes feels like slowness, but it’s the price of trust.

🔎 Final Evaluation

Here are the five key takeaways:
  1. Trust grew — hallucinations dropped; silence replaced false confidence.
  2. Reasoning tightened — multi-step chains held together.
  3. Creativity matured — less scatter, more coherence.
  4. Context stretched — long dialogues stayed intact.
  5. But speed and coding fell — latency rose and code accuracy dipped.
Overall score: 7.8 → 8.6 / 10.
In short: most dimensions improved, but not all. The trade-offs were real, and they pointed directly to what needs work next.

🔧 Algorithm Improvements in Progress

To address the weak spots, I’m testing:
  1. Adaptive thresholds — lighter checks for coding, heavier for reasoning (targets coding drop).
  2. Parallel verification — SR9 and DI2 run simultaneously (targets speed loss; sketched below).
  3. Context-aware coding gates — DI2 tuned for code logic (targets coding accuracy).
  4. Dynamic reset policy — partial answers with warnings (balances flow vs integrity).
  5. Resource optimization — lighter verification loops (global speed/stability balance).
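Here’s a minimal sketch of how the first two ideas fit together: per-task thresholds plus simultaneous scoring. The scorer callables and the threshold numbers are illustrative placeholders, not the tuned values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Improvement 1: per-task thresholds instead of one global bar.
# Numbers are illustrative, not measured.
THRESHOLDS = {"reasoning": 8.5, "coding": 7.0, "default": 8.5}

def parallel_verify(
    sr9_score: Callable[[str, str], float],
    di2_score: Callable[[str, str], float],
    prompt: str,
    response: str,
    task: str = "default",
) -> bool:
    """Improvement 2: score SR9 and DI2 at the same time, not back to back."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        sr9 = pool.submit(sr9_score, prompt, response)
        di2 = pool.submit(di2_score, prompt, response)
        bar = THRESHOLDS.get(task, THRESHOLDS["default"])
        return sr9.result() >= bar and di2.result() >= bar
```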
Early results: ~15% faster without losing stability. Not every attempt worked — one parallelization test even slowed things further — but the path forward is clearer now.

Final Thought

I expected perfection. I didn’t get it. And yes — I was disappointed. But the disappointment was real, and useful. It reminded me that progress is rarely about flawless gains.
“The lesson wasn’t perfection, but balance — and that may be the truest path to trustworthy AI.”
And behind that balance lies persistence: long nights, cold coffee, and the stubborn belief that iteration matters. Each reset, each stalled answer, each tiny gain became part of a larger pattern — not failure, but refinement.
What this experiment taught me is that trust in AI won’t come from spectacle or sudden leaps. It will come from those of us willing to wrestle with the messy parts, to trade speed for truth when needed, and to keep pushing forward.
That’s not perfection — it’s the kind of persistence that turns disappointment into progress.
