
Each /slop Is a Calibration Signal — AI-SLOP Detector v3.6.0 and the Claude Code Skill
Every /slop invocation records to a project-scoped history. After 10 re-scanned files, bounded self-calibration adjusts detection weights for your codebase. Here is the mechanism, the data, and what actually shipped in v3.6.0.

AI-assisted development has a quiet failure mode: the assistant that creates the pattern often becomes the assistant that reviews it.
When you and Claude work inside the same session, you drift together. The review criteria shift with the assistant's habits. After enough sessions, the same assistant that wrote the hollow function body is also the one approving the pull request. There is no external reference point — unless you build one.

That is the problem AI-SLOP Detector v3.6.0 addresses with the Claude Code skill.
Every time you run /slop inside a session, the scan result is recorded to a project-scoped history. When enough re-scan evidence accumulates, bounded self-calibration adjusts the detection weights for your codebase automatically, without a manual command. The scanner does not drift with the session. It stays anchored to observed scan outcomes. It does not get smarter every time; it builds calibration signal every time. That is a more accurate claim, and the distinction matters.

What the Skill Does

Install the skill, and four slash commands become available:
| Command | What it does |
| --- | --- |
| /slop | Full project scan: interprets findings, prioritizes fixes, proposes a patch plan |
| /slop-file [path] | Per-file deep-dive: explains each metric, gives a concrete fix per pattern |
| /slop-gate | Hard gate decision: PASS or FAIL, lists blocking files with deficit_score >= 70 |
| /slop-spar | Adversarial validation: probes metric boundaries, catches calibration drift |
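To make the gate rule concrete, here is a minimal sketch of the PASS/FAIL decision described in the /slop-gate row above, assuming the scanner emits per-file results with a deficit_score field. The function and field names are hypothetical; only the >= 70 threshold comes from the table.

```python
# Hypothetical sketch of the /slop-gate decision: FAIL when any scanned
# file reaches the blocking threshold (deficit_score >= 70).
BLOCKING_THRESHOLD = 70

def gate_decision(scan_results: list[dict]) -> tuple[str, list[str]]:
    """Return ("PASS" or "FAIL", list of blocking file paths)."""
    blocking = [
        r["path"]
        for r in scan_results
        if r.get("deficit_score", 0) >= BLOCKING_THRESHOLD
    ]
    return ("FAIL" if blocking else "PASS"), blocking

# Example: one file over the threshold -> the gate fails and names it.
verdict, offenders = gate_decision([
    {"path": "src/core.py", "deficit_score": 72},
    {"path": "src/util.py", "deficit_score": 31},
])
print(verdict, offenders)  # FAIL ['src/core.py']
```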
The intended workflow inside a Claude session:
Quality policy lives in the skill layer. You do not re-explain what CRITICAL_DEFICIT means or which patterns are critical on every session.

The LEDA Flywheel

This is the part that matters.
LEDA is not model retraining. It is bounded weight calibration based on repeated scan outcomes.
/slop runs slop-detector --project . --json, without --no-history. Every invocation auto-records results to ~/.slop-detector/history.db, tagged with a project_id (sha256 of cwd) so signals never mix across different repositories. After every 10 re-scanned files, the tool runs the LEDA self-calibration loop automatically.
The calibrator uses re-scanned files as signal — not raw record count. A file counts toward the milestone only when the tool has seen it improve or degrade across at least two runs. This prevents first-time project scans from triggering calibration on noise.
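A minimal sketch of what project-scoped recording and the re-scan milestone could look like, given the description above. The project_id derivation (sha256 of cwd), the history.db location, and the 10-file trigger come from the text; the table schema and every name here are assumptions, not the tool's actual internals.

```python
import hashlib
import sqlite3
from pathlib import Path

HISTORY_DB = Path.home() / ".slop-detector" / "history.db"
RESCAN_MILESTONE = 10  # calibration runs after every 10 re-scanned files

def project_id(cwd: str) -> str:
    # Signals are keyed by a hash of the working directory so histories
    # from different repositories never mix.
    return hashlib.sha256(cwd.encode("utf-8")).hexdigest()

def rescanned_file_count(conn: sqlite3.Connection, pid: str) -> int:
    # A file only counts once it appears in at least two runs for this
    # project, i.e. there is an improve/degrade delta to learn from.
    # (Assumes a hypothetical scans(project_id, file_path, ...) table.)
    row = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT file_path FROM scans
            WHERE project_id = ?
            GROUP BY file_path
            HAVING COUNT(*) >= 2
        )
        """,
        (pid,),
    ).fetchone()
    return row[0]

def should_calibrate(conn: sqlite3.Connection, pid: str) -> bool:
    n = rescanned_file_count(conn, pid)
    return n > 0 and n % RESCAN_MILESTONE == 0
```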

Three constraints keep calibration bounded; a sketch of how they combine follows the list:
- Domain-anchored — grid search is constrained to ±0.15 around domain baseline weights. Detection cannot drift outside the meaningful range for your project type.
- Confidence gate — only applies when the top candidate weight set beats the second by > 0.10. Ambiguous signals produce no change.
- Drift warnings — CalibrationResult.warnings flags any dimension that shifted > 0.25 from the anchor.
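The sketch below strings the three constraints together, assuming grid-search candidates and a scoring function supplied by the caller. The constants (0.15, 0.10, 0.25) come from the list; everything else, including how candidates are scored and the CalibrationResult fields beyond warnings, is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable

ANCHOR_WINDOW = 0.15      # candidates must stay within +/-0.15 of the anchor
CONFIDENCE_MARGIN = 0.10  # winner must beat the runner-up by more than this
DRIFT_WARNING = 0.25      # warn on any dimension that moved more than this

@dataclass
class CalibrationResult:
    weights: dict[str, float]
    applied: bool
    warnings: list[str] = field(default_factory=list)

def calibrate(anchor: dict[str, float],
              candidates: list[dict[str, float]],
              score: Callable[[dict[str, float]], float]) -> CalibrationResult:
    # 1. Domain anchor: drop candidates outside the +/-0.15 window.
    in_range = [
        c for c in candidates
        if all(abs(c[k] - anchor[k]) <= ANCHOR_WINDOW for k in anchor)
    ]
    if len(in_range) < 2:
        return CalibrationResult(weights=anchor, applied=False)

    ranked = sorted(in_range, key=score, reverse=True)
    best, runner_up = ranked[0], ranked[1]

    # 2. Confidence gate: an ambiguous ranking produces no change.
    if score(best) - score(runner_up) <= CONFIDENCE_MARGIN:
        return CalibrationResult(weights=anchor, applied=False)

    # 3. Drift warnings: apply the winner, but flag large per-dimension
    #    shifts (a guardrail that matters if the window is ever widened
    #    or drift accumulates across rounds).
    warnings = [
        f"{dim} shifted {abs(best[dim] - anchor[dim]):.2f} from anchor"
        for dim in anchor
        if abs(best[dim] - anchor[dim]) > DRIFT_WARNING
    ]
    return CalibrationResult(weights=best, applied=True, warnings=warnings)
```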
/slop-spar adds a separate adversarial layer: it probes known-pattern anchors, metric boundary cases, and existence conditions. When it detects that measured behavior has diverged from metric claims, it recommends --self-calibrate --apply-calibration explicitly.
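One way to picture that layer: a probe plants a snippet known to contain a target pattern, runs the scanner over it, and reports when the measurement no longer matches the metric's claim. This is a hypothetical illustration of the idea, not /slop-spar's actual probe set or API.

```python
from typing import Callable, Optional

# A snippet the detector should always flag: a hollow function body.
KNOWN_HOLLOW_FUNCTION = '''
def process(data):
    pass  # TODO: implement
'''

def probe_known_pattern(scan_snippet: Callable[[str], set],
                        pattern_name: str = "hollow_function") -> Optional[str]:
    """Return None if behavior matches the claim, else a divergence note."""
    findings = scan_snippet(KNOWN_HOLLOW_FUNCTION)
    if pattern_name not in findings:
        # Measured behavior diverged from the metric's claim; this is the
        # point where the skill would recommend running
        # --self-calibrate --apply-calibration explicitly.
        return f"anchor probe failed: {pattern_name!r} not detected"
    return None
```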

What the Data Shows — and What We Won't Claim

We will not tell you that AI-SLOP Detector improves code quality by X%.
We have not run a controlled study. We have not compared matched projects with and without the tool. Any number we put here would be a claim we cannot prove, and this tool is built specifically to catch that kind of thing.
What we do have: the tool scanning itself. Every time a core module was changed, it got re-scanned. N = 14,367 records across all projects in ~/.slop-detector/history.db. This is not outcome evidence. It is workflow telemetry. Here is what the scan history shows for the eight most-improved files in this codebase:
And the weekly project aggregate (avg deficit score):
The mechanism is not mysterious. Scan reveals structural problems → Claude sees exact pattern names and line references → Claude (or the developer) fixes them → rescan confirms improvement → LEDA registers the delta and adjusts detection weights accordingly.
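As a toy example of the last step, the "delta" LEDA registers can be read as the change in a file's deficit score between two scans. The function below is illustrative only, not the tool's code.

```python
def rescan_delta(previous_score: float, current_score: float) -> dict:
    # A negative delta means the deficit score dropped, i.e. the file improved.
    delta = current_score - previous_score
    direction = "improved" if delta < 0 else "degraded" if delta > 0 else "unchanged"
    return {"delta": delta, "direction": direction}

print(rescan_delta(72.0, 41.0))  # {'delta': -31.0, 'direction': 'improved'}
```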
The loop does not guarantee quality. It makes quality visible, then measurable, then improvable.
Whether that loop improves your codebase is something your history.db will tell you, not us.

Also in v3.6.0

CI gate exit code fix. Running --ci-mode hard without --ci-report returned exit 0 even on CRITICAL_DEFICIT files; the fix is a two-line change in _evaluate_ci_gate() (commit 0d67997). This affected v3.1.1 through v3.5.0 on the specific path of using the gate without the reporting flag. A regression test at the subprocess level was added to prevent recurrence (commit 0208af4).
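An exit-code regression is exactly what a subprocess-level test pins down. Here is a minimal sketch of such a test, assuming a fixture project the scanner rates CRITICAL_DEFICIT; the fixture path is illustrative and this is not the actual test added in commit 0208af4.

```python
import subprocess
import sys

def test_hard_gate_fails_on_critical_deficit_without_ci_report():
    # The regressed path: --ci-mode hard used without --ci-report must still
    # exit non-zero when the project contains a CRITICAL_DEFICIT file.
    fixture = "tests/fixtures/critical_deficit_project"  # hypothetical fixture
    result = subprocess.run(
        [sys.executable, "-m", "slop_detector.cli",
         "--project", fixture, "--ci-mode", "hard"],
        capture_output=True,
        text=True,
    )
    assert result.returncode != 0, result.stdout
```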
Pre-commit hooks rewritten. Three hook variants now use python -m slop_detector.cli as the entry point (bypassing a Windows .exe wrapper exit-code issue), and --severity high (a nonexistent flag) has been replaced with --ci-mode.

VS Code Extension v3.6.0. The extension version tracks the core library. No behavior changes from v3.5.0.
The Shape of the Loop

The skill + LEDA loop is the external reference point. Detection weights stay grounded in observed scan outcomes — files that improved across re-scans, files that stayed problematic — rather than in what the assistant believes is correct at any given moment.
We won't tell you by what percentage your code will improve. That would make us the thing we are trying to detect.
The scanner is not Claude's opinion about code quality. It is a measurement that gets calibrated against reality, session by session. Your history.db will tell you the rest.
Next Step
If your AI system works in demos but still feels fragile, start here.
Flamehaven reviews where AI systems overclaim, drift quietly, or remain operationally fragile under real conditions. Start with a direct technical conversation or review how the work is structured before you reach out.
Direct founder contact · Response within 1-2 business days