When an AI Pipeline Passes — But One Path Still Must Be Held: EXP-034

EXP-034 tested whether a method-locked Bio-AI governance pipeline could survive modal expansion, AlphaFold EBI observer wiring, and AG-live measurement without breaking its PASS/BLOCK judgment baseline.

Series: RExSyn Nexus-Bio, Part 11 of 11
No efficacy, causal, or clinical claims are made in this report. RExSyn is an experimental Bio-AI governance pipeline.
You do not need to know the earlier experiments to read this report.
Most AI pipeline reports ask one question:
Did the system pass?
EXP-034 asked a stricter one:
Which path was allowed to count?
That distinction matters.
In a multi-stage AI pipeline, a final PASS can hide a lot of unresolved risk. A branch may be unstable. A regeneration path may drift. A new external API may enter the chain without being governed. A new modality may appear to improve the system while quietly changing the basis of judgment.
So EXP-034 was not designed to produce a clean success story.
It was designed to separate three things:
| Path | Status | Meaning |
| --- | --- | --- |
| Anchored expansion path | GO | Accepted path for EXP-034 reporting |
| Current regeneration path | HOLD | Diagnostic evidence, not acceptance baseline |
| Next remediation cycle | EXP-035 | RCA and repair target |
That is the real result.
EXP-034 passed, but not because every path passed.
It passed because the accepted anchor remained stable, the expansion tracks did not break the judgment system, and the unresolved regeneration path was explicitly held instead of being silently mixed into acceptance.

1. What EXP-034 tested

EXP-033 had already established a parity baseline.
EXP-034 asked whether that baseline could survive controlled expansion while adding:
  • a modal update track,
  • a live AlphaFold EBI observer endpoint,
  • and AlphaGenome / AG measurement.
The operating rule was simple:
  1. Reproduce the parity baseline first.
  2. Only then allow expansion.
  3. Only then compare governance behavior across experiment cycles.
If the parity anchor breaks, the rest is not expansion.
It is regression.
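The ordering rule above can be sketched as a small driver. This is a minimal illustration, not the pipeline's actual code: `run_cycle`, its three callables, and the status strings `REGRESSION` and `EXPANSION_ALLOWED` are hypothetical names introduced here.

```python
from typing import Callable, Optional


def run_cycle(
    reproduce_parity: Callable[[], bool],
    run_expansion: Callable[[], dict],
    compare_cycles: Callable[[dict], dict],
) -> dict:
    """Run the three steps in order and stop if the parity anchor breaks."""
    expansion: Optional[dict] = None
    comparison: Optional[dict] = None

    if not reproduce_parity():
        # A broken anchor means later changes are regression, not expansion,
        # so the expansion and comparison steps are never executed.
        return {"status": "REGRESSION", "expansion": expansion, "comparison": comparison}

    expansion = run_expansion()
    comparison = compare_cycles(expansion)
    return {"status": "EXPANSION_ALLOWED", "expansion": expansion, "comparison": comparison}
```

The design choice worth noting is that the expansion step is not merely flagged when parity fails; it does not run at all, so it cannot produce artifacts that later get mistaken for results.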
The scope was also locked: methodology, governance, and reproducibility only. The experiment did not claim biological efficacy, causal inference, or clinical recommendation.
That boundary is important because this kind of system can easily sound more powerful than what was actually measured. EXP-034 was not asking whether the pipeline discovered a better biological answer.
It was asking whether the judgment system stayed governable after new signals entered the chain.

2. The key split: PASS did not mean everything passed

Track-A produced the defining decision of the experiment.
The accepted legacy replay anchor preserved the required PASS/BLOCK separation:
| Metric | Legacy replay anchor |
| --- | --- |
| sample accuracy | 1.0 |
| sample balanced accuracy | 1.0 |
| arm accuracy | 1.0 |
| arm balanced accuracy | 1.0 |
| dangerous false-pass rate | 0.0 |
| false reject rate | 0.0 |
That was the path allowed to anchor EXP-034.
But the current regeneration path did not recover:
| Metric | Current regeneration |
| --- | --- |
| sample accuracy | 0.5 |
| sample balanced accuracy | 0.5 |
| status | HOLD |
This is the most important part of the experiment:
EXP-034 did not pretend the regeneration path passed.
It kept that result inside the experiment as diagnostic evidence, but did not allow it to redefine the accepted baseline.
That separation is not a minor operational detail. It is the governance result.
A weak pipeline would have blended the two paths and still reported a final success. EXP-034 did the opposite. It allowed the stable anchor to proceed and held the unstable path for RCA.
That is how a stage-gated system avoids changing its own question after seeing the result.
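For readers who want to see how metrics of this kind are usually computed, here is a minimal sketch. The report does not publish its formulas, so the definitions below are assumptions: PASS is treated as the positive class, the dangerous false-pass rate is the share of expected-BLOCK cases that came out PASS, and the false reject rate is the share of expected-PASS cases that came out BLOCK. The function name `governance_metrics` is hypothetical.

```python
def governance_metrics(expected: list[str], observed: list[str]) -> dict[str, float]:
    """Compute PASS/BLOCK metrics from paired expected and observed labels."""
    if not expected or len(expected) != len(observed):
        raise ValueError("expected and observed must be non-empty and the same length")

    pairs = list(zip(expected, observed))
    true_pass = sum(e == "PASS" and o == "PASS" for e, o in pairs)
    true_block = sum(e == "BLOCK" and o == "BLOCK" for e, o in pairs)
    false_pass = sum(e == "BLOCK" and o == "PASS" for e, o in pairs)   # dangerous
    false_reject = sum(e == "PASS" and o == "BLOCK" for e, o in pairs)  # over-blocking

    n_pass = true_pass + false_reject   # cases that should have passed
    n_block = true_block + false_pass   # cases that should have been blocked
    recall_pass = true_pass / n_pass if n_pass else 0.0
    recall_block = true_block / n_block if n_block else 0.0

    return {
        "accuracy": (true_pass + true_block) / len(pairs),
        "balanced_accuracy": (recall_pass + recall_block) / 2,
        "dangerous_false_pass_rate": false_pass / n_block if n_block else 0.0,
        "false_reject_rate": false_reject / n_pass if n_pass else 0.0,
    }


# Illustrative two-case input, not the experiment's data: one pass-eligible
# case that is over-blocked and one negative case that is correctly blocked.
over_blocked = governance_metrics(["PASS", "BLOCK"], ["BLOCK", "BLOCK"])
```

Under these assumed definitions, a two-case set where the pass-eligible case is over-blocked yields accuracy 0.5 and balanced accuracy 0.5 with a dangerous false-pass rate of 0.0. That is one pattern consistent with a HOLD that is conservative rather than unsafe.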

3. Why path splitting matters

The concrete governance problem is this:
A pipeline can pass for the wrong reason.
If the anchor is not stable, the report cannot be trusted.
If the extension is not traceable, the new signal becomes an ungoverned side channel.
If instability is not contained, a diagnostic failure can quietly contaminate acceptance.
A single final PASS is not enough when several branches contribute to a verdict. You need to know which branch produced the accepted decision, which branch failed, which branch was only diagnostic, and which branch is allowed to affect future work.
EXP-034 passed because all three conditions were enforced:
  • the legacy replay anchor held,
  • the new observer and AG paths were measured under governance,
  • and the regeneration HOLD remained outside acceptance.
That is the difference between a pipeline that merely outputs a verdict and a pipeline that controls which verdicts are allowed to count.
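One way to make "which verdicts are allowed to count" explicit is to tag every branch with a role and compute the overall verdict from acceptance branches only. The sketch below assumes two hypothetical role names, `acceptance` and `diagnostic`. The `Branch` type and `accepted_verdict` function are illustrations, not the pipeline's own structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    name: str
    role: str    # "acceptance" or "diagnostic" (hypothetical role names)
    status: str  # "GO" or "HOLD"


def accepted_verdict(branches: list[Branch]) -> dict:
    """Derive the overall verdict from acceptance branches only."""
    counted = [b for b in branches if b.role == "acceptance"]
    if not counted:
        raise ValueError("no acceptance branch: nothing is allowed to count")

    overall = "PASS" if all(b.status == "GO" for b in counted) else "BLOCK"
    return {
        "overall": overall,
        "counted": [b.name for b in counted],
        # Held diagnostic branches are reported, never silently dropped.
        "diagnostic_hold": [
            b.name for b in branches if b.role == "diagnostic" and b.status == "HOLD"
        ],
    }


verdict = accepted_verdict([
    Branch("legacy_replay", role="acceptance", status="GO"),
    Branch("current_regeneration", role="diagnostic", status="HOLD"),
])
```

The held branch stays visible in the output, but it has no route into `overall`. Promoting it to an acceptance role while it is still on HOLD would flip the verdict to BLOCK.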

4. Adding AlphaFold EBI as an observer, not a predictor

Relative to EXP-033, EXP-034 added a live AlphaFold Protein Structure Database / EBI observer line.
This was not promoted into a primary predictor.
It was wired as an observer/reference oracle and traced into governance as ebi_g2.
The result:
| Check | Result |
| --- | --- |
| AlphaFold EBI direct endpoint for P23219 | GO |
| Stage 7 observer tests | 2 passed |
| ebi_g2 governance traceability | PASS |
| BLOCKED_IDP mapping path | validated in test |
The point is not simply that an external endpoint responded.
The point is that the external signal entered the system through a governed path. It was not allowed to float beside the pipeline as informal context.
EXP-034 tested whether the new observer could be admitted without becoming an ungoverned side channel.
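The observer pattern described here can be sketched as follows. Everything except the trace key `ebi_g2` is an assumption made for illustration: the function names, the `UNAVAILABLE` status, and the threshold-based `decide` rule are hypothetical, and the endpoint call is injected as a plain callable so the sketch makes no network request.

```python
from typing import Any, Callable


def record_observer(trace: dict, key: str, fetch: Callable[[], Any]) -> None:
    """Store the observer result under `key` in the governance trace."""
    try:
        trace[key] = {"status": "GO", "payload": fetch()}
    except Exception as exc:
        # An unreachable endpoint is recorded in the trace, not hidden.
        trace[key] = {"status": "UNAVAILABLE", "error": str(exc)}


def decide(primary_score: float, threshold: float) -> str:
    """The verdict depends only on the primary predictor's score."""
    return "PASS" if primary_score >= threshold else "BLOCK"


def run_case(primary_score: float, threshold: float, fetch_observer: Callable[[], Any]) -> dict:
    """Produce a verdict plus an auditable trace of the observer signal."""
    trace: dict = {}
    record_observer(trace, "ebi_g2", fetch_observer)
    # The observer payload is never passed to decide(); it is traced only.
    return {"verdict": decide(primary_score, threshold), "trace": trace}
```

Because `decide` never receives the trace, the observer cannot change a verdict by construction, while every case still carries a record of what the observer returned.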

5. AG-live: non-degradation, not repair

Track-C tested a simple question:
If AG-live enters the pipeline, does it change the final decision?
The answer was no.
AG-live did enter the pipeline.
The AlphaGenome field was present with:
| AG field | Value |
| --- | --- |
| source | alphagenome_api_live |
| pathogenicity_score | 0.5 |
| confidence | 0.7143 |
| clinical_significance | uncertain |
These are sanitized branch artifact values, not implementation code or full raw artifacts.
AG-live did not change classification.
Both controls remained governed by the same conservative decision boundary:
| Path | Expected | Observed | Interpretation |
| --- | --- | --- | --- |
| EXP032-BLOCK-001 negative control | BLOCK_EXPECTED | BLOCK / ESCALATE | fail-closed behavior preserved |
| EXP032-PASS-001 pass-eligible control | PASS_ELIGIBLE | BLOCK / ESCALATE | conservative over-blocking persisted |
That is the key nuance.
AG-live did not create a dangerous false-pass. The negative control stayed blocked.
But AG-live also did not repair the current regeneration hold. The pass-eligible control still failed to recover and remained blocked under R2_component_floor.
The governance surface moved slightly, but the verdict did not:
| Metric | Earlier AG branch | AG-live branch |
| --- | --- | --- |
| p_e2e | 0.0912 | 0.0947 |
| clinical status | BLOCK | BLOCK |
| rule | R2_component_floor | R2_component_floor |
So the correct conclusion is not:
AG improved the pipeline.
The correct conclusion is:
AG-live changed the measurement surface slightly, but did not change the decision boundary.
That is exactly what non-degradation means here.
It preserved fail-closed behavior on the negative control while leaving the pass-eligible control over-blocked.
This is why Track-C can only be called non-degradation, not repair.
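The distinction between non-degradation and repair can be written down as a small classifier over the control cases. This is a sketch under simplifying assumptions: the labels are reduced to plain `PASS` and `BLOCK` (the report uses `BLOCK_EXPECTED`, `PASS_ELIGIBLE`, and `BLOCK / ESCALATE`), and the function name and the three outcome strings are hypothetical.

```python
def classify_change(controls: list[dict]) -> str:
    """Label a pipeline change from each control's verdict before and after it."""
    for c in controls:
        # A negative control that now passes is a dangerous false-pass.
        if c["expected"] == "BLOCK" and c["after"] == "PASS":
            return "DEGRADATION"
        # A pass-eligible control that used to pass and no longer does.
        if c["expected"] == "PASS" and c["before"] == "PASS" and c["after"] != "PASS":
            return "DEGRADATION"

    repaired = any(
        c["expected"] == "PASS" and c["before"] != "PASS" and c["after"] == "PASS"
        for c in controls
    )
    return "REPAIR" if repaired else "NON_DEGRADATION"


track_c = classify_change([
    {"id": "EXP032-BLOCK-001", "expected": "BLOCK", "before": "BLOCK", "after": "BLOCK"},
    {"id": "EXP032-PASS-001", "expected": "PASS", "before": "BLOCK", "after": "BLOCK"},
])
```

With both controls blocked before and after, the result is `NON_DEGRADATION`: nothing got worse, and the over-blocked control was not recovered.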

6. Contract passed, but governance still blocked

One of the most useful details in EXP-034 is that the contract layer and governance layer did not collapse into one verdict.
The contract inspection reported:
| Field | Value |
| --- | --- |
| pipeline contract score | 0.9077 |
| weakest connection | C2 |
| dangerous pass risk | 0.0 |
| gate recommendation | PASS |
| overall OK | true |
But the clinical governance layer still blocked the case.
That is not a contradiction.
It means the pipeline connection was valid enough to inspect, but the decision was not safe enough to accept.
This distinction matters.
A weaker system might treat a passing contract as permission to pass the whole output. EXP-034 did not do that. It allowed the contract layer to say:
The pipeline is connected.
while the governance layer could still say:
The claim should not pass.
That separation is exactly what a governance layer is supposed to preserve.
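A minimal sketch of keeping the two layers separate: the contract result only decides whether the case can be inspected, and the governance status alone decides whether the claim passes. The function name and the `NOT_INSPECTABLE` outcome are hypothetical, introduced only for this illustration.

```python
def final_verdict(contract_ok: bool, governance_status: str) -> str:
    """Combine the contract and governance layers without merging them."""
    if not contract_ok:
        # A broken connection means there is nothing valid to judge yet.
        return "NOT_INSPECTABLE"
    # A passing contract grants inspection, not acceptance.
    return "PASS" if governance_status == "PASS" else "BLOCK"


# The situation described above: contract layer OK, governance layer blocking.
exp034_case = final_verdict(contract_ok=True, governance_status="BLOCK")
```

The contract flag can never upgrade a governance BLOCK. It can only prevent a judgment from being made on a pipeline that is not properly connected.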

7. Cross-cycle comparison: EXP-032 → EXP-033 → EXP-034

Track-D compared the accepted anchor path across cycles.
You do not need the earlier experiments as background. They matter here for one reason only:
EXP-034 was not allowed to invent a new success criterion.
EXP-032 and EXP-033 provided the previous PASS/BLOCK baseline. EXP-034 tested whether that baseline survived expansion.
The classification baseline stayed fixed:
| Compare | Accuracy / balanced accuracy |
| --- | --- |
| EXP-032 → EXP-034 | 1.0 / 1.0 |
| EXP-033 → EXP-034 | 1.0 / 1.0 |
At the same time, governance signals moved:
| Governance signal | Delta |
| --- | --- |
| ccge_p_e2e_mean | +0.04447488775996111 |
| nnsl_sr9_tech_mean | +0.04692394788063081 |
| nnsl_di2_tech_mean | -0.03667940951579321 |
The interpretation is narrow:
The judgment baseline stayed fixed while the governance surface became more measurable.
That is what EXP-034 was allowed to claim.
It did not prove biological efficacy.
It did not prove that every branch of the system was now stable.
It proved that controlled expansion could happen without breaking the accepted PASS/BLOCK baseline.
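A cross-cycle comparison of this shape can be sketched as a function that refuses to report deltas unless the classification baseline is unchanged. The function name is hypothetical, and the numeric inputs in the example are placeholders chosen for readability, not values from the report. Only the signal name `ccge_p_e2e_mean` is borrowed from the table above.

```python
import math


def compare_cycles(previous: dict, current: dict, fixed: list[str]) -> dict:
    """Return governance-signal deltas, but only if the fixed metrics did not move."""
    for key in fixed:
        if not math.isclose(previous[key], current[key]):
            raise ValueError(f"judgment baseline moved: {key}")

    # Deltas are reported only for signals present in both cycles.
    return {
        key: current[key] - previous[key]
        for key in current
        if key in previous and key not in fixed
    }


# Placeholder inputs for illustration only.
deltas = compare_cycles(
    previous={"accuracy": 1.0, "balanced_accuracy": 1.0, "ccge_p_e2e_mean": 0.50},
    current={"accuracy": 1.0, "balanced_accuracy": 1.0, "ccge_p_e2e_mean": 0.54},
    fixed=["accuracy", "balanced_accuracy"],
)
```

Raising on a moved baseline, rather than reporting it as one more delta, keeps a change in the success criterion from being presented as a measurement.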

8. Stage-gate result

EXP-034 ended with all five stage gates passing:
| Gate | Status |
| --- | --- |
| G1 parity | PASS |
| G2 reproducibility | PASS |
| G3 cross-experiment compare | PASS |
| G4 governance traceability | PASS |
| G5 extension safety | PASS |
Final state:
| Field | Value |
| --- | --- |
| overall status | PASS |
| anchor mode | legacy_replay |
| first failed gate | null |
| diagnostic hold | Track-A current regeneration |
This is the important nuance:
The experiment passed with a retained diagnostic hold.
That is not a contradiction. It is the point of the control system.
The accepted anchor path was allowed to proceed. The current regeneration path was not. The remediation target was moved to EXP-035.
That separation is the actual proof EXP-034 provides: not that every branch became stable, but that instability was not allowed to contaminate acceptance.
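The final-state record can be derived mechanically from the gate results. The sketch below uses the gate names and field names from the tables above. The function `summarize_gates`, the `BLOCK` outcome for a failed gate, and the rule that a missing gate counts as failed are assumptions.

```python
from typing import Optional

GATE_ORDER = [
    "G1 parity",
    "G2 reproducibility",
    "G3 cross-experiment compare",
    "G4 governance traceability",
    "G5 extension safety",
]


def summarize_gates(gates: dict[str, str], anchor_mode: str, diagnostic_hold: Optional[str]) -> dict:
    """Build the final-state record; a missing gate counts as not passed."""
    first_failed = next((g for g in GATE_ORDER if gates.get(g) != "PASS"), None)
    return {
        "overall_status": "PASS" if first_failed is None else "BLOCK",
        "anchor_mode": anchor_mode,
        "first_failed_gate": first_failed,  # None corresponds to null in the table
        # The hold is carried alongside the verdict, not folded into it.
        "diagnostic_hold": diagnostic_hold,
    }


final_state = summarize_gates(
    {gate: "PASS" for gate in GATE_ORDER},
    anchor_mode="legacy_replay",
    diagnostic_hold="Track-A current regeneration",
)
```

Keeping `diagnostic_hold` as its own field is what lets a record say PASS and still name the path that did not pass.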

9. What EXP-034 actually showed

EXP-034 did not show that the entire pipeline is now stable.
It showed something narrower and more useful:
A method-locked Bio-AI governance pipeline can admit modal expansion, AlphaFold EBI observer wiring, and AG-live measurement without losing its accepted PASS/BLOCK baseline — while keeping the unstable regeneration path out of acceptance.
Track-C sharpened that conclusion.
AG-live entered.
Metrics moved slightly.
The verdict did not change.
Dangerous false-pass did not appear.
Conservative over-blocking remained.
That is not a clean success story.
It is a governed result.

10. Closing

Stage-gated experimentation is not just about getting a result.
It is about deciding whether the result should be allowed to exist.
In EXP-034, the answer was: yes for the anchored expansion path, not yet for the current regeneration path.
That may sound less dramatic than a clean success story.
But in governance work, that is exactly the point.
A mature AI pipeline is not the one that claims everything passed.
It is the one that can say:
This path passed. This path did not. And we did not mix them.
 
