Flamehaven.space
RExSyn Nexus 0.6.1 - Stop Hallucinating Proteins: How We Built a 7D Reasoning Engine with AlphaFold3

RExSyn Nexus 0.6.1 adds Structure as a 7th reasoning dimension, using AlphaFold3 confidence signals to reject biologically plausible but physically impossible protein hypotheses with deterministic, auditable validation.

Series: RExSyn Nexus-Bio, Part 1 of 10

Before we talk about “BioAI agents,” we need to admit the real failure mode:

LLMs are great at producing hypotheses that sound scientific — and quietly violate physics.
In this post, I’ll show:
  • why “plausible trash” happens in biomedical reasoning,
  • how RExSyn Nexus v0.6.1 adds Structure as a first-class reasoning dimension,
  • and how to run it (API + Python), with deterministic CI testing.

The logic–physics gap: a semantic hypothesis can look valid but fail physical reality

The problem nobody likes to admit: “Plausible Trash”

In autonomous biomedical research, LLMs can write hypotheses that sound like science.
“Attach a 50 kDa PEG chain to the binding pocket of Protein X to improve solubility.”
Semantically? Great.
Structurally? Often impossible.
And this is the failure mode I care about:

When a system is “logically convincing” but physically wrong.

So my question became:
Can we make the pipeline refuse a hypothesis — and explain exactly why — using physics-derived signals?
That’s what RExSyn Nexus v0.6.1 is about: adding the 7th dimension to our reasoning model.
We moved from M-A-D-I-F-P to M-A-D-I-F-P-S, where S = Structure, computed from AlphaFold3 confidence outputs.

What M-A-D-I-F-P-S actually means

M-A-D-I-F-P-S is not a “magic acronym.”
It’s a reasoning checklist — seven lenses the engine uses to decide whether a hypothesis deserves to survive.
  • M — Methodic: Did we follow a disciplined procedure (inputs, constraints, reproducible steps), or are we hand-waving?
  • A — Abductive: What is the best explanation that fits the evidence we have right now? (plausible hypothesis generation)
  • D — Deductive: If the hypothesis is true, what must be true next? (logical consequences; consistency checks)
  • I — Inductive: Does it generalize from prior cases / datasets / known patterns, or is it a one-off story?
  • F — Falsification: What would disprove this quickly? (designing refutation tests; “how can this fail?”)
  • P — Paradigm: Is it compatible with established domain constraints and assumptions (biology, chemistry, protocols), or does it violate the frame?
  • S — Structure (new in v0.6.1): Even if it’s semantically convincing, is it physically viable? In v0.6.1, S is computed from AlphaFold3 confidence signals (e.g., clash/disorder/PAE-style uncertainty) and can veto a hypothesis.
In short: 6D (M-A-D-I-F-P) prevents “logical nonsense.” 7D (+S) prevents “physically impossible but linguistically plausible” ideas.

RExSyn Nexus v0.6.1 — from text generation to biological simulation (7D reasoning: M-A-D-I-F-P-S)

What RExSyn Nexus is (in one minute)

RExSyn Nexus is a pipeline that combines:
  • a LOGOS reasoning core (IRF-Calc + AATS),
  • semantic anchoring (embeddings),
  • structural confidence (AlphaFold3 confidence schema),
  • and scientific validators (PoseBusters / DockQ / SAXS), exposed as an API + job workflow.
LOGOS itself is defined as IRF-Calc’s 6D framework + AATS v2.1, plus bridge components like drift control and calibration gates.

Why v0.6.1 matters (the “why”, not the “what”)

Most stacks do this:
  1. Generate hypothesis
  2. Add citations
  3. Ship “confidence” as a vibe
We wanted this instead:
  1. Generate hypothesis
  2. Ask physics if it’s plausible
  3. If physics says “no,” reject, and log the reason
That’s why v0.6.1 is not a “feature party.”
It’s a structural hardening release: determinism, traceability, schema rigor, and compliance baked into runtime.
This release also formalizes the Sovereign Adapter Pattern with three constraints:
  • Drift-free execution (same input → same output in deterministic modes)
  • License sovereignty (explicit acknowledgment gates for restricted terms)
  • Schema rigor (no fuzzy JSON)

7D reasoning (M-A-D-I-F-P → +S): hypotheses must be logically sound and structurally viable

The mental model: 6D vs 7D

Think of it like this:
  • 6D reasoning answers: “Is this coherent?”
  • 7D reasoning adds: “Would atoms allow it?”
So the engine can finally say:
“Your argument is deductively strong…
but your structure score is low (clash/disorder).
Therefore: reject.”
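Here is a minimal sketch of that veto. The dimension keys, thresholds, and the `judge_7d` / `Verdict` names are hypothetical and are not the engine's actual API. Coherence is a plain mean of the six 6D scores. S acts as a gate and is not averaged in.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical keys for the six 6D lenses (M-A-D-I-F-P).
SIX_D = ("M", "A", "D", "I", "F", "P")


@dataclass(frozen=True)
class Verdict:
    decision: str  # "ACCEPT" or "REJECT"
    reason: str    # explanation to write to the audit log


def judge_7d(
    six_d: Dict[str, float],
    structure: float,
    coherence_threshold: float = 0.6,  # assumption: tune per domain
    structure_floor: float = 0.5,      # assumption: tune per domain
) -> Verdict:
    """Combine 6D coherence with the S axis; S can veto and is never averaged away."""
    missing = [d for d in SIX_D if d not in six_d]
    if missing:
        raise ValueError(f"missing 6D scores: {missing}")
    coherence = sum(six_d[d] for d in SIX_D) / len(SIX_D)
    if coherence < coherence_threshold:
        return Verdict("REJECT", f"6D coherence {coherence:.2f} below {coherence_threshold}")
    if structure < structure_floor:
        return Verdict(
            "REJECT",
            f"coherent (6D={coherence:.2f}) but structure score {structure:.2f} below {structure_floor}",
        )
    return Verdict("ACCEPT", f"6D={coherence:.2f}, S={structure:.2f}")
```

A hypothesis scoring 0.9 on every 6D lens still comes back REJECT when S is 0.2. That is the “deductively strong, structurally impossible” case. A veto is used rather than a weighted average because averaging would let strong rhetoric compensate for impossible geometry.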

How it works (end-to-end)

Step 1) Mirror AF3 confidence outputs into a strict schema

Why? Because ad-hoc dicts are where “silent drift” hides.
We mirror AlphaFold3 confidence outputs with strict types (including NumPy arrays for heavy matrices).
The key detail: we enforce npt.NDArray[np.float64] because loose typing can create “slop”—ambiguous pipelines that look correct but hide drift and cost.
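A minimal sketch of such a strict mirror, assuming a subset of fields modeled on AlphaFold3's confidence JSON (`ptm`, `iptm`, `fraction_disordered`, `has_clash`, `pae`, `atom_plddts`). Verify the names and types against the AF3 version you run. The class name `AF3Confidence` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np
import numpy.typing as npt


@dataclass(frozen=True)
class AF3Confidence:
    """Typed mirror of a subset of AF3 confidence outputs (field names assumed)."""

    ptm: float
    iptm: Optional[float]                 # None for single-chain inputs (assumption)
    fraction_disordered: float            # 0..1
    has_clash: bool
    pae: npt.NDArray[np.float64]          # (n_tokens, n_tokens) predicted aligned error
    atom_plddts: npt.NDArray[np.float64]  # per-atom pLDDT, 0..100

    def __post_init__(self) -> None:
        # Refuse anything that is not exactly float64 rather than silently casting it.
        for name in ("pae", "atom_plddts"):
            arr = getattr(self, name)
            if not isinstance(arr, np.ndarray) or arr.dtype != np.float64:
                raise TypeError(f"{name} must be a float64 ndarray")
        if self.pae.ndim != 2 or self.pae.shape[0] != self.pae.shape[1]:
            raise ValueError("pae must be a square 2-D matrix")
        if not 0.0 <= self.fraction_disordered <= 1.0:
            raise ValueError("fraction_disordered must lie in [0, 1]")
        if not isinstance(self.has_clash, bool):
            raise TypeError("has_clash must be a bool (coerce explicitly when parsing)")
```

The static annotation alone does not stop a float32 array at runtime. That is why the `__post_init__` check fails loudly instead of letting a downcast slip through.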

Step 2) Convert confidence metrics into a belief score (0..1)

Structure scorer normalization: mapping confidence metrics into a single belief score (0..1)

Raw AF3 outputs are not “reasoning scores.”
They’re confidence/error signals.
So we normalize them into Structure (S): a belief score the reasoning engine can weigh against semantic arguments.
Here’s a minimal, explainable scorer (weights are heuristics; we’ll tune them later via validation feedback):
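A sketch of what such a scorer can look like. Every weight, the PAE cap, and the clash penalty below are hypothetical placeholders, not the released constants. The inputs are the AF3-style signals named earlier: pTM, fraction disordered, a clash flag, and mean PAE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureWeights:
    # Hypothetical heuristic weights; the three terms sum to 1.0 so S stays in [0, 1].
    ptm: float = 0.5
    order: float = 0.3          # rewards a low fraction_disordered
    pae: float = 0.2            # rewards a low mean PAE
    pae_cap: float = 30.0       # assumption: mean PAE (Å) at or above this counts as no signal
    clash_penalty: float = 0.5  # multiplicative penalty when has_clash is set


def structure_score(
    ptm: float,
    fraction_disordered: float,
    has_clash: bool,
    mean_pae: float,
    w: StructureWeights = StructureWeights(),
) -> float:
    """Map AF3-style confidence signals to a single belief score S in [0, 1]."""
    pae_term = 1.0 - min(max(mean_pae, 0.0) / w.pae_cap, 1.0)  # 1 = confident, 0 = noise
    s = w.ptm * ptm + w.order * (1.0 - fraction_disordered) + w.pae * pae_term
    if has_clash:
        s *= w.clash_penalty
    return min(max(s, 0.0), 1.0)
```

With these placeholder weights, a clashing structure can never score above 0.5, no matter how high its pTM. That is the intended behavior: a clash should not be buried by good global confidence.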
The “aha” is not the formula.
The “aha” is: Structure becomes a first-class reasoning dimension, not a post-hoc chart.

Step 3) The Guard: License Compliance as Code

LicenseGuard — compliance enforced as runtime behavior (explicit acknowledgment gate + audit logging)

When you integrate restricted research assets, “compliance” can’t be a PDF someone forgets.
So we enforce an explicit acknowledgment gate at runtime:
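A minimal sketch of such a gate, assuming acknowledgment is given through an environment variable. The variable name `REXSYN_AF3_TERMS_ACK`, the logger name, and the exception class are hypothetical.

```python
import logging
import os
from datetime import datetime, timezone
from typing import Mapping, Optional

# Hypothetical logger name; route it to an append-only sink in production.
audit_log = logging.getLogger("rexsyn.license_audit")


class LicenseNotAcknowledged(RuntimeError):
    """Raised when a restricted asset is used without explicit acknowledgment."""


class LicenseGuard:
    """Refuse to run until the license terms for an asset are explicitly acknowledged."""

    def __init__(self, asset: str, env_var: str, env: Optional[Mapping[str, str]] = None) -> None:
        self.asset = asset
        self.env_var = env_var
        self._env = os.environ if env is None else env  # injectable for tests

    def check(self) -> None:
        value = self._env.get(self.env_var, "").strip().lower()
        acknowledged = value in {"1", "true", "yes"}
        # Every check is logged, pass or fail, so the audit trail records each attempt.
        audit_log.info(
            "license_check asset=%s acknowledged=%s at=%s",
            self.asset,
            acknowledged,
            datetime.now(timezone.utc).isoformat(),
        )
        if not acknowledged:
            raise LicenseNotAcknowledged(
                f"{self.asset}: review the license terms, then set {self.env_var}=true"
            )
```

Calling `LicenseGuard("AlphaFold3 model parameters", "REXSYN_AF3_TERMS_ACK").check()` before loading the restricted asset turns a forgotten acknowledgment into a hard failure instead of a silent one.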
This is “compliance by design.”
It prevents silent legal drift in real teams.

Step 4) Test it without GPUs (deterministic mock)

A reproducibility diagram showing two runs producing identical distributions (deterministic mock mode), enabling drift-free CI testing without GPU overhead.

You can’t run structural inference on every CI run.
But random mocks create flaky tests.
So we use a deterministic mock adapter: same input sequence → same output artifact, every time.
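One way to get that property is to derive the random seed from a cryptographic hash of the input sequence. The sketch below does this; the function name and the value ranges are hypothetical. The outputs are synthetic and exercise the pipeline plumbing, not the biology.

```python
import hashlib
from typing import Any, Dict

import numpy as np


def mock_af3_confidence(sequence: str) -> Dict[str, Any]:
    """Deterministic stand-in for structural inference: same sequence -> same artifact."""
    seq = sequence.strip().upper()
    # Seed from SHA-256, not hash(): Python's str hash is salted per process.
    seed = int.from_bytes(hashlib.sha256(seq.encode("utf-8")).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    n = len(seq)
    return {
        "ptm": float(rng.uniform(0.2, 0.95)),
        "fraction_disordered": float(rng.uniform(0.0, 0.5)),
        "has_clash": bool(rng.random() < 0.1),
        "pae": rng.uniform(0.5, 30.0, size=(n, n)),  # float64 (n, n) matrix
    }
```

NumPy does not promise that `Generator` streams stay identical across NumPy releases. If you store golden files from the mock, pin the NumPy version alongside them.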

“Okay—but how do I use it?”

Here are two practical entry points.

Option A) Use the API (job workflow)

  • POST /api/v1/predict
  • GET /api/v1/jobs/{job_id}
  • GET /api/v1/jobs/{job_id}/result
Mental model:
  1. submit a prediction job
  2. poll status
  3. retrieve result + scores + artifacts
Here’s the fastest way to feel the pipeline:
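A minimal client sketch for the three endpoints above, using only the standard library. The base URL, the `{"sequence": ...}` payload, and the `job_id` / `status` fields with `"completed"` / `"failed"` values are assumptions. Check them against the actual API schema. The transport is injectable so the polling loop can be tested offline.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE_URL = "http://localhost:8000"  # assumption: wherever your deployment listens

Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def http_json(method: str, url: str, body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def run_prediction(
    sequence: str,
    base_url: str = BASE_URL,
    send: Transport = http_json,
    poll_s: float = 2.0,
    max_polls: int = 150,
) -> Dict[str, Any]:
    # 1) Submit the job (payload shape is hypothetical).
    job_id = send("POST", f"{base_url}/api/v1/predict", {"sequence": sequence})["job_id"]
    # 2) Poll its status.
    for _ in range(max_polls):
        status = send("GET", f"{base_url}/api/v1/jobs/{job_id}", None)["status"]
        if status == "completed":
            # 3) Fetch result + scores + artifacts.
            return send("GET", f"{base_url}/api/v1/jobs/{job_id}/result", None)
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        time.sleep(poll_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```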
If you see REJECT with has_clash=true or high disorder, that’s the point:
semantic plausibility didn’t pass the laws of physics.

Option B) Use the LOGOS service in Python (reasoning workflow)
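The LOGOS service's Python interface is not reproduced here. The sketch below shows a hypothetical shape for the kind of decision object it returns. `ReasonedDecision`, its fields, and the fingerprint helper are illustrative names, not the service API, and the scores are made-up example values. Serialization uses sorted keys so the same decision always produces the same bytes and digest.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReasonedDecision:
    """Hypothetical decision object; not the actual LOGOS service API."""

    hypothesis: str
    scores: Dict[str, float]  # lens letter -> score in [0, 1], "M" ... "S"
    decision: str             # "ACCEPT" or "REJECT"
    reasons: Tuple[str, ...] = ()

    def to_audit_json(self) -> str:
        # sort_keys + fixed separators give byte-identical output for equal decisions.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))

    def fingerprint(self) -> str:
        """Stable SHA-256 digest, usable as an audit-log key."""
        return hashlib.sha256(self.to_audit_json().encode("utf-8")).hexdigest()


decision = ReasonedDecision(
    hypothesis="Attach a 50 kDa PEG chain to the binding pocket of Protein X",
    scores={"M": 0.8, "A": 0.9, "D": 0.9, "I": 0.7, "F": 0.8, "P": 0.8, "S": 0.2},  # illustrative
    decision="REJECT",
    reasons=("S=0.20 below structure floor", "has_clash=true"),
)
print(decision.to_audit_json())
```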

The point: you’re not just getting a structure score — you’re getting a reasoned decision object.

What makes this “special” (not just another pipeline)

1) Structure is not decoration; it’s a reasoning axis

S is computed, normalized, and used for accept/reject decisions.

2) Schema is treated like governance

Strict types prevent “silent drift” and “slop” patterns.

3) Trust claims are tied to verification artifacts

Instead of “trust me,” we ship reproducibility hooks (schema parity + deterministic mocks + validation surfaces).
(If you hate symbolic scoring: same. The point isn’t the symbol. The point is: verification is a first-class shipping artifact.)

What I’m improving next (what to watch)

  • Adaptive weighting: replace heuristic constants with calibration from PoseBusters/DockQ/SAXS feedback loops
  • Cascade reasoning: early-exit 7D when 6D already fails
  • Multi-chain interface focus: score binding-interface regions, not only global matrices
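Cascade reasoning is listed as upcoming work. Here is a minimal sketch of the control flow it describes. The function name and thresholds are hypothetical, and `compute_structure` stands in for whatever produces S. The expensive structural step is passed as a callable so it only runs when the 6D gate passes.

```python
from typing import Callable, Optional, Tuple


def cascade_7d(
    coherence: float,
    compute_structure: Callable[[], float],
    coherence_threshold: float = 0.6,  # assumption
    structure_floor: float = 0.5,      # assumption
) -> Tuple[str, Optional[float]]:
    """Early exit: only pay for structural scoring when the 6D gate already passes."""
    if coherence < coherence_threshold:
        return "REJECT", None  # structure is never computed
    s = compute_structure()    # the expensive step (inference or a cached artifact)
    return ("ACCEPT" if s >= structure_floor else "REJECT"), s
```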

Closing

Most systems can generate biomedical text.
Few systems can say:
“This is coherent, but physically invalid — here’s why.”
RExSyn Nexus v0.6.1 is my attempt to build that kind of refusal — auditable, deterministic, and grounded.
