
Stanford. Princeton. A bioRxiv Paper. So Why Did Nobody Ask Where the Data Goes?
BioClaw processes EHR data. Its primary showcase channel is WhatsApp. We audited the repository: 60/100, Tier 2 Caution. Here is what the bioRxiv paper says that the README does not.
Series: STEM_BIO_AI Audit Report, Part 3 of 3

When a repository carries this kind of pedigree, people stop asking questions.

The repository is Runchuan-BU/BioClaw (1), a biomedical AI assistant described in a bioRxiv preprint posted on April 14, 2026 by researchers from the University of Michigan, Shanghai Jiao Tong University, the Chinese Academy of Medical Sciences, Peking Union Medical College, and the University of Illinois Urbana-Champaign (2).
The foundational STELLA research behind it carries Stanford and Princeton correspondence addresses. 374 stars. 31 biomedical tools. 95+ skills. Deployable across eight messaging platforms. On the surface, every signal points to maturity.
One orientation note before we go further.
- BioClaw is the biomedical repository under review. It is built on a NanoClaw container-based agent architecture and distributed via OpenClaw as its primary install and runtime path.
- STEM-BIO-AI is our external audit scanner — a deterministic local CLI tool that evaluates repository evidence surfaces without executing the code or making network requests.
BioClaw, NanoClaw, and OpenClaw are three separate layers of one stack, and the scanner sits outside all of them. The risk profile of each layer matters.
This is exactly the kind of tool a research institution can install after checking the science, the README, and the repository, while still missing the deployment risk. The credentials do the convincing. The bioRxiv paper frames the science. The repository shows the capability. But nobody reads the architecture closely enough to ask the one question that matters: where does the data go?
We asked that question.
The answer — and the thesis of everything that follows — is this: the exposure happens at the channel, not at the compute.
- The model can be sophisticated.
- The containers can be clean.
- The bioinformatics toolkit can be comprehensive.
- None of that changes what happens to the data before it reaches any of those components.
- The breach opportunity is at ingress.
And for BioClaw, the highest-risk showcased intake path is WhatsApp.
The Line That Stopped Us
From the bioRxiv paper, describing BioClaw's supported data modalities (2):
"The application of BioClaw spans various biomedical domains (e.g., genomics, clinics, structural biology) and data modalities (e.g., sequencing data, EHR data, protein structure data)."
EHR. Electronic Health Records.
The paper explicitly names EHR data as a supported modality. BioClaw's README foregrounds WhatsApp as its primary showcase channel — the platform featured in the project description, the Quick Start guide, and the primary usage examples (1). Feishu, WeCom, Discord, Slack, WeChat, QQ, and a local web interface are also supported.
The critique here is not "BioClaw equals WhatsApp." The critique is that BioClaw's WhatsApp-first workflow is the highest-risk path for sensitive data — and it is the path the repository presents first.
EHR data. Through a consumer chat channel. In group threads.
This is not a minor documentation issue. It is a compliance collision. To understand why, you need to look at the actual data path — not the abstract, not the paper's framing, but what happens at the moment a researcher hits send.
What the Data Path Shows
The README opens with this (1):
"BioClaw brings the power of computational biology directly into WhatsApp group chats."
Researchers message the BioClaw bot in a WhatsApp group. They upload files — sequencing data, gel images, CSV exports, protein structures. The system processes them in Docker containers and delivers results back to the same chat thread. That architecture is the product. The risk appears when that same workflow is placed beside the paper's claim that EHR data is a supported input modality.
Here is what that means in practice, separated by evidence type.
Directly confirmed by repository and paper:
- EHR data, sequencing files, and protein structure data are named as supported modalities in the bioRxiv abstract (2).
- WhatsApp is the primary showcased channel in the README and Quick Start (1).
- No PII or PHI scrubbing layer is visible in the repository between the consumer channel and the containerized agent runner.
- No audit-log surface for data ingress is detected in the codebase — who uploaded which file, from which device, at what time is not captured.
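The missing audit-log surface in the last bullet can be made concrete. Below is a minimal sketch, not BioClaw code, of the kind of ingress record a deploying institution might require before a file reaches the agent runner. Every name here (`IngressRecord`, `record_ingress`, `to_log_line`, and the field names) is hypothetical, and using a SHA-256 digest to identify a file without storing its contents is our assumption.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class IngressRecord:
    """Hypothetical audit entry: who uploaded which file, from where, and when."""
    uploader_id: str
    channel: str
    device_id: str
    filename: str
    sha256: str
    received_at: str  # ISO 8601 timestamp in UTC


def record_ingress(uploader_id: str, channel: str, device_id: str,
                   filename: str, payload: bytes) -> IngressRecord:
    """Build an audit record from the raw upload without retaining the payload."""
    return IngressRecord(
        uploader_id=uploader_id,
        channel=channel,
        device_id=device_id,
        filename=filename,
        sha256=hashlib.sha256(payload).hexdigest(),
        received_at=datetime.now(timezone.utc).isoformat(),
    )


def to_log_line(record: IngressRecord) -> str:
    """Serialize the record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

A record like this does not make a consumer channel compliant. It only gives the institution a starting point for answering "who sent what, and when" after the fact.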
Regulatory inference based on HHS guidance:
- HHS guidance states that covered entities handling ePHI through cloud services or third-party platforms require a signed Business Associate Agreement with those services (3). WhatsApp cannot execute a HIPAA BAA under any configuration. For any covered-entity deployment involving PHI or EHR data, this creates a HIPAA compliance gap regardless of how the downstream model handles the data.
- Once an EHR file enters a group chat, it can replicate into unmanaged personal cloud backups, device storage, and participant-controlled environments that the institution cannot audit. The chain of custody is not merely weakened. It is lost to the deploying institution at the moment of upload.
Platform-level risk from the OpenClaw runtime:
- BioClaw is distributed and run via OpenClaw. In March 2026, Bloomberg reported that Chinese authorities issued notices to state-run enterprises, government agencies, and major banks ordering them not to install OpenClaw on office devices, citing the platform's broad data access and external communications capabilities (4). China's CNCERT/CC issued a separate official security advisory identifying prompt injection as a critical threat vector — where hidden malicious instructions can cause OpenClaw to leak system keys and sensitive data (4). Security researchers independently rated one OpenClaw vulnerability at 8.8 out of 10 on the CVSS severity scale (5).
- The specific risk BioClaw inherits from the OpenClaw runtime is this: any tool that runs inside OpenClaw's agent surface inherits its permission model, its network exposure, and its vulnerability profile. The NanoClaw container architecture provides execution isolation. It does not contain what OpenClaw exposes at the runtime boundary.
What makes this specifically dangerous in a biomedical context is the nature of the data itself. A genomic sequence cannot be changed after a breach. It identifies not just the individual who uploaded it, but their biological relatives who never consented to any analysis.
Cross-referenced with public genealogy databases, even a nominally de-identified genetic file can be re-linked to a specific person. EHR data carries diagnoses, treatment histories, and identifiers that a researcher might consider routine — but which constitute protected health information the moment they move through an unmanaged channel.
When EHR Gets Out: What the Operational Record Shows
The following cases establish the operational cost of EHR exposure. They are not analogies for BioClaw; they are the record of what happens when healthcare data moves through systems that were trusted before they were verified.
- Change Healthcare, February 2024. Ransomware hit UnitedHealth's claims-processing subsidiary. EHR access froze across pharmacies, hospitals, and clinics nationwide. Final confirmed count: 192.7 million individuals affected — the largest healthcare data breach in U.S. history (6). Estimated cost to UnitedHealth: $2.87 billion. OCR opened a HIPAA compliance investigation. Litigation is ongoing; legal experts expect any settlement to exceed the $115 million Anthem precedent given scale (7).
- Ascension Health, May 2024. One employee downloaded a malicious file. EHR systems went offline across 142 hospitals for nearly four weeks. Staff reverted to paper. Ambulances were diverted. Appointments were cancelled. 5.6 million patient records ultimately compromised — diagnoses, lab results, insurance details, government-issued identification numbers (8).
- Esse Health, April 2025. Cyberattack on a Missouri physician group. 23,671 patients' EHR data exposed. Settlement reached: $2,525,000. Claims deadline: August 2026 (9).
Three incidents. Different attack vectors. One constant: once EHR data moves through an unmanaged channel or unverified infrastructure, the institution loses control of what happens next.
BioClaw's WhatsApp channel is an unmanaged channel. EHR data is a modality the paper explicitly lists as supported.
That is the standard BioClaw has to be evaluated against.
If EHR exposure creates this level of institutional damage, then the next question is not whether BioClaw is useful. The next question is whether BioClaw exposes enough evidence surfaces for a covered entity to justify deploying it near PHI, EHR exports, or research genomic data.
STEM-BIO-AI was built to answer that kind of question.
So what did the audit find?
What STEM-BIO-AI Found
| Finding | Evidence | Score Impact |
| --- | --- | --- |
| Missing clinical/non-diagnostic boundary | Clinical-adjacent surfaces without explicit non-clinical boundary | Stage 2R: -20 |
| No domain-specific tests | Zero bio-domain test surface detected | Stage 3 T2: 0/15 |
| No changelog | No CHANGELOG or release-history file | Stage 3 T3: 0/15 |
| C5 compliance boundary WARN | Clinical-adjacent content without non-diagnostic disclaimer | WARN |
Final score: 60/100 — Tier 2 Caution. Research reference and supervised non-clinical technical review only.
The domain test score is zero — meaning there is no automated check to prevent the system from hallucinating a genomic sequence or silently returning a corrupted clinical summary back to the group chat.
No static test validates that biological outputs are structurally sound or that known PII formats are removed before processing. The changelog score is also zero. An institution that deployed BioClaw last month has no documented record of what changed since.
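To make the two missing checks concrete, here is a minimal sketch of what a bio-domain test surface could look like. It is not taken from the BioClaw repository. The function names, the IUPAC nucleotide alphabet check, and the two PII patterns (US Social Security number and email address) are illustrative assumptions, and real PHI detection needs far broader coverage than this.

```python
import re

# IUPAC nucleotide codes (assumption: outputs are plain DNA/RNA strings).
_NUCLEOTIDES = re.compile(r"[ACGTUNRYSWKMBDHV]+", re.IGNORECASE)

# Two illustrative PII formats only; this is not a complete PHI detector.
_PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def is_valid_nucleotide_sequence(seq: str) -> bool:
    """True if the string is non-empty and uses only IUPAC nucleotide codes."""
    return bool(_NUCLEOTIDES.fullmatch(seq))


def find_pii(text: str) -> list[str]:
    """Return the names of the PII patterns that match anywhere in the text."""
    return [name for name, pattern in _PII_PATTERNS.items() if pattern.search(text)]


def test_sequence_is_structurally_sound():
    assert is_valid_nucleotide_sequence("ATGCGTNNA")
    assert not is_valid_nucleotide_sequence("ATGXQ!")
    assert not is_valid_nucleotide_sequence("")


def test_pii_is_flagged_before_processing():
    assert find_pii("patient 123-45-6789, jane@example.org") == ["ssn", "email"]
    assert find_pii("ATGCGT") == []
```

Tests of this shape are cheap to run in CI, and they are the kind of evidence a domain test surface check would be looking for.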
The Stage 2R penalty of -20 reflects a structural fact: the paper names EHR data as a use case, and no part of the repository or paper adds a "not for clinical deployment" boundary. That gap is not an accident. It is what makes the tool deployable in exactly the environments where it should not be deployed.
What the Score Actually Means
A Tier 2 Caution score does not mean BioClaw is empty, broken, or unserious. That would be the wrong reading.
The audit found real engineering surface. Workflow files were present. Dependency evidence existed. Several code-integrity checks passed. The repository is not a toy project, and the research problem it addresses is real.
But those signals do not answer the deployment question.
The weak points appear at the exact boundary where patient-adjacent use would require stronger evidence: no explicit non-clinical boundary, no domain-specific test surface, no changelog, no traceability manifest or runtime audit-log schema. From the raw report: Stage 1 scored 70, Stage 2R scored 50, Stage 3 scored 54. The Stage 4 Replication score is 35 — reported separately and not factored into the final score, but indicative of a reproducibility posture that makes independent verification difficult.
That is the difference between a capable research repository and a deployment-ready biomedical system. The score reflects the gap between those two things. The signals that make BioClaw easy to adopt are precisely the signals that should trigger an independent audit — not replace one.
Three Cases. Same Assumption.
The prior section described what false assurance looks like with BioClaw specifically. This section shows how that same assumption plays out across other institutions — not the cost of EHR exposure, but the mechanism by which organizations arrive at the breach without ever running an independent check.
Tempus AI — Genetic Data, Commercial Pipeline, Alleged No Consent (2026)
Tempus AI acquired Ambry Genetics for $600 million in February 2025, transferring Ambry's genetic testing database in the process. According to the consolidated class action Farrier et al v. Tempus AI, Inc. — consolidated April 15, 2026 in the U.S. District Court, Northern District of Illinois — Tempus is alleged to have trained AI models on that data and disclosed it to more than 70 third-party clients in deals totaling $1.1 billion, without the knowledge or consent of the data subjects (10)(11). These are allegations; no judgment has been entered.
The operative assumption: the acquisition process covered the data governance question. The consent scope of the original collection was never re-examined against the new use case.
Enzo Biochem — Infrastructure Trusted Because It Was Familiar (2023)
Ransomware hit Enzo Biochem's clinical test systems in April 2023. 2.47 million individuals affected. Investigators found shared credentials unchanged for years, no multi-factor authentication, and ineffective monitoring controls (12). Enzo paid $7.5 million to settle the class action and $4.5 million to three state governments.
The operative assumption: the infrastructure was safe because it had always been used that way. The researcher who routes genomic files through WhatsApp because that is how the group already communicates is making the same assumption today.
Wisconsin Physicians Service — Tool Trusted Because It Was Widely Used (2023)
A breach at Wisconsin Physicians Service in May 2023 went undetected for over a year. 3.1 million individuals affected — names, Social Security numbers, Medicare identifiers, treatment dates (13). The third-party file transfer tool involved was widely trusted across the industry. No independent security review had been run.
The operative assumption: widespread adoption equals verified safety. No one ran an independent check.
Three organizations. Three different failure modes. One shared assumption: someone else already verified this.
No one had.
Stars Are Not Audits
BioClaw's version of false assurance is more sophisticated than most. It wears the clothing of academic legitimacy.
374 stars. A bioRxiv paper. Authors from Michigan, SJTU, CAMS, PUMC, UIUC, Westlake, HKUST. STELLA foundation with Stanford and Princeton correspondence authors.
None of that is a security audit. None of it constitutes a HIPAA risk analysis. bioRxiv is a preprint server — the paper has not been peer reviewed for security or compliance claims. The 374 stars reflect interest, not inspection. Academic pedigree is the most effective form of false assurance in bio AI precisely because it signals rigor by proxy. It does not signal that anyone examined what happens when a researcher uploads an EHR export to a WhatsApp group to ask the bot a question.
The legal exposure does not fall on the repository maintainers. Open-source carries no HIPAA obligation. The liability belongs to the covered entity or business associate that deploys the tool against PHI or research genomic data and relies on star counts and institutional affiliations as the basis for that decision.
Under the HHS inflation-adjusted penalty schedule published in the Federal Register on January 28, 2026 — 2025 OMB inflation multiplier 1.02598 — exposure for covered-entity deployment involving PHI runs up to $73,011 per violation for willful neglect corrected within 30 days, and up to $2,190,294 per violation for willful neglect not corrected (14). The OpenClaw security concerns documented at the government level make "we did not know" a difficult posture to sustain.
The question is not whether this risk exists. It is whether your institution documented a risk analysis before deployment.
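As a worked check on the penalty figures above: each adjusted cap is the prior-year cap multiplied by 1.02598 and rounded to the nearest dollar. The prior-year amounts used below ($71,162 and $2,134,831) do not appear in this article. They are our assumption about the preceding schedule and should be verified against the Federal Register notice before anyone relies on them. The `adjust` helper is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

MULTIPLIER = Decimal("1.02598")  # 2025 OMB multiplier cited in the text


def adjust(prior_cap: int) -> int:
    """Apply the inflation multiplier and round to the nearest dollar."""
    adjusted = Decimal(prior_cap) * MULTIPLIER
    return int(adjusted.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Assumed prior-year per-violation caps (not stated in the article).
corrected = adjust(71_162)         # willful neglect, corrected within 30 days
not_corrected = adjust(2_134_831)  # willful neglect, not corrected

print(corrected, not_corrected)
```

Under those assumed inputs the sketch reproduces the two caps quoted above, $73,011 and $2,190,294.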
Three Checks Before You Deploy
1️⃣ The README is written for adoption. The paper is written for peers. Only one of them is designed to surface risk.
BioClaw's README describes BLAST searches and protein structures. It does not mention EHR data. The bioRxiv paper names EHR data in the scope description, on page one. Any organization that deploys BioClaw based on the README alone is making a risk decision on an incomplete document. The clinical data modality it misses is the one that determines its HIPAA exposure for covered-entity deployments.
Read the paper. Because the line that changes your legal posture is not in the README.
2️⃣ The data reaches the channel before it reaches your model. Everything downstream is irrelevant to the breach.
Containerized execution, isolated Docker environments, a state-of-the-art bioinformatics toolkit — BioClaw has all of these. They are downstream of the data ingress point. When a researcher uploads a genomic sequence to a WhatsApp group, that file enters Meta's infrastructure before any of BioClaw's engineering controls take effect. It sits there for up to 30 days if undelivered (3). It enters the device backups of everyone in the group. None of BioClaw's compute architecture can undo what happened in the channel.
The model is not the exposure. The channel is. Audit the channel first.
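One way to act on "audit the channel first" is to refuse sensitive modalities at ingress, before any container starts. The sketch below is a hypothetical policy gate, not a BioClaw feature. The channel names, the modality labels, and the split into managed and unmanaged channels are assumptions that an institution would replace with the results of its own risk analysis.

```python
from enum import Enum


class Modality(Enum):
    SEQUENCING = "sequencing"
    EHR = "ehr"
    PROTEIN_STRUCTURE = "protein_structure"


# Assumption: modalities the institution treats as PHI or re-identifiable.
SENSITIVE = {Modality.EHR, Modality.SEQUENCING}

# Assumption: channels covered by an institutional agreement and audit logging.
MANAGED_CHANNELS = {"local_web"}


def is_upload_allowed(channel: str, modality: Modality) -> bool:
    """Allow sensitive modalities only on managed channels; allow the rest anywhere."""
    if modality in SENSITIVE:
        return channel in MANAGED_CHANNELS
    return True


def gate(channel: str, modality: Modality) -> str:
    """Return a decision string so that the refusal itself can be logged."""
    if is_upload_allowed(channel, modality):
        return f"ACCEPT {modality.value} via {channel}"
    return f"REJECT {modality.value} via {channel}: unmanaged channel"
```

The design choice is that the gate runs on metadata alone. It has to decide before the file is accepted, because everything after that point is downstream of the exposure.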
3️⃣ BioClaw is not one tool. It is a stack. If you have not audited every layer, you have not audited the tool.
BioClaw is the biomedical application. NanoClaw is the container agent architecture. OpenClaw is the install and runtime path — and the layer with documented government-level security restrictions and a CVSS 8.8 vulnerability (4)(5). Three components. Three separate risk profiles. One product presented without distinguishing between them.
When a breach happens, the liability does not care which layer failed. It cares who deployed the stack without checking.
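The stack view can be written down as a checklist. This is a minimal sketch with hypothetical names (`Layer`, `unaudited_layers`, `stack_is_audited`). The three layers and their roles come from the text above, while the `audited` flags are placeholder values standing in for an institution's own review records.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    role: str
    audited: bool = False  # placeholder: set from the institution's own review


def unaudited_layers(stack: list[Layer]) -> list[str]:
    """Names of layers with no completed audit."""
    return [layer.name for layer in stack if not layer.audited]


def stack_is_audited(stack: list[Layer]) -> bool:
    """The stack counts as audited only when every layer has been audited."""
    return not unaudited_layers(stack)


# Example values only: two layers reviewed, one not.
stack = [
    Layer("BioClaw", "biomedical application", audited=True),
    Layer("NanoClaw", "container agent architecture", audited=True),
    Layer("OpenClaw", "install and runtime path"),
]
```

With these example values, `stack_is_audited(stack)` is false and `unaudited_layers(stack)` names OpenClaw, which is the point: one unreviewed layer leaves the whole deployment unreviewed.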
Final Thought
The researchers building BioClaw are solving a real problem. Bioinformatics workflows are fragmented. The friction of switching between command-line tools, visualization software, databases, and literature engines is real. Making that accessible through natural language in a shared group chat is a genuine architectural idea.
That is not the problem.
The problem is that a system capable of processing EHR data — data that is permanent, identifying, and biologically inherited — was deployed with a consumer messaging platform as its primary showcased intake channel, with no audit trail, no PHI scrubbing layer, and no clinical use boundary. None of those gaps appear in the abstract, the README, the star count, or the institutional affiliations.
Behind every query is a potential patient. The sequencing file a researcher uploads to ask a question is not abstract data. It is someone's genomic identity. Once it enters a consumer infrastructure, the institution has no mechanism to retrieve it, audit it, or contain the damage if it moves further.
The tool was built to accelerate research. Acceleration without containment is not a feature. In bio AI, it is the failure mode.
Stars are not audits. They never were.
Every claim in this newsletter is backed by machine-generated audit data. The raw output for this evaluation — scanner findings, explain trace, experiment results, and full report artifacts — is publicly available for download and independent verification.
We publish the data because "stars are not audits" applies to us too. Do not take our score on faith. Download the artifacts. Run your own review. If you find an error in our methodology, we want to know.

Next Week
Awesome-Bioinformatics — danielecook/Awesome-Bioinformatics. 73,000+ stars. The most-starred bioinformatics resource list on GitHub. AI agents are already citing it as a trusted source — recommending tools from the list for clinical genomic workflows without a compliance layer in sight. The list was built to surface good tools. It was not built to tell you which of those tools are safe for patient-adjacent deployment.
Key stat: 73,000+ stars. Zero clinical deployment disclaimers.
One audit question: When an AI agent recommends a tool from Awesome-Bioinformatics for genomic analysis, who is responsible for verifying that the recommendation is safe for clinical use? The list doesn't say. The agent doesn't ask. The institution assumes someone already checked.
References
(1) Runchuan-BU/BioClaw. GitHub. Accessed May 2026.
(2) BioClaw: Human-Bot Research Collaboration Ecosystems in Group Chats. bioRxiv. Posted April 14, 2026. doi: 10.64898/2026.04.11.716807.
(3) Guidance on HIPAA & Cloud Computing. HHS.gov. BAA requirement for ePHI transmitted via third-party cloud services; WhatsApp cannot execute a HIPAA BAA.
(4) China Moves to Limit Use of OpenClaw AI at Banks, Government Agencies. Bloomberg, March 11, 2026. CNCERT/CC official security advisory, March 12, 2026: prompt injection identified as critical threat vector enabling system key and sensitive data leakage.
(5) China Restricts OpenClaw AI at Banks Over Security Flaws. Winbuzzer, March 12, 2026. CVE-2026-25253, CVSS 8.8: authentication token theft / full gateway compromise. Kaspersky: OpenClaw named top insider threat for 2026.
(6) Change Healthcare Data Breach: 192.7 Million Affected. The HIPAA Guide, August 7, 2025.
(7) Change Healthcare Data Breach: What Happened and What to Do. Security.org, March 2026. UnitedHealth estimated costs $2.87 billion; OCR HIPAA investigation opened; settlement expected to exceed $115 million Anthem precedent.
(8) What We Can Learn from 2024's Top Healthcare Cyberattacks. Paubox. Ascension Health: EHR down nearly four weeks across 142 hospitals, 5.6 million patients.
(9) Esse Health Agrees to Pay $2.53M to Settle Data Breach Lawsuit. HIPAA Journal, May 2026. Claims deadline August 4, 2026.
(10) Healthcare AI Firm Sued Over Alleged Unlawful Disclosures of Genetic Data. HIPAA Journal, April 2026. Allegations only; no judgment entered.
(11) Tempus AI Sued For Breach of Genetic Information Privacy Act. GenomeWeb, February 2026. Farrier et al v. Tempus AI, Inc., U.S. District Court, Northern District of Illinois, consolidated April 15, 2026. Complaint stage.
(12) Enzo Biochem to pay $4.5 mln for failing to safeguard patient data. Reuters, August 2024.
(13) CMS Notifies Individuals Potentially Impacted by Data Breach. CMS.gov, 2024.
(14) HHS Inflation-Adjusted Civil Monetary Penalty Schedule. Federal Register, January 28, 2026. 2025 OMB multiplier: 1.02598.