Flamehaven.space
verification · PRODUCTION READY · PUBLIC · MIT License

ProofCore-AI-Benchmark

About This Work

ProofCore is a browser-native, 100% offline-first, hybrid mathematical proof verification engine. It combines rigorous symbolic math with semantic understanding to verify mathematical proofs reliably, with zero external dependencies and production-ready quality.

benchmark · browser-native · d3js · devops · fastapi · mathematical-proof · offline-first · privacy · proof-verification · pyodide · react · research · semantic-understanding · software-quality · symbolic-math · typescript · zustand

README Core

The first proof verification system that works 100% offline with zero external dependencies

As shown in Frieder & Hart (2025): "No LLM Solved Yu Tsumura's 554th Problem"

Despite high benchmark scores, the major LLMs listed under Key Signals below all fail on rigorous mathematical reasoning.

Use & Documentation

Detailed installation, commands, examples, and deeper usage notes live in the repository README and docs.

README Map

  • Browser-native · 100% Offline-First · Production Ready
  • The Problem
  • The Solution
  • Key Advantages
  • Quick Start
  • Installation

Key Signals

  • ❌ GPT-4o: Correct syntax, wrong logic
  • ❌ Claude 3.5: High confidence, low accuracy
  • ❌ Gemini 2.0: Plausible but incorrect reasoning
  • ❌ LLaMA 3.1: Hallucinated "proofs"
  • ProofCore ships with VITE_OFFLINE_MODE=true, so verification works entirely in-browser without starting the FastAPI backend.
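
The offline flag in the last item can be pictured with a small sketch. It assumes the flag is a Vite environment variable named VITE_OFFLINE_MODE whose value arrives as a string, and that an unset flag falls back to the backend. The names Env, VerifierTarget, isOfflineMode, and chooseVerifier are hypothetical and are not taken from the ProofCore source.

```typescript
// Hypothetical sketch: none of these names come from the ProofCore codebase.
type Env = Record<string, string | undefined>;

type VerifierTarget = "in-browser" | "backend";

// Vite exposes env values as strings, so compare against the literal "true".
function isOfflineMode(env: Env): boolean {
  return env["VITE_OFFLINE_MODE"] === "true";
}

// Decide where a verification request should run.
// Assumption: anything other than "true" routes to the FastAPI backend.
function chooseVerifier(env: Env): VerifierTarget {
  return isOfflineMode(env) ? "in-browser" : "backend";
}

console.log(chooseVerifier({ VITE_OFFLINE_MODE: "true" })); // prints "in-browser"
console.log(chooseVerifier({}));                            // prints "backend"
```

In a Vite application the env argument would typically be import.meta.env. Passing it in as a parameter keeps the sketch runnable outside a Vite build.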

Announcements

synced Mar 13, 2026

No mirrored announcements yet.