governance | PROTOTYPE | PUBLIC | MIT License

HRPO-X

Hybrid Reasoning Policy Optimization (HRPO): a research prototype for hybrid latent reasoning with RL.

About This Work

This is not production software and does not claim full compliance with the HRPO paper.

Topics: ai-reasoning, cognitive-architecture, deductive-reasoning, deep-learning, hybrid-reasoning, inductive-reasoning, machine-learning, ml-framework, neural-networks, policy-optimization, python, pytorch, reinforcement-learning, research

Use & Documentation

Detailed installation, commands, examples, and deeper usage notes live in the repository README and docs.

README Map

Scope
Quick Start
Structure
Limitations
License

Key Signals

prototype utilities in hrpox/core_v2_2.py
clean-room paper primitives (Eq3/Eq4/Eq6) in hrpox/paper_core.py
demo-scale pipelines in hrpox/paper_pipeline.py and hrpox/paper_trainer.py
importance sampling loss with adaptive epsilon
adaptive r_min controller
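The list names an importance-sampling loss with an adaptive epsilon but does not spell out the update rule. The block below is a minimal, framework-free sketch under stated assumptions, not the repository's implementation: it assumes a PPO-style clipped surrogate, estimates KL(pi_old || pi_new) with the per-sample estimator (r - 1) - log r, and adapts epsilon with a KL-target rule modeled on PPO's adaptive KL-penalty schedule (applied here to the clip range instead of a penalty coefficient). The names `clipped_is_loss`, `approx_kl`, `AdaptiveEpsilon`, and all thresholds are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def clipped_is_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                    advantages: Sequence[float], eps: float) -> float:
    """Clipped importance-sampling surrogate, negated so it can be minimized."""
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                # importance weight pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # keep the ratio inside [1 - eps, 1 + eps]
        total += min(ratio * adv, clipped * adv)         # pessimistic (lower) of the two terms
    return -total / len(advantages)


def approx_kl(logp_new: Sequence[float], logp_old: Sequence[float]) -> float:
    """Estimate KL(pi_old || pi_new) from log-probs of actions sampled under pi_old."""
    if not logp_old or len(logp_new) != len(logp_old):
        raise ValueError("inputs must be non-empty and equally long")
    # Per-sample estimator (r - 1) - log r with r = pi_new / pi_old; each term is >= 0.
    total = 0.0
    for lp_new, lp_old in zip(logp_new, logp_old):
        log_r = lp_new - lp_old
        total += math.exp(log_r) - 1.0 - log_r
    return total / len(logp_old)


@dataclass
class AdaptiveEpsilon:
    """Hypothetical controller: shrink or grow eps to keep the measured KL near a target."""
    eps: float = 0.2
    target_kl: float = 0.01
    factor: float = 1.5
    eps_min: float = 0.05
    eps_max: float = 0.3

    def update(self, kl: float) -> float:
        if kl > self.target_kl * self.factor:
            self.eps /= self.factor   # policy moved too far: tighten the clip range
        elif kl < self.target_kl / self.factor:
            self.eps *= self.factor   # policy barely moved: loosen the clip range
        self.eps = min(max(self.eps, self.eps_min), self.eps_max)
        return self.eps


# Usage with toy numbers (illustrative only, not results from the repository).
ctrl = AdaptiveEpsilon()
logp_old = [-1.0, -0.5, -2.0]
logp_new = [-0.9, -0.6, -1.5]
advantages = [1.0, -0.5, 2.0]

loss = clipped_is_loss(logp_new, logp_old, advantages, ctrl.eps)
kl = approx_kl(logp_new, logp_old)
new_eps = ctrl.update(kl)   # KL ~0.053 exceeds 1.5 * 0.01, so eps shrinks to 0.2 / 1.5
print(f"loss={loss:.4f} kl={kl:.4f} eps={new_eps:.4f}")
# prints: loss=-1.0176 kl=0.0529 eps=0.1333
```

Tightening epsilon when the measured KL overshoots the target keeps policy updates conservative, while the clamp to [eps_min, eps_max] stops the controller from collapsing or exploding the trust region. A PyTorch version would replace the Python loops with tensor operations such as `torch.exp`, `torch.clamp`, and `torch.minimum`.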

Announcements

synced Mar 13, 2026
