governance, PROTOTYPE, PUBLIC, MIT License
HRPO-X
Hybrid Reasoning Policy Optimization (HRPO): a research prototype for hybrid latent reasoning with RL.
About This Work
This is not production software and does not claim full paper compliance.
Topics: ai-reasoning, cognitive-architecture, deductive-reasoning, deep-learning, hybrid-reasoning, inductive-reasoning, machine-learning, ml-framework, neural-networks, policy-optimization, python, pytorch, reinforcement-learning, research
Use & Documentation
Detailed installation, commands, examples, and deeper usage notes live in the repository README and docs.
README Map
- Scope
- Quick Start
- Structure
- Limitations
- License
Key Signals
- prototype utilities in hrpox/core_v2_2.py
- clean-room paper primitives (Eq3/Eq4/Eq6) in hrpox/paper_core.py
- demo-scale pipelines in hrpox/paper_pipeline.py and hrpox/paper_trainer.py
- importance sampling loss with adaptive epsilon
- adaptive r_min controller
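To make the "importance sampling loss with adaptive epsilon" item more concrete, here is a minimal sketch of a clipped importance-sampling surrogate in the PPO style, paired with a simple rule that adjusts the clip range. This is not the repository's code: the function names (`clipped_is_loss`, `adapt_epsilon`), the target clip fraction, and the multiplicative update rule are all assumptions made for illustration, and the sketch uses plain Python rather than PyTorch so that it runs without dependencies.

```python
import math


def clipped_is_loss(logp_new, logp_old, advantages, epsilon):
    """Return (loss, clip_fraction) for a PPO-style clipped surrogate.

    Hypothetical helper: inputs are per-sample log-probabilities under the
    new and old policies, plus per-sample advantages.
    """
    total = 0.0
    clipped = 0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        ratio_clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic surrogate: take the smaller of the two objectives.
        total += min(ratio * adv, ratio_clipped * adv)
        if ratio != ratio_clipped:
            clipped += 1
    n = len(advantages)
    # Negate because the surrogate is maximized but a loss is minimized.
    return -total / n, clipped / n


def adapt_epsilon(epsilon, clip_fraction, target=0.2, rate=1.1,
                  eps_min=0.05, eps_max=0.3):
    """Assumed rule: steer the clip fraction toward `target`.

    Widen epsilon when too many ratios are clipped, narrow it otherwise,
    and keep the result inside [eps_min, eps_max].
    """
    if clip_fraction > target:
        epsilon *= rate
    else:
        epsilon /= rate
    return min(max(epsilon, eps_min), eps_max)


# Usage: three samples with ratios 1.0, 1.5 and 0.5 against epsilon = 0.2.
logp_old = [0.0, 0.0, 0.0]
logp_new = [0.0, math.log(1.5), math.log(0.5)]
advantages = [1.0, 1.0, -1.0]

loss, clip_fraction = clipped_is_loss(logp_new, logp_old, advantages, 0.2)
next_epsilon = adapt_epsilon(0.2, clip_fraction)
```

The controller here treats the clip fraction as a feedback signal. Other choices, such as adapting epsilon from a KL estimate or a training-step schedule, would be equally plausible, and the prototype may use a different scheme.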