Flamehaven.space
governance | PROTOTYPE | PUBLIC | MIT License

HRPO-X

Hybrid Reasoning Policy Optimization (HRPO): a research prototype for hybrid latent reasoning with RL.

About This Work

This is not production software and does not claim full paper compliance.

ai-reasoning, cognitive-architecture, deductive-reasoning, deep-learning, hybrid-reasoning, inductive-reasoning, machine-learning, ml-framework, neural-networks, policy-optimization, python, pytorch, reinforcement-learning, research

Use & Documentation

Detailed installation, commands, examples, and deeper usage notes live in the repository README and docs.

README Map

  • Scope
  • Quick Start
  • Structure
  • Limitations
  • License

Key Signals

  • prototype utilities in hrpox/core_v2_2.py
  • clean-room paper primitives (Eq3/Eq4/Eq6) in hrpox/paper_core.py
  • demo-scale pipelines in hrpox/paper_pipeline.py and hrpox/paper_trainer.py
  • importance sampling loss with adaptive epsilon
  • adaptive r_min controller
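
To make the "importance sampling loss with adaptive epsilon" signal concrete, here is a minimal sketch in plain Python. It is not taken from the repository: the function names `clipped_is_loss` and `adapt_epsilon`, the drift-based adaptation rule, and every default value are assumptions chosen for illustration. The loss is a standard PPO-style clipped surrogate, and the actual HRPO-X code may adapt epsilon differently.

```python
import math


def clipped_is_loss(logp_new, logp_old, advantages, epsilon):
    """PPO-style clipped importance-sampling surrogate (hypothetical helper).

    Each sample's importance ratio exp(logp_new - logp_old) is clipped to
    [1 - epsilon, 1 + epsilon]; the loss is the negated mean of the
    pessimistic (minimum) surrogate term.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def adapt_epsilon(epsilon, logp_new, logp_old,
                  target=0.05, rate=1.5, eps_min=0.05, eps_max=0.3):
    """Assumed adaptation rule, not the repository's.

    Shrinks epsilon when the mean absolute log-ratio (policy drift) exceeds
    `target`, widens it otherwise, and clamps to [eps_min, eps_max].
    """
    drift = sum(abs(a - b) for a, b in zip(logp_new, logp_old)) / len(logp_new)
    epsilon = epsilon / rate if drift > target else epsilon * rate
    return min(max(epsilon, eps_min), eps_max)


if __name__ == "__main__":
    old = [-1.2, -0.7, -2.0]
    new = [-1.0, -0.9, -1.5]
    adv = [0.5, -0.3, 1.0]
    eps = 0.2
    print("loss:", clipped_is_loss(new, old, adv, eps))
    print("next epsilon:", adapt_epsilon(eps, new, old))
```

The sketch uses only the standard library so it runs without PyTorch; in a tensor-based setting the same arithmetic would be expressed with element-wise operations and a mean reduction.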

Announcements

synced Mar 13, 2026

No mirrored announcements yet.