CRoM-EfficientLLM
A Python toolkit to optimize LLM context by intelligently selecting, re-ranking, and managing text chunks to fit a model's budget while maximizing relevance.
README Core
CRoM (Context Rot Mitigation)-EfficientLLM is a Python toolkit designed to optimize the context provided to Large Language Models (LLMs). It provides a suite of tools to intelligently select, re-rank, and manage text chunks to fit within a model's context budget while maximizing relevance and minimizing performance drift.
This project is ideal for developers building RAG (Retrieval-Augmented Generation) pipelines who need to make the most of limited context windows.
Install the package directly from source using pip. For development, it's recommended to install in editable mode with the extras.
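The install steps above would look roughly like this from a source checkout; the `dev` extras name is an assumption here, so check the repository's `pyproject.toml` for the extras actually defined:

```shell
# Standard install from a source checkout
pip install .

# Editable (development) install; the "dev" extras name is assumed
pip install -e ".[dev]"
```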
Use & Documentation
Detailed installation, commands, examples, and deeper usage notes live in the repository README and docs.
README Map
- Key Features
- Installation
- Quickstart
- 🚀 Interactive Demo
- Local Demo
- CLI Benchmarking Examples
Key Signals
- Budget Packer: Greedily packs the highest-scoring text chunks into a defined token budget using a stable sorting algorithm.
- Hybrid Reranker: Combines sparse (TF-IDF) and dense (Sentence-Transformers) retrieval scores for robust and high-quality reranking of documents.
- Drift Estimator: Monitors the semantic drift between sequential model responses using L2 or cosine distance with EWMA smoothing.
- Observability: Exposes Prometheus metrics for monitoring token savings and drift alerts in production.
- Comprehensive Benchmarking: Includes a CLI for end-to-end pipeline evaluation, budget sweeps, and quality-vs-optimal analysis.
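The Budget Packer bullet above (greedy packing of the highest-scoring chunks into a token budget, with stable ordering for ties) can be sketched as follows. `Chunk` and `pack_chunks` are illustrative names for this sketch, not the toolkit's actual API:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    score: float
    tokens: int

def pack_chunks(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily fill the token budget with the highest-scoring chunks.

    Python's sort is stable, so chunks with equal scores keep their
    original (e.g. document) order, which keeps packing deterministic.
    """
    ranked = sorted(chunks, key=lambda c: -c.score)
    packed, used = [], 0
    for c in ranked:
        if used + c.tokens <= budget:
            packed.append(c)
            used += c.tokens
    return packed
```

Note that this is a simple greedy pass, not an optimal knapsack solution; lower-scoring chunks can still slip in if a higher-scoring one was too large to fit.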
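The Hybrid Reranker bullet reduces, at its core, to fusing two score lists (sparse TF-IDF similarities and dense embedding similarities) into one ranking signal. A minimal sketch of such score fusion, where min-max normalization and the `alpha` weighting are assumptions of this sketch rather than the toolkit's documented behavior:

```python
def hybrid_score(sparse: list[float], dense: list[float], alpha: float = 0.5) -> list[float]:
    """Fuse sparse and dense relevance scores for the same candidate list.

    Each list is min-max normalized to [0, 1] so the two score scales
    are comparable, then combined as a weighted sum controlled by alpha
    (alpha = 1.0 trusts only the sparse scores, 0.0 only the dense ones).
    """
    def norm(xs: list[float]) -> list[float]:
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

    s, d = norm(sparse), norm(dense)
    return [alpha * a + (1 - alpha) * b for a, b in zip(s, d)]
```

In a real pipeline the sparse scores would come from a TF-IDF vectorizer and the dense scores from Sentence-Transformers embeddings; the fusion step itself stays this simple.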
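The Drift Estimator bullet (semantic drift between sequential responses, smoothed with an EWMA) can be sketched as below, using cosine distance between consecutive response embeddings. The class name, `alpha` default, and plain-list vectors are assumptions for illustration:

```python
import math

class DriftEstimator:
    """Track EWMA-smoothed cosine distance between consecutive embeddings."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha      # EWMA weight for the newest observation
        self.prev = None        # embedding of the previous response
        self.ewma = 0.0         # smoothed drift value

    def update(self, vec: list[float]) -> float:
        """Feed the next response embedding; return the smoothed drift."""
        if self.prev is not None:
            dot = sum(a * b for a, b in zip(self.prev, vec))
            na = math.sqrt(sum(a * a for a in self.prev))
            nb = math.sqrt(sum(b * b for b in vec))
            dist = 1.0 - dot / (na * nb)  # cosine distance in [0, 2]
            self.ewma = self.alpha * dist + (1 - self.alpha) * self.ewma
        self.prev = vec
        return self.ewma
```

Swapping cosine distance for L2 distance (as the bullet also mentions) only changes the `dist` line; the smoothed value is what would feed a drift alert threshold.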
Announcements
synced Mar 13, 2026