Structure Was the Real Bug — How I Ended Up Building dir2md

A firsthand account of how debugging chaos, failed AI assistance, and the absence of structure led to the creation of dir2md — an open-source CLI that filters, secures, and restructures codebases into token-efficient Markdown maps for developers and AI workflows.

🪶 Silence, Then Confusion

I was building a SaaS product that had grown far beyond what I could easily hold in my head.
Nested folders. Scattered configs. Legacy files I was too afraid to delete.
Bugs kept surfacing. Tests broke. My energy was running out.
So I reached for an AI. This time, Claude CLI.
I zipped the repo and asked:
“Can you tell me what’s wrong?”
It thought for a while. Then replied:
“Bug fixed.”
Ten minutes of CPU cycles, a few lines of “patched” code. I ran the app — only to find the behavior changed in ways I never intended.
I rolled back and tried again. This time the bugged module disappeared entirely. Deleted. Gone.
Later, I found an open issue: Claude CLI was literally running ripgrep across the whole repo for even simple prompts, choking on large codebases.
The AI didn’t corrupt my logic intentionally; it just never constructed a reliable context to begin with.

🧩 The Human Sanity Check

Frustrated, I handed the repo to a junior developer. After a minute of scanning, he looked up:
“I don’t even know where to start.”
Next, I turned to a senior engineer I trusted. Two days later, he came back with a single line:
“You need a structural refactor before you touch the logic.”
And then, more pointedly:
“It’s not that your code is wrong. It’s that no one — not me, not an AI — can navigate this.”
That was the turning point. Documentation wasn’t the problem. Structure was.

🧭 From Docs to Maps

I had been patching docs, READMEs, comments. But none of it mattered if the repo itself was a forest with no map.
So I asked myself:
“What if a directory tree could become Markdown — clean, filtered, token-aware?”
I searched GitHub. I tried three different tools. Each failed in different ways:
  • One dumped every file indiscriminately — unreadable for AI.
  • Another ignored .gitignore and leaked credentials into the doc.
  • A third produced context so bloated that even GPT-4 refused to parse it.
Reddit threads mirrored the same frustration:
“Without good tooling around them, LLMs are utterly abysmal for pure code generation.”
and another:
“Most of this prompt engineering is just sugar-coating — it doesn’t solve the lack of structure.”
That’s when I stopped searching. If no one had built it, I would.
And that experiment became dir2md.

⚙️ What dir2md Does

Run dir2md on a repository and you get a Markdown blueprint of it.
It looks simple — but every choice in that output came from pain.
Each filter, each omission, each design decision was the product of failed tools, Reddit threads, and nights spent undoing broken fixes.
I didn’t sketch these rules on paper.
I discovered them in the fire, one mistake at a time.
And that’s how the lessons emerged.

🧹 Lesson 1: Most of Your Repo Is Noise

The first truth I hit: roughly 90% of a repo is clutter.
LLMs choke on it. Humans do too.
So dir2md:
  • Respects .gitignore
  • Drops build artifacts, caches, temp files
  • Deduplicates with SimHash
Because noise doesn’t just waste tokens. It wastes sanity.
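
To make the deduplication idea concrete, here is a minimal SimHash sketch in Python. It is an illustration of the technique, not dir2md's actual code: the word-level tokenizer, the 64-bit fingerprint, the `max_distance` threshold, and the function names are all my assumptions.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumed for this sketch)


def simhash(text: str) -> int:
    """Return a 64-bit SimHash fingerprint built from word tokens."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def dedupe(files: dict[str, str], max_distance: int = 3) -> list[str]:
    """Keep the first file (in path order) of each near-duplicate group."""
    kept: list[tuple[str, int]] = []
    for path in sorted(files):
        fp = simhash(files[path])
        if all(hamming(fp, other) > max_distance for _, other in kept):
            kept.append((path, fp))
    return [path for path, _ in kept]
```

Similar texts share most tokens, so their fingerprints differ in only a few bits. That makes a small Hamming distance a cheap proxy for "this file is basically a copy of that one."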

🧠 Lesson 2: Tokens Are Oxygen

Hit a context limit once, and you learn: tokens are oxygen.
Drown the model in irrelevant files, and it suffocates.
dir2md budgets tokens:
  • Head/tail sampling for large files
  • Token-aware chunking
  • Multiple output modes (summary, inline, reference)
  • JSON manifest for workflows
The principle: don’t feed the model everything.
Give it enough to breathe — and think.
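
Head/tail sampling under a token budget can be sketched roughly like this. The four-characters-per-token estimate is a crude heuristic I am assuming here, not a real tokenizer, and the function names and truncation marker are hypothetical rather than taken from dir2md.

```python
def estimate_tokens(text: str) -> int:
    """Rough guess: about 4 characters per token (an assumption)."""
    return (len(text) + 3) // 4


def head_tail_sample(text: str, budget: int,
                     marker: str = "... [truncated] ...") -> str:
    """Keep lines from the top and bottom of a file until the budget is spent."""
    if estimate_tokens(text) <= budget:
        return text  # small enough, keep everything

    lines = text.splitlines()
    head: list[str] = []
    tail: list[str] = []
    used = estimate_tokens(marker)
    i, j = 0, len(lines) - 1
    take_head = True
    while i <= j:
        line = lines[i] if take_head else lines[j]
        cost = estimate_tokens(line) + 1  # +1 for the newline
        if used + cost > budget:
            break
        used += cost
        if take_head:
            head.append(line)
            i += 1
        else:
            tail.append(line)
            j -= 1
        take_head = not take_head  # alternate between top and bottom
    return "\n".join(head + [marker] + tail[::-1])
```

The top of a file usually carries imports and declarations, and the bottom often holds the entry point or exports, so alternating between the two tends to keep the most orienting lines.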

🔐 Lesson 3: Docs Shouldn’t Leak Secrets

I’ve seen repos where .env files slipped into “docs.” That’s not documentation — it’s a breach.
dir2md masks common secrets by default:
API tokens, AWS keys, private key blocks.
Docs must clarify, not compromise.
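
Here is a small sketch of what pattern-based masking can look like. The three regexes are illustrative assumptions of mine, not dir2md's real rule set, and a production masker would need far more patterns than this.

```python
import re

# Illustrative patterns only; the names and regexes are assumptions.
PATTERNS = [
    ("aws_access_key_id", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("private_key_block", re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    )),
    ("bearer_token", re.compile(
        r"\bBearer\s+[A-Za-z0-9._\-]{20,}", re.IGNORECASE,
    )),
]


def mask_secrets(text: str) -> str:
    """Replace every match of a known secret pattern with a labeled placeholder."""
    for name, pattern in PATTERNS:
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```

Labeling the placeholder with the pattern name keeps the doc useful: a reader can still see that a key was configured at that spot without seeing the key itself.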

🗺️ Lesson 4: Structure Is the Map

Noise, tokens, secrets — they were symptoms.
The real problem was directionless repos.
dir2md’s output isn’t a flat list. It’s a map: hierarchies, relationships, entry points. The difference between wandering and knowing where to begin.
Humans need maps. So do AIs.
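
As a rough picture of turning a flat file list into a map, this sketch nests paths into a tree and renders it as indented Markdown bullets. It is my own simplified illustration; dir2md's real output format may differ, and the function names are hypothetical.

```python
def build_tree(paths: list[str]) -> dict:
    """Nest 'a/b/c.py' style paths into dictionaries keyed by path segment."""
    tree: dict = {}
    for path in sorted(paths):
        node = tree
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})
    return tree


def render(tree: dict, depth: int = 0) -> list[str]:
    """Render the tree as indented bullets, directories before files."""
    lines: list[str] = []
    # A non-empty dict is a directory; an empty dict is treated as a file.
    entries = sorted(tree.items(), key=lambda kv: (not kv[1], kv[0]))
    for name, child in entries:
        suffix = "/" if child else ""
        lines.append(f"{'  ' * depth}- {name}{suffix}")
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    demo = ["src/app.py", "src/utils/io.py", "README.md"]
    print("\n".join(render(build_tree(demo))))
```

Listing directories first puts the shape of the project ahead of its individual files, which is the "where do I begin" question the junior developer could not answer.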

🌱 Lesson 5: Context Can Be Evergreen

Code changes daily. Docs rot weekly.
But context doesn’t have to.
dir2md can be integrated into CI, so every merge regenerates a fresh blueprint. The --no-timestamp flag keeps the output deterministic and reproducible: the blueprint changes only when the repo does.
Evergreen context means no more “outdated docs” problem — for humans or machines.
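
Why a timestamp matters for reproducibility is easiest to see in a toy example. This sketch is not dir2md itself; `render_blueprint` and its `include_timestamp` parameter are hypothetical stand-ins for the behavior the flag controls.

```python
import hashlib
from datetime import datetime, timezone


def render_blueprint(paths: list[str], include_timestamp: bool = True) -> str:
    """Render a tiny blueprint; the timestamp line is the only nondeterministic part."""
    lines = ["# Project Blueprint"]
    if include_timestamp:
        lines.append(f"Generated: {datetime.now(timezone.utc).isoformat()}")
    lines.extend(f"- {path}" for path in sorted(paths))
    return "\n".join(lines) + "\n"


def digest(text: str) -> str:
    """Hash the output so two runs can be compared byte for byte."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = digest(render_blueprint(["b.py", "a.py"], include_timestamp=False))
    second = digest(render_blueprint(["a.py", "b.py"], include_timestamp=False))
    print(first == second)
```

With the timestamp left out and the paths sorted, identical repo contents always produce identical output, so a CI diff only shows up when something real changed.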

🧭 What I Learned

  • Maps over blurbs. A repo without a map is just entropy.
  • AI needs structure. Input quality determines output quality.
  • Docs are collaboration. With teammates, with your future self, with AI.
Structure is the invisible interface that makes everything else possible.

🔗 Try It Yourself

Want to give it a spin?
👉 All links are in the first comment.

✨ Final Thought

I didn’t build dir2md to impress anyone.
I built it because I couldn’t parse my own repo anymore — and neither could the AI.
If you’ve ever stared at your project and felt lost, maybe this little CLI will help.
Or at least remind you: structure always comes first.
