Structure Was the Real Bug — How I Ended Up Building dir2md

🪶 Silence, Then Confusion

I was building a SaaS product that had grown far beyond what I could easily hold in my head.

Nested folders. Scattered configs. Legacy files I was too afraid to delete.

Bugs kept surfacing. Tests broke. My energy was running out.

So I reached for an AI. This time, Claude CLI.

I zipped the repo and asked:

“Can you tell me what’s wrong?”

It thought for a while. Then replied:

“Bug fixed.”

Ten minutes of CPU cycles, a few lines of “patched” code. I ran the app — only to find the behavior changed in ways I never intended.

I rolled back and tried again. This time the buggy module disappeared entirely. Deleted. Gone.

Later, I found an open issue reporting that Claude CLI was running ripgrep across the whole repo for even simple prompts and choking on large codebases.

The AI didn’t corrupt my logic intentionally; it just never constructed a reliable context to begin with.

🧩 The Human Sanity Check

Frustrated, I handed the repo to a junior developer. After a minute of scanning, he looked up:

“I don’t even know where to start.”

Next, I turned to a senior engineer I trusted. Two days later, he came back with a single line:

“You need a structural refactor before you touch the logic.”

And then, more pointedly:

“It’s not that your code is wrong. It’s that no one — not me, not an AI — can navigate this.”

That was the turning point. Documentation wasn’t the problem. Structure was.

🧭 From Docs to Maps

I had been patching docs, READMEs, comments. But none of it mattered if the repo itself was a forest with no map.

So I asked myself:

“What if a directory tree could become Markdown — clean, filtered, token-aware?”

I searched GitHub. I tried three different tools. Each failed in different ways:

One dumped every file indiscriminately — unreadable for AI.

Another ignored .gitignore and leaked credentials into the doc.

A third produced context so bloated that even GPT-4 refused to parse it.

Reddit threads mirrored the same frustration:

“Without good tooling around them, LLMs are utterly abysmal for pure code generation.”

and another:

“Most of this prompt engineering is just sugar-coating — it doesn’t solve the lack of structure.”

That’s when I stopped searching. If no one had built it, I would.

And that experiment became dir2md.

⚙️ What dir2md Does

Run this:

You’ll get a Markdown blueprint:

It looks simple — but every choice in that output came from pain.

Each filter, each omission, each design decision was the product of failed tools, Reddit threads, and nights spent undoing broken fixes.

I didn’t sketch these rules on paper.

I discovered them in the fire, one mistake at a time.

And that’s how the lessons emerged.

🧹 Lesson 1: Most of Your Repo Is Noise

The first truth I hit: 90% of a repo is clutter.

LLMs choke on it. Humans do too.

So dir2md:

Respects .gitignore

Drops build artifacts, caches, temp files

Deduplicates near-identical files with SimHash

Because noise doesn’t just waste tokens. It wastes sanity.
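To make the deduplication idea concrete, here is a minimal SimHash sketch in Python. It is not dir2md's actual code: the function names, the word-level tokenizer, the 64-bit fingerprint, and the distance threshold of 3 are all assumptions chosen for illustration.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Build a SimHash fingerprint from word-level tokens (illustrative tokenizer)."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # Hash each token to a stable 64-bit integer.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(files: dict[str, str], max_distance: int = 3) -> list[str]:
    """Return the paths to keep, skipping files too close to one already kept.

    The threshold of 3 bits is an assumed value, not a dir2md default.
    """
    kept: list[tuple[str, int]] = []
    for path in sorted(files):
        fp = simhash(files[path])
        if all(hamming_distance(fp, other) > max_distance for _, other in kept):
            kept.append((path, fp))
    return [path for path, _ in kept]
```

Similar files produce fingerprints that differ in only a few bits, so a copied module or a vendored duplicate can be dropped without comparing every pair of files byte by byte.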

🧠 Lesson 2: Tokens Are Oxygen

Hit a context limit once, and you learn: tokens are oxygen.

Drown the model in irrelevant files, and it suffocates.

dir2md budgets tokens:

Head/tail sampling for large files

Token-aware chunking

Multiple output modes (summary, inline, reference)

JSON manifest for workflows

The principle: don’t feed the model everything.

Give it enough to breathe — and think.
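Head/tail sampling is easier to see in code than in prose. The sketch below is my own illustration, not dir2md's implementation: the four-characters-per-token estimate, the function names, and the omission marker are all assumptions.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4) if text else 0


def head_tail_sample(text: str, budget_tokens: int) -> str:
    """Keep lines from the start and the end of a file until the budget is spent."""
    if estimate_tokens(text) <= budget_tokens:
        return text  # Small files pass through untouched.

    lines = text.splitlines()
    head: list[str] = []
    tail: list[str] = []
    spent = 0
    lo, hi = 0, len(lines) - 1
    take_head = True

    # Alternate between the top and the bottom so both ends are represented.
    while lo <= hi:
        line = lines[lo] if take_head else lines[hi]
        cost = estimate_tokens(line) + 1  # +1 for the newline
        if spent + cost > budget_tokens:
            break
        spent += cost
        if take_head:
            head.append(line)
            lo += 1
        else:
            tail.append(line)
            hi -= 1
        take_head = not take_head

    omitted = hi - lo + 1
    tail.reverse()
    return "\n".join(head + [f"[... {omitted} lines omitted ...]"] + tail)
```

The top of a file usually holds imports and declarations, and the bottom often holds entry points or exports, so sampling both ends tends to preserve more orientation than truncating at the first N lines.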

🔐 Lesson 3: Docs Shouldn’t Leak Secrets

I’ve seen repos where .env files slipped into “docs.” That’s not documentation — it’s a breach.

dir2md masks common secrets by default:

API tokens, AWS keys, private key blocks.

Docs must clarify, not compromise.
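Here is roughly what default masking can look like, as a hedged Python sketch. The three patterns and the [REDACTED:...] placeholders are assumptions for illustration; they are not dir2md's actual rule set, and a real scanner needs far more patterns than this.

```python
import re

# Illustrative patterns only (assumed, not dir2md's real rules).
SECRET_PATTERNS = [
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED:aws-key]"),
    # PEM private key blocks, including everything between the markers.
    (
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.DOTALL,
        ),
        "[REDACTED:private-key]",
    ),
    # NAME=value or NAME: value where the name suggests a credential.
    # Group 1 keeps the name and separator; only the value is masked.
    (
        re.compile(r"(?i)\b(\w*(?:TOKEN|SECRET|API_KEY|PASSWORD)\w*\s*[=:]\s*)\S+"),
        r"\1[REDACTED:value]",
    ),
]


def mask_secrets(text: str) -> str:
    """Replace anything matching a known secret pattern with a placeholder."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Masking the value while keeping the variable name means the generated doc still tells a reader that a setting exists, without revealing what it is set to.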

🗺️ Lesson 4: Structure Is the Map

Noise, tokens, secrets — they were symptoms.

The real problem was directionless repos.

dir2md’s output isn’t a flat list. It’s a map: hierarchies, relationships, entry points. The difference between wandering and knowing where to begin.

Humans need maps. So do AIs.
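A small Python sketch shows the difference between a flat list and a map. It only renders the hierarchy, not relationships or entry points, and it is not dir2md's code: the function name, the skip list, and the directories-first ordering are assumptions.

```python
from pathlib import Path

# Assumed skip list for illustration; a real tool would honor .gitignore instead.
DEFAULT_SKIP = frozenset({".git", "node_modules", "__pycache__"})


def render_tree(root: Path, skip: frozenset[str] = DEFAULT_SKIP) -> str:
    """Render a directory as an indented tree that can be pasted into Markdown."""
    lines = [f"{root.name}/"]

    def walk(directory: Path, prefix: str) -> None:
        # Directories first, then files, each group sorted by name.
        entries = sorted(
            (p for p in directory.iterdir() if p.name not in skip),
            key=lambda p: (p.is_file(), p.name.lower()),
        )
        for index, entry in enumerate(entries):
            last = index == len(entries) - 1
            connector = "└── " if last else "├── "
            suffix = "/" if entry.is_dir() else ""
            lines.append(f"{prefix}{connector}{entry.name}{suffix}")
            if entry.is_dir():
                # Continue the vertical guide only when siblings follow.
                walk(entry, prefix + ("    " if last else "│   "))

    walk(root, "")
    return "\n".join(lines)
```

Calling render_tree(Path(".")) on a project root returns a nested tree with noisy folders left out, which is already more navigable than a flat file listing.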

🌱 Lesson 5: Context Can Be Evergreen

Code changes daily. Docs rot weekly.

But context doesn’t have to.

dir2md can run in CI, so every merge regenerates a fresh blueprint. The --no-timestamp flag keeps the output deterministic and reproducible, so the blueprint stays in sync with the code.

Evergreen context means no more “outdated docs” problem — for humans or machines.
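Why does a timestamp matter so much? The Python sketch below shows the idea with a hypothetical generator. build_blueprint and is_stale are invented names for illustration and do not reflect dir2md's internals or its real output format.

```python
import hashlib
from datetime import datetime, timezone


def build_blueprint(tree: str, include_timestamp: bool) -> str:
    """Hypothetical generator: a header, an optional timestamp line, then the tree."""
    parts = ["# Project Blueprint"]
    if include_timestamp:
        # This line changes on every run, even when the repo has not changed.
        parts.append(f"Generated: {datetime.now(timezone.utc).isoformat()}")
    parts.append(tree)
    return "\n\n".join(parts) + "\n"


def digest(document: str) -> str:
    """Hash the document so two versions can be compared cheaply."""
    return hashlib.sha256(document.encode("utf-8")).hexdigest()


def is_stale(committed: str, regenerated: str) -> bool:
    """A CI job could fail the build when the committed blueprint no longer matches."""
    return digest(committed) != digest(regenerated)
```

Without the timestamp, two runs over an unchanged repo yield identical output, so a CI check like is_stale only fires when the structure really changed.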

🧭 What I Learned

Maps over blurbs. A repo without a map is just entropy.

AI needs structure. Input quality determines output quality.

Docs are collaboration. With teammates, with your future self, with AI.

Structure is the invisible interface that makes everything else possible.

🔗 Try It Yourself

Want to give it a spin?

👉 All links are in the first comment.

✨ Final Thought

I didn’t build dir2md to impress anyone.

I built it because I couldn’t parse my own repo anymore — and neither could the AI.

If you’ve ever stared at your project and felt lost, maybe this little CLI will help.

Or at least remind you: structure always comes first.