So I heard about spec-kit a few weeks back.
I checked it out, and read about the idea of “Spec-Driven Development”.
Now, I have to say, I don’t think Spec-Driven Development is reliable just yet for any kind of complex application.
Spec-Driven Development? Hold your horses.
Perhaps if all you want is a front-end that does CRUD and some data presentation? That methodology might fly.
See, the problem is that if you write specs and let AI develop the code, every time the code is regenerated you can end up with different code being deployed to production.
“Oh, you have Test Driven Development to ensure the code works well!” they might say.
Well, I find AI is not very good at writing meaningful tests. This is exacerbated by the fact that it’ll have to write new tests every time a feature is touched or changed.
And, as anybody who has worked with unit tests can tell you, passing tests does not mean you have a working application.
Tests need to be thoughtfully applied where they are the most meaningful. AI seems to behave like a developer who needs to hit a certain amount of coverage and will test trivial things to achieve it (a sin many of us are not innocent of; perhaps that’s where it learned it from).
I experienced this myself. I recently used spec-kit to create a new feature. It was just a few pages, a few modals, a few notifications and a couple of tables.
Spec-kit (with Claude) created about 120 tests. They all passed. But when I checked a page, the components were not rendering.

If you care about the quality of your code, its maintenance and security, you will always care about what the code is and does.
I can only see Spec-driven development caring about the apparent functionality, from the Specification’s point of view, and even then, the resulting code and generated UI might be very different indeed.
Of course, you can specify everything in great, deep detail. Include rules for UI elements, code quality, security, and performance in your specification. Perhaps add linter checks and Figma validation.
But, run that process again, and variations will still materialize. Different LLMs will produce different code. Even the same LLM can vary, or be tweaked and in a week produce something different. Something that needs to be re-tested and reviewed.
“There are many ways to skin a cat”, says that unpleasant old adage. You might not like the way your LLM decides to do any particularly important thing, the second time it gets built and deployed.
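Here is a small illustration of that, in Python. Both functions are hypothetical, and both satisfy a spec that only says "list users sorted by age". The spec says nothing about ties, so two generations can legitimately disagree on them.

```python
users = [("Bob", 30), ("alice", 30), ("Cara", 25)]


def sorted_by_age_v1(users):
    # First generation: sort by age only; ties keep their input order.
    return sorted(users, key=lambda u: u[1])


def sorted_by_age_v2(users):
    # Second generation: sort by age, then by name to break ties.
    return sorted(users, key=lambda u: (u[1], u[0].lower()))


def meets_spec(result):
    # The only thing the spec pins down: ages are in ascending order.
    ages = [age for _, age in result]
    return ages == sorted(ages)
```

Both versions pass `meets_spec`, yet they return the two 30-year-olds in a different order. That is exactly the kind of silent change that needs to be re-tested and reviewed.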

Ok, so let’s stop bashing the thing. I don’t think it’s ready to be used that way, but I do think it’s a valuable tool because of its main use: organizing and directing an LLM on feature creation.
Using Spec-kit to focus and direct your LLM

Whether you want to ‘vibe-code’ or do ‘LLM-assisted development’, Spec-kit is great at focusing your LLM into doing precisely what you want it to do.
If you’ve ever asked an LLM to build a complete feature, you’ve probably run into these common problems:
- Dealing with the LLM’s memory and context (it forgets what it’s done before).
- Dealing with big rewrites, or it suddenly decides to go off on a tangent and over-engineer one aspect of your project.
- Maybe you forget something and it forgets something else and now you have a mess where you don’t remember what you asked for and the LLM makes stuff up as it goes along.
Spec-kit is good at herding the cats that are LLMs. It’s also great at helping you to stay organized:
- When you create the specification, you are asked to say, clearly, what you want to do, and why you want to do it.
- When you create the plan, you are asked to say, clearly, how you want to do it (what language, what database, how they should connect, what’s the expected architecture, etc).
- If spec-kit feels something needs clarification, it asks for it.
- Then you tell it to create a list of tasks, and then go forth and build the thing.
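As a rough mental model of those stages, here is a sketch in Python. This is not spec-kit's actual file format or API; every name below (`Spec`, `Plan`, `ready_for_tasks`) is made up for illustration, and the rule in `ready_for_tasks` is my own simplification of the flow.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    what: str  # what you want to build
    why: str   # why you want to build it
    open_questions: list[str] = field(default_factory=list)  # needs clarification


@dataclass
class Plan:
    spec: Spec
    how: dict[str, str]  # e.g. language, database, architecture


def ready_for_tasks(plan: Plan) -> bool:
    # Only break the work into tasks once nothing is left to clarify
    # and the plan says something about how to build it.
    return not plan.spec.open_questions and bool(plan.how)
```

The point of the model is the ordering: what and why come first, how comes second, and tasks only come once the open questions are answered.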
If you’re careful, there is very little tweaking required after it’s done. Depending on what it got wrong, you may find that adding some more detail to the plan or specification stages would take care of that from the get-go.
That is where I’m finding the most value. Using spec-kit I can get a feature 80% of the way to completion in an hour or two. The other 20% (sounds familiar…) is spent tweaking, testing, and ensuring the code is at the level of quality I am expected to commit to my repository.
The developer’s responsibility
Now, it’s important to state that I still want to review that generated code.
I still want to ensure that I understand what it does, and why, before I submit it.
The tool is writing code for me, but I am still the developer.
It is my tool, and so I remain responsible.

I guess you could say that is the difference between vibe-coding and LLM-assisted development.
When you vibe-code, you go with the flow, wherever the LLM leads you (technically speaking).
When you are a developer being assisted by an LLM, you lead the tool.
Either way, I 100% recommend spec-kit as a tool for both camps. It’s always getting better, and I am using it very often for new features professionally, as well as for personal projects.