Iterative plan-building and implementation with AI
Continues from AI Effectively — A Systems Approach
In the previous article, AI Effectively — A Systems Approach, we established that working with AI functions as a control loop — with a clearly defined setpoint, feedback, and iterative refinement. We saw why feedback is essential, how the context window serves as internal state, and why the developer acts as a coordinator rather than an executor.
We now dive into one of the most demanding — and simultaneously most important — aspects of the AI-First approach: building a high-quality plan for a larger refactor, migration, or greenfield application. And then into the implementation that follows from that plan.
A fundamental insight: application requirements alone are not enough to build a good plan. AI must not only understand what needs to be done, but above all how — taking into account existing components, abstractions, libraries, and their underlying principles. Only then can it propose an architecture that is sufficiently general and sustainably maintainable over time.
Without this deep understanding, AI typically produces a solution tailored to the spec — functional, but built without component-oriented thinking, without general abstractions, with no room for extension. That is precisely why building a plan is an iterative process that cannot be skipped.
The entire process of building a plan with AI can be compared, with surprising precision, to the work of a sculptor. A sculptor who receives a block of marble does not immediately chisel out a finished statue complete with every fine detail. They proceed in layers — from rough form to progressively finer features. And AI must follow exactly the same path when understanding a project and designing its architecture.
From rough stone through basic features to a finished sculpture — this is precisely how AI must proceed when building a plan
1. Rough stone: AI explores the project, orients itself within the structure, and assembles a first plan of changes based on the requirements. This plan is merely a rough outline — it captures the intent but has no deeper understanding of the architecture.
2. Basic features: AI examines the components in use, asks for a general architecture, and compares it against the first plan. A new plan emerges with basic architectural features — where the "face" will be, where the "body" of the sculpture will go.
3. Finished sculpture: AI looks inside the components, third-party libraries, and dependencies. Only now does it verify whether the general architecture is correct and whether the sculpture can actually be completed in this form. The final plan takes shape.
Why all this complexity? Why not simply hand AI all the requirements, the architecture, the inputs and outputs — and leap directly from raw stone to finished sculpture?
The answer lies in the fundamental nature of how LLMs work. The model incrementally learns the given system — using what it discovers to progressively refine its intent and output. This learning process is gradual and must always be properly situated within context. Hence the incremental, updated plans; hence the feeding of additional key components that AI had no knowledge of at the outset.
It is error minimization — minimizing e — exactly as we described in the control-loop model. Each iteration reduces the gap between the current state of the plan and the optimal solution. Jumping from point A directly to point C bypasses this learning process entirely, and the result will always be significantly worse.
We touched on plan-building in the previous article as part of the control loop. Here we focus specifically on the plan-creation process itself — not on the implementation (that comes later in this article and shares many characteristics with planning, particularly around reviews and continuous testing).
The key insight is that at each step AI performs a differential — it compares what it has already produced against new information about the project, and uses that differential to refine its design. This is a process of learning and improving the proposal based on accumulated knowledge of the project and fresh information that sharpens the result.
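The differential step can be illustrated with a small sketch. Nothing here is prescribed by the article: the plan texts, the line-based diff, and the use of a similarity ratio as the deviation measure are all illustrative assumptions — a real workflow would compare structured plan documents, not raw strings.

```python
import difflib

def plan_differential(previous_plan: str, new_plan: str) -> float:
    """Rough deviation measure e between two plan versions:
    0.0 means identical plans, 1.0 means completely different."""
    similarity = difflib.SequenceMatcher(
        None, previous_plan.splitlines(), new_plan.splitlines()
    ).ratio()
    return 1.0 - similarity

# Hypothetical plan versions: v2 refines v1 toward existing abstractions.
plan_v1 = "add endpoint\nwrite handler inline\nreturn JSON"
plan_v2 = "add endpoint\nreuse existing handler abstraction\nreturn JSON"

e1 = plan_differential(plan_v1, plan_v2)
print(f"deviation e1 = {e1:.2f}")  # a shrinking e means the plans are converging
```

Tracking e across iterations gives a concrete signal for the convergence described above: large early differentials, small late ones.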
AI scans the project, orients itself within the file structure, configuration, and high-level architecture. It establishes what it is working with.
Based on the requirements and the initial exploration, AI assembles a plan of changes. This plan is the rough stone — it captures the intent but has no deeper understanding of the components and abstractions in play.
Plan v1 is moved to the archive. It will serve as the first level of abstraction and a reference point for measuring how far we have advanced toward a higher-quality solution.
Caution — a common mistake!
Many developers kick off implementation right here. Plan v1 looks reasonable, the requirements are covered — so why continue? Unfortunately, this plan has nothing to do with general architecture and long-term maintainability. It is merely the rough stone from which the sculpture has yet to be carved.
AI examines the components in use, existing abstractions, and patterns in the project. We deliberately do not yet look inside the components — we are working at the level of interfaces and architecture.
The critical step: AI must propose a solution that is component-oriented, makes use of existing abstractions, and is prepared for future extension. We compare against Plan v1 — what is the difference?
AI computes the differential between Plan v1 and the new proposal. This differential is the deviation measure e₁ — it reveals how far the rough stone was from a genuinely high-quality solution. The new plan (v2) is beginning to show the sculpture's basic features.
AI looks inside the components, into the internals of third-party libraries and dependencies the project uses. The goal is to fully understand the principles on which the system is built.
Only now does it become clear whether the proposed general architecture is truly feasible — whether the sculpture can actually be finished in this form. Sometimes this reveals that the approach needs to change.
AI once again takes stock, registers the newly discovered components, and computes a fresh delta against the final form. If the model's internal state and the new findings do not diverge significantly, we are on the right track — the final plan for this sculpture's shape is ready.
If, however, it turns out that AI's internal state and the remaining requirements differ substantially, it is worth starting a new sculpture and changing the architecture. Conversely — if the components fit the approach and the differentials are small, that is confirmation that we are heading in the right direction.
For a larger project, it is worth trying several approaches. If the developer has additional ideas, or AI arrives at alternative architectures, the entire process can be repeated and multiple plans produced.
Sculpture variants — AI makes it possible to produce several complete plans in a short time and compare their trade-offs
This is where an enormous advantage of AI becomes apparent: within a very short timeframe — a few hours or a single day — it is possible to produce multiple complete, sophisticated plans, compare their trade-offs, and choose the optimal variant.
A human would need days or weeks for a single such plan. AI can produce several in a fraction of the time — and that is a transformative shift in the way architectural decisions are made.
After every completed section of the plan, independent reviews must be requested, ideally from multiple diverse models simultaneously. Each model has different strengths and "sees" potential problems differently.
Critical: Fundamental architectural errors, incorrect use of components, security risks. Must be resolved before proceeding.
High: Significant design problems — missing abstractions, inefficient solutions, inconsistencies. Must be resolved before finalizing.
Low: Minor improvements, stylistic notes, optimizations. May be addressed later or disregarded.
Review procedure:
1. After each phase of the plan, invoke independent reviews from multiple models simultaneously.
2. Repeat the fix cycle until all critical and high findings have been eliminated.
3. After the complete plan, perform a final review — on the last step and on the plan as a whole.
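The review procedure above can be sketched as a loop that aggregates findings from several reviewers and repeats until nothing blocking remains. The `Finding` type, the reviewer callables, and the `fix` step are all invented stand-ins for real model calls, not an actual API.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    severity: str   # "critical", "high", or "low"
    message: str

def review_until_clean(plan: str, reviewers, fix) -> str:
    """Iterate independent reviews until no critical/high findings remain.
    `reviewers` are callables plan -> list[Finding]; `fix` applies findings.
    Both are placeholders for AI-driven steps."""
    while True:
        findings = [f for r in reviewers for f in r(plan)]
        blocking = [f for f in findings if f.severity in ("critical", "high")]
        if not blocking:
            return plan  # only low findings remain; they may be deferred
        plan = fix(plan, blocking)

# Hypothetical stand-ins for two review models and a fix step.
def model_a(plan):
    return [] if "abstraction" in plan else [Finding("high", "missing abstraction")]

def model_b(plan):
    return []

final = review_until_clean("plan v1", [model_a, model_b],
                           lambda p, fs: p + " + abstraction")
print(final)  # "plan v1 + abstraction"
```

The point of the structure is that the loop terminates only on severity, never on iteration count: critical and high findings cannot be waved through.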
Only then is the plan ready for implementation. It is worth having it explained back, examining the details of the resulting architecture, and thinking through the E2E tests and the complete testing strategy. All of this is best prepared before implementation begins — correctness is far easier to verify when tests are written against the original use cases rather than retrofitted to match the implementation.
With a completed and verified plan, the implementation phase begins. Its progression shares many characteristics with plan-building — particularly in terms of reviews, tests, and continuous verification. The cycle closely mirrors the control loop described in the first article.
AI first assembles the workflow: the order in which individual parts of the plan will be implemented, when integration tests will begin alongside unit tests, and when E2E and smoke tests come into play.
AI breaks the plan into implementable units and proposes an order. Infrastructure foundations and shared components go first; specific business logic and UI follow.
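The ordering described here — foundations before business logic — is essentially a topological sort over the plan's dependency graph. A minimal sketch using the standard library (the unit names and their dependencies are invented for illustration):

```python
from graphlib import TopologicalSorter

# Hypothetical implementable units, each mapped to the units it depends on.
dependencies = {
    "config loader": set(),
    "shared components": {"config loader"},
    "business logic": {"shared components"},
    "UI": {"business logic", "shared components"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # infrastructure first, UI last
```

Making the dependencies explicit also catches planning errors early: a cycle in the graph raises an exception instead of surfacing halfway through implementation.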
Incremental implementation, file by file, following the established order. Each file is accompanied by unit tests written concurrently with the code.
After every implemented file, a round of independent reviews from all available models must follow, using the same three-tier severity system as for the plan:
Cycle after each file:
1. Independent reviews from multiple models → identify critical / high / low findings.
2. Iteratively eliminate all critical and high findings.
3. Write regression tests — so the identified issue cannot recur.
4. Write any tests still missing for the given component.
5. Proceed to the next file.
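The per-file cycle above can be sketched as a driver loop. Every callable here (`implement`, `review`, `fix`, and the two test writers) stands in for an AI-driven step; the names and signatures are invented for illustration only.

```python
def implement_file(path, implement, review, fix,
                   write_regression_test, write_missing_tests):
    """One pass of the per-file cycle: implement, review until clean,
    pin each fixed issue with a regression test, then fill test gaps."""
    code = implement(path)
    while True:
        blocking = [f for f in review(code)
                    if f["severity"] in ("critical", "high")]
        if not blocking:
            break
        code = fix(code, blocking)
        for finding in blocking:
            write_regression_test(path, finding)  # the issue must not recur
    write_missing_tests(path, code)               # close remaining test gaps
    return code

# Hypothetical single-file run with stubbed steps.
regressions = []
code = implement_file(
    "service.py",
    implement=lambda p: "v0",
    review=lambda c: [{"severity": "high", "message": "bug"}] if c == "v0" else [],
    fix=lambda c, fs: "v1",
    write_regression_test=lambda p, f: regressions.append(f["message"]),
    write_missing_tests=lambda p, c: None,
)
print(code, regressions)
```

The key property is the ordering: regression tests are written at the moment a finding is fixed, while the failure mode is still concrete, not retrofitted at the end.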
Testing proceeds in layers — each layer is added at the appropriate phase of implementation:
Unit tests: Written concurrently with implementation of each file. They verify the isolated behavior of individual functions and components.
Integration tests: Added after coherent sections of the implementation. They verify component collaboration and correct layer integration. Where possible, this is also the first user-facing test of the foundation.
E2E tests: After the entire implementation is complete. The system is run on real or local environments and all happy paths, edge cases, and everything critical for the application are exercised.
Smoke tests: A quick check that the application works as a whole. Essential for continuous verification that the implementation has not broken existing functionality.
Rule: test more rather than less. Every untested scenario is a potential regression that AI will not catch.
Once implementation is complete, AI should take on the role of independent tester — testing use cases without knowledge of the implementation. This surfaces scenarios that the author (even an AI author) overlooked because they knew the code "from the inside."
This approach is critical: AI that wrote the code tends to write tests that confirm its own implementation. An independent tester (a different instance, a different model) tests against the requirements and use cases, not against the code.
Calculate code coverage and systematically fill in the missing areas. The goal is to reach above 80% coverage for all critical components.
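In practice a tool such as coverage.py would compute the numbers; the gate logic itself is simple and can be sketched directly. The per-component line counts below are invented for illustration.

```python
def coverage_gate(covered: dict, threshold: float = 0.80) -> list:
    """Return the components below the coverage threshold.
    `covered` maps component -> (covered_lines, total_lines)."""
    failing = []
    for component, (hit, total) in covered.items():
        if hit / total < threshold:
            failing.append(component)
    return failing

# Invented per-component line counts, as a coverage tool might report them.
report = {"auth": (92, 100), "billing": (70, 100), "ui": (85, 100)}
print(coverage_gate(report))  # ['billing'] -- fill this gap next
```

Wiring such a gate into CI turns the 80% goal from a guideline into an enforced property of every merge.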
If a legacy application exists, try real inputs and compare the outputs. Everything must match — if it does not, find out why. This special E2E test should be preserved as a permanent regression safeguard.
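This legacy comparison is a characterization test. A minimal sketch of the pattern — the legacy and rewritten functions here are trivial placeholders, not anything from a real system:

```python
def characterization_check(inputs, legacy_fn, new_fn):
    """Run the legacy and new implementations on the same inputs and
    return every input whose outputs differ. Any mismatch must be
    investigated, never papered over."""
    return [x for x in inputs if legacy_fn(x) != new_fn(x)]

# Hypothetical legacy behavior and its rewrite.
legacy = lambda x: x * 2
rewrite = lambda x: x + x

mismatches = characterization_check(range(100), legacy, rewrite)
print(mismatches)  # [] -- parity confirmed on the sampled inputs
```

Kept in the suite permanently, this check becomes the regression safeguard the article describes: any future divergence from legacy behavior fails immediately.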
The deliverable — beyond the plan and implementation — includes the creation of documentation:
Architecture documentation: A description of the overall architecture, layers, components, and their relationships. An understanding of the principles on which the solution is built.
Use-case documentation: Documentation of individual use cases, business workflows, and their mapping onto the technical implementation.
Configuration reference: An overview of all configuration parameters, their default values, and their effect on application behavior.
The implementation is handed off to the developer together with the defined metrics, configuration parameters, and CI/CD setup.
Deploy to dev → developer reviews → any further independent verification of the implementation.
The stabilization period tends to be longer than with a conventional rewrite. If the preceding process was carried out with care, however, it is noticeably shorter. As models continue to improve, this duration will naturally shrink further.
At a recent hackathon we applied this entire process in practice and created a detailed walkthrough showing how a plan is assembled step by step in the real world. The result is an accompanying HTML document that traces a concrete run through the full process — from requirements, through iterative refinement, to a final plan ready for implementation.
We recommend working through this appendix — you will see what the individual phases look like in practice, how the plans progressively sharpen, and what a real differential between iterations looks like. Theory only makes sense once you see an actual run-through.
Building a high-quality plan with AI is not a matter of a single prompt — it is a controlled, iterative process in which AI progressively deepens its understanding of the project and, with each new layer of information, refines its design.
Just as a sculptor gradually reveals the form from rough stone, AI too must move through the phases — from initial exploration, through understanding the components, to a detailed knowledge of library internals. Each phase yields a differential that reduces the deviation from the optimal solution.
Key principles:
Incremental learning: You cannot jump from stone to sculpture. The model learns step by step, and its understanding must be properly situated within context at each stage.
Error minimization: Each iteration measures the deviation e — the gap between the current and the optimal state. The differential converges toward zero.
Independent reviews: After every phase and after every file. Multiple models simultaneously. Critical and high findings must be resolved before moving on.
Layered testing: Unit → integration → E2E → smoke. AI as an independent tester. Coverage above 80%. Tests written against the requirements, not against the implementation.
The result is a plan and implementation that is component-oriented, general, and sustainably maintainable — not merely a functional solution tailored to a single brief. That is the essence of the AI-First approach.