The AI-Native Engineering Workflow

Stage 01

Brief.

problem before solution

Every unit of work starts with a written problem definition — not a feature description. Customer, pain, scope, success criteria. The model is a Socratic partner here, not an author.

Engineer / PM owns

Naming the customer and their actual pain in one sentence.
Drawing the scope line — what's in, what's explicitly out.
Writing success criteria a test could actually check.

Model assists with

Drafting a problem-statement scaffold from a ticket or chat.
Asking the ambiguity questions out loud until they have to be answered.
Pulling related prior briefs, postmortems, and ADRs into context.

Inputs

Ticket · chat thread · customer signal
Indexed corpus of prior briefs & decisions

Outputs

A brief in canonical structure: problem · outcome · scope · criteria
Signed by a named owner before the next stage starts

Antipattern: Skipping the brief because "everyone knows what we're building." If the next agent doesn't, you'll find out at PR review, when it's expensive.
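The canonical structure and the sign-off rule can be made checkable. A minimal sketch, assuming a hypothetical `Brief` record whose field names are illustrative only; the workflow itself does not prescribe a schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Brief:
    """Hypothetical container for the canonical parts of a brief."""
    problem: str                                   # customer and pain, one sentence
    outcome: str                                   # what changes for the customer
    in_scope: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    criteria: list[str] = field(default_factory=list)  # statements a test could check
    owner: Optional[str] = None                    # named person who signs the brief

    def blockers(self) -> list[str]:
        """Return the reasons this brief cannot move to the next stage."""
        reasons = []
        if not self.problem.strip():
            reasons.append("problem statement is empty")
        if not self.outcome.strip():
            reasons.append("outcome is empty")
        if not self.in_scope or not self.out_of_scope:
            reasons.append("scope line needs both in and out items")
        if not self.criteria:
            reasons.append("no success criteria")
        if not self.owner:
            reasons.append("no named owner has signed")
        return reasons
```

An empty `blockers()` list is the signal that the brief can leave this stage; anything else is a question the owner still has to answer.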

Stage 02

Research.

read the code with intent

Find out what is already true — about the codebase, the data, the customer — before committing to what should be. AI compresses days of "where is this used" archaeology into an afternoon you can actually act on.

Engineer owns

Choosing which evidence is strong enough to bet on.
Tracing the few code paths that actually matter.
Talking to the people the model can't.

Model assists with

Mapping call graphs, owners, and hotspots in the relevant slice of code.
Surfacing prior tickets, ADRs, and retros that touched the same area.
Drafting a "what's known / what's not yet" inventory linked to source.

Inputs

Repo · telemetry · incident history
Interview notes & customer recordings

Outputs

A research note the plan can lean on without re-deriving
A short risk list — things that must be designed around

Antipattern: Treating a model summary as primary evidence. Synthesis is downstream of facts; link the note back to the code and conversations it came from.

Stage 03

Plan.

highest-leverage artefact

The implementation plan is the contract between a human's intent and every downstream agent. Files that will change, failure modes, test surfaces, an explicit checklist. This is the single highest-leverage artefact in the workflow.

Tech lead owns

The architectural calls: what to add, what to leave, what to refactor under.
The rejected alternatives — kept in the doc, not deleted.
The "small enough to land in one PR" decision.

Model assists with

Drafting the plan scaffold from the brief plus the research note.
Generating the file-change table, architecture diagram, and test outline.
Reviewing for ambiguity: "what would I need to ask to act on this?"

Inputs

Brief · research note · existing system contracts
Repo conventions & recent plans on adjacent code

Outputs

A plan a teammate model could execute without follow-up
One ADR per significant choice, with rejected options

Antipattern: A flat backlog. Without a checklist a model can pull from, the agent will improvise and the diff will surprise the reviewer.
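One way to give an agent a checklist it can pull from, and to flag diffs that wander off it. This is a sketch under assumed names (`ChecklistItem`, `Plan`, `unplanned_files` are hypothetical), not a prescribed plan format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChecklistItem:
    title: str
    files: list[str]      # files this step is expected to touch
    acceptance: str       # how a reviewer or test confirms the step
    done: bool = False


@dataclass
class Plan:
    checklist: list[ChecklistItem] = field(default_factory=list)

    def next_item(self) -> Optional[ChecklistItem]:
        """Return the first unfinished item, or None when the plan is complete."""
        for item in self.checklist:
            if not item.done:
                return item
        return None

    def unplanned_files(self, changed: list[str]) -> list[str]:
        """Return files in a diff that no checklist item declared."""
        planned = {f for item in self.checklist for f in item.files}
        return sorted(f for f in changed if f not in planned)
```

A non-empty `unplanned_files` result is the "diff will surprise the reviewer" case, caught before review rather than during it.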

Stage 04

Plan review.

cheapest place to argue

A bad plan caught at review costs an hour. The same bad plan caught after implementation costs a day. This is the cheapest place to argue — and the first stage that runs through an adversarial gate.

Reviewer owns

Sign-off on scope, risk, and the "is this the right small slice" question.
Pushing back when the plan optimises for keystrokes over readability.
Deciding what to skip, defer, or split.

Adversarial reviewer does

A second, structurally different model attacks the plan: missing edges, weak coverage, broken assumptions.
Loops with the planner until verdicts converge — or escalates.
Files the disagreement record as part of the plan's history.

Inputs

The plan · the same context the planner used
Repo conventions & prior review patterns

Outputs

An approved plan with residue called out (open questions, known risks)
A gate-log entry — verdict, retries, findings

Antipattern: The same model reviewing its own plan. A friendly reviewer rubber-stamps; an adversarial one (different family, different rubric) finds the cracks.
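The "loop until verdicts converge, or escalate" rule can be sketched as a bounded loop. The `review` and `revise` callables stand in for the adversarial reviewer and the planner; they, the verdict strings, and the `max_retries` default are assumptions for illustration.

```python
from typing import Callable

# Verdict strings are assumptions for this sketch.
APPROVED = "approved"
ESCALATE = "escalate"


def run_gate(plan: str,
             review: Callable[[str], list[str]],
             revise: Callable[[str, list[str]], str],
             max_retries: int = 3) -> dict:
    """Loop planner and reviewer until no findings remain, or escalate."""
    history = []  # findings per attempt, kept as the disagreement record
    for attempt in range(max_retries + 1):
        findings = review(plan)
        history.append(findings)
        if not findings:
            return {"verdict": APPROVED, "retries": attempt,
                    "findings": history, "plan": plan}
        if attempt < max_retries:
            plan = revise(plan, findings)
    # Retries exhausted with findings still open: hand the decision to a human.
    return {"verdict": ESCALATE, "retries": max_retries,
            "findings": history, "plan": plan}
```

The returned dictionary carries the same three things the gate-log entry names: verdict, retries, findings.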

Stage 05

Implementation.

work the checklist

Once the plan is approved, the agent works the checklist. The engineer's job is staging context, steering, and judging — not typing. Status updates run in the foreground; approval-asks should not.

Engineer owns

Loading the agent with the right context: plan, neighbouring code, conventions.
Reading every diff before it leaves the workstation.
Calling the moment to stop iterating and rewrite by hand instead.

Agent does

Implements each checklist item against the acceptance criteria.
Runs lint, types, and unit tests locally before surfacing a diff.
Posts brief status lines per step; does not stop to ask "should I continue?"

Inputs

Approved plan with checklist · curated context window
Isolated workspace per unit of work

Outputs

Small, self-described commits that pass CI on arrival
An updated context pack for the next stage

Antipattern: The "mega PR." Long agent sessions accumulate drift; force small, verifiable slices and rebase often. Also: pausing mid-phase to ask permission. The plan was the approval.

Stage 06

Verification.

evidence, not claims

Tests passing on a developer's machine is not verification. A change is verified when it has been deployed to a real environment and exercised end-to-end. Verification produces an evidence pack, not "looks good to me."

Engineer owns

The eval set: cases that matter most to the customer and the business.
Reading the failing test first, before reading the fix.
The "is this evidence, or is this a story" judgement on the artefact.

Agent does

Synthesises unit, integration, and property tests from the plan.
Drives the deploy to a dev environment and re-runs the suite there.
Captures screenshots, logs, and responses as the evidence pack.

Inputs

Implemented branch · curated eval set
Reachable deploy target with realistic data

Outputs

An evidence pack attached to the change
Regressions filed back to the eval set, not silently fixed

Antipattern: Green CI quoted as proof. Green CI is necessary but not sufficient; the change also has to be deployed and exercised. Trust evidence over reassurance.

Stage 07

Code review.

two passes · one human

A model does the wide pass — style, missed tests, naming, neighbouring code. A human does the narrow pass — intent, taste, the things the model can't see. Both are required; neither is sufficient alone.

Reviewer owns

Does this match the plan's intent — not just its letter?
Will this be operable at 3am by an on-call who didn't write it?
Does this make the codebase a place people want to keep working in?

Model does

Mechanical checks: lint, types, dead branches, missing tests.
Plan-vs-diff alignment: did the agent stay on the checklist?
Drafts the PR description; the human confirms or edits.

Inputs

The PR · the linked plan · the evidence pack
Repo conventions & recent review history

Outputs

A PR ready for the adversarial bot loop
Review notes captured back into the context corpus

Antipattern: Skipping the human pass because "the bot said LGTM." The bot pass is necessary; it is not sufficient.

Stage 08

Bot review loop.

loop until clean

After the human pass, a separate automated reviewer runs in a tight loop — different model family, structured verdicts, fixes applied, re-runs until the bot has nothing left to say. Loops that don't converge surface to a human.

Engineer owns

Deciding when a finding the bot keeps raising is a tradeoff, not a bug.
Killing the loop when it's chasing its tail.
The "this finding is a feature request, not a regression" call.

Bot reviewer does

Triggers a structured review; reads findings; applies fixes; re-triggers.
Detects loops — same finding twice — and surfaces them.
Writes each iteration to the append-only audit log on the PR.

Inputs

Human-approved PR · bot reviewer with a structured-verdict schema
Repo's prior gate history for context

Outputs

A PR that's been adversarially reviewed past a fixed quality bar
An audit trail attached to the PR, in source control

Antipattern: Silently merging when the bot can't make progress. If the loop doesn't converge, that's a signal, not a nuisance. Surface it.
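Detecting "same finding twice" needs a stable identity for a finding. A minimal sketch, assuming each finding is a dictionary with `file`, `rule`, and `message` keys; that shape is hypothetical, and a real structured-verdict schema may differ.

```python
import hashlib


def fingerprint(finding: dict) -> str:
    """Stable key for a finding; the fields used here are assumed."""
    raw = f'{finding["file"]}|{finding["rule"]}|{finding["message"]}'
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def detect_loop(iterations: list[list[dict]]) -> list[dict]:
    """Return findings that reappear in a later iteration of the bot loop."""
    seen = set()
    repeated = {}
    for findings in iterations:
        current = {fingerprint(f): f for f in findings}
        for key, finding in current.items():
            if key in seen:
                repeated[key] = finding   # raised before, raised again
        seen.update(current)
    return list(repeated.values())
```

Anything this returns is a candidate for the engineer's call: a tradeoff to accept, or a loop to kill.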

Stage 09

Documentation.

the compounding asset

On most teams docs are a chore. On an AI-native team they're the substrate the next agent runs on. Treat documentation as a living index — kept fresh by the same agents that consume it.

Doc owner owns

Deciding what's canonical vs. archival, and where it lives.
Approving doc diffs the agent proposes — docs are not free-write.
The taste of the docs: tone, ordering, what gets a diagram.

Agent does

On every merge, drafts diffs to README, ADRs, and the system map.
Detects drift between code and docs; opens PRs to close it.
Maintains the searchable index every other stage reads from.

Inputs

Merged code · ADRs · evidence pack · prior docs
Existing doc tree & ownership map

Outputs

Always-current system map, contracts, runbooks
An indexed corpus that makes every other stage faster

Antipattern: A wiki of model-written prose nobody reads. Optimise docs for the next agent and the on-call human, not for length.

Stage 10

Retrospective.

per ticket · not per quarter

The cheapest place to compound is the end of the last ticket. A short, structured retrospective after every meaningful change — not a quarterly ceremony — is what bends the workflow over time.

Engineer owns

Naming what would have made this easier; what to lift into the standard process.
Calling out gaps in plans, evidence, or context the bot missed.
Updating the playbook when the same lesson lands twice.

Agent does

Reads the brief, plan, diff, reviews, and bot-loop log; drafts a structured retro.
Surfaces recurring failure modes across recent retros.
Files proposed changes to the process docs as PRs.

Inputs

The full ticket history · prior retros · process docs
Gate-log entries across the run

Outputs

A short retro — what worked, what didn't, what changes
Zero or more proposed process-doc edits as PRs

Antipattern: Saving retros for the end of the quarter. The lesson cools off; the proposal never lands. Run it now, while the run is still warm.

How an AI-native engineering team actually ships software.

Four commitments that hold the workflow together.

Context is the product.

The plan is the contract.

Humans own the seams.

Adversarial gates beat friendly review.

Ten stages, one rhythm.

The day in ten acts.

Trust is what survives an adversarial review.

Approved.

Needs revision.

Escalate.

Loop detected.

tail -f gate.log · one JSON line per gate run, committed alongside the code
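Writing one JSON line per gate run takes only the standard library. The field names below (`ts`, `gate`, `verdict`, `retries`, `findings`) are assumptions modelled on the gate-log entry described in Stage 04, not a fixed schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_gate_entry(log_path, gate, verdict, retries, findings):
    """Append one JSON line to the gate log and return the line written."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "gate": gate,          # e.g. "plan-review" (illustrative name)
        "verdict": verdict,
        "retries": retries,
        "findings": findings,
    }
    line = json.dumps(entry, sort_keys=True)
    # Mode "a" only ever adds to the end of the file; earlier lines stay untouched.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line
```

Because each run is a single line, `tail -f` shows gate activity as it happens and a normal diff shows exactly which entries a commit added.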

The rhythms that keep the pipeline honest.

Write the brief.

Status, not standstill.

Append to the log.

Run the retro now.

Measure the right things; ignore the rest.

Cross-reviewer overlap (ω).

Loop convergence time.

Escalation rate.

Lead time, brief to first user.

Eval-set health.

Doc-to-code drift.
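Two of these measures can be read straight off gate-log entries. A sketch, assuming entries are dictionaries with `verdict` and `retries` keys and the verdict strings `"approved"` and `"escalate"`. Retries-to-approve is used here only as a rough proxy for loop convergence; wall-clock convergence time and the overlap measure ω are not defined in this document, so they are left out.

```python
def gate_metrics(entries: list[dict]) -> dict:
    """Summarise gate runs; expects dicts with 'verdict' and 'retries' keys."""
    runs = len(entries)
    approved = [e for e in entries if e["verdict"] == "approved"]
    escalated = [e for e in entries if e["verdict"] == "escalate"]
    return {
        "runs": runs,
        # Share of gate runs that had to be handed to a human.
        "escalation_rate": len(escalated) / runs if runs else 0.0,
        # Average number of retries among runs that ended approved.
        "mean_retries_to_approve": (
            sum(e["retries"] for e in approved) / len(approved) if approved else 0.0
        ),
    }
```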

If you only do six things this quarter.

Pick one product surface.

Stand up a context corpus.

Rewrite one plan template.

Add a second reviewer to one phase.

Start the audit log on day one.

Retro every ticket, no exceptions.

Want a senior team that already ships this way?

About this doc

Versioning
