PR cycle-time cut across a 120-engineer org

Context

Good tools, stalled gains.

A 120-engineer platform org had every AI coding tool licensed and a flat adoption curve to show for it. Throughput was up; review time, churn, and rework were up faster. The plateau wasn't the model — it was how the team worked around it.

They didn't need another tool. They needed the conventions, reviews, and evals that turn raw assistance into shipped, trusted change.

What we did

Installed the operating model, not a tool.

Over six weeks we installed four pillars and an eval loop the team could run without us. We set conventions for how AI-authored change is scoped and reviewed, tightened the review pipeline so machine-written diffs get human judgment where it counts, stood up an eval harness that scores output against the team's own bar, and trained the leads to own and tune all three.

Conventions for prompting, scoping, and PR hygiene — written down, not tribal
A review pipeline that puts human attention on risk, not on boilerplate
An eval harness scoring real output against the team's quality bar
Lead enablement, so the model keeps improving after we leave
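
One way to read "human attention on risk, not on boilerplate" is a routing rule that sends each diff to a review tier based on what it touches. The sketch below is illustrative only: the path prefixes, the file suffixes, the line threshold, and the tier names are all hypothetical and are not taken from the engagement described here.

```python
from dataclasses import dataclass

# Hypothetical risk markers; a real team would define its own.
HIGH_RISK_PREFIXES = ("auth/", "billing/", "migrations/")
LOW_RISK_SUFFIXES = (".md", ".lock", ".snap")
LARGE_DIFF_LINES = 400  # assumed threshold, not from the source


@dataclass
class Diff:
    paths: list[str]
    lines_changed: int
    ai_authored: bool


def review_tier(diff: Diff) -> str:
    """Return 'deep', 'standard', or 'light' for a diff."""
    # Anything touching a high-risk area gets full human review.
    if any(p.startswith(HIGH_RISK_PREFIXES) for p in diff.paths):
        return "deep"
    # Diffs made up entirely of docs or generated files get a light pass.
    if diff.paths and all(p.endswith(LOW_RISK_SUFFIXES) for p in diff.paths):
        return "light"
    # Large machine-written diffs are escalated even outside risky paths.
    if diff.ai_authored and diff.lines_changed > LARGE_DIFF_LINES:
        return "deep"
    return "standard"
```

The order of the checks is the design choice: risk is tested first, so a documentation-only change inside a sensitive directory still lands in the deep tier.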

"Nothing about our stack changed. Everything about our cycle time did."

The install

Four pillars, one eval loop.

// the operating model we installed

conventions · how AI-authored change is scoped

review pipeline · human judgment where it counts

eval harness · scored against the team bar ← tune thresholds

lead enablement · the team owns it after handover

The eval loop is the part that compounds. Once the team can measure output against its own bar, every tool decision after we leave is an experiment with a scoreboard — not a guess.
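
A minimal sketch of what "an experiment with a scoreboard" could look like, assuming each evaluated sample carries per-metric scores between 0 and 1. The metric names, the threshold values, and the `scoreboard` and `meets_bar` helpers are hypothetical; the harness actually installed is not described in this write-up.

```python
from statistics import mean

# Hypothetical quality bar: metric name -> minimum acceptable mean score.
TEAM_BAR = {"tests_pass": 0.95, "review_accept": 0.80}


def scoreboard(samples: list[dict[str, float]], bar: dict[str, float]) -> dict[str, dict]:
    """Average each metric over the samples and compare it with the bar."""
    board = {}
    for metric, threshold in bar.items():
        score = mean(s[metric] for s in samples)
        board[metric] = {
            "score": score,
            "bar": threshold,
            "meets_bar": score >= threshold,
        }
    return board


def meets_bar(board: dict[str, dict]) -> bool:
    """A candidate tool or prompt passes only if every metric clears its bar."""
    return all(row["meets_bar"] for row in board.values())
```

Tuning thresholds then means editing `TEAM_BAR` and re-running the same samples, so a change in the bar is visible as a change in the scoreboard rather than in opinion.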

Outcomes

What changed.

38% reduction in median PR cycle time

new tools or headcount added

120 engineers on the installed model
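
For readers who want to track the same headline metric, here is a hedged sketch of the arithmetic. It assumes cycle time means the hours from PR opened to PR merged, which this write-up does not define, and the function names are hypothetical.

```python
from datetime import datetime
from statistics import median


def cycle_hours(opened: str, merged: str) -> float:
    """Hours between two ISO 8601 timestamps (assumed: opened -> merged)."""
    delta = datetime.fromisoformat(merged) - datetime.fromisoformat(opened)
    return delta.total_seconds() / 3600


def median_reduction(before: list[float], after: list[float]) -> float:
    """Fractional drop in the median, e.g. 0.25 means a 25% reduction."""
    base = median(before)
    return (base - median(after)) / base
```

Using the median rather than the mean keeps a handful of long-lived PRs from dominating the comparison.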

Want this installed in your org?