The third-layer story has so far centered on one expensive person: the forward-deployed engineer who sits inside a customer's operation, learns how the work actually runs, and ships software against it. That person is rare, and the rarity sets the price. The next move in this market is that agents themselves do most of the forward-deployed work: observing the process, mapping it, drafting the redesign, writing back into the systems, and watching the result in production, while the human role narrows to the part that cannot be delegated. The compressed timeline and the ability to scale come from the agent doing the deployment, not from finding more rare people to hire.
What the third-layer model assumed
The forward-deployed model emerged because shipping software into a live operation requires knowledge that does not exist in any document. Someone has to watch the work, ask why a step is done a particular way, and translate the answer into something a system can execute. flowscope made this case in the third layer of consulting: the durable value sits between the strategy deck and the off-the-shelf tool, in the engineer who is present where the work happens. The cost structure of that model is set by the supply of people who can do this well. Each engagement consumes one of them for its duration, so the firm grows only as fast as it can hire, and that supply is a constraint hiring cannot quickly relieve.
What agents can reliably do now
The relevant question is not whether an agent can perform a single step, but how long a task it can complete end to end without a human catching it when it drifts. METR's measurement gives this a unit: the length of a task, measured by how long the work takes a skilled human, that a model can finish at a given success rate. The finding is that this length has been doubling roughly every seven months, and that the horizon at a high reliability bar is much shorter than at a fifty-percent success bar. Applied to a deployment, that suggests an agent today can carry the bounded, repeatable parts, such as the data integration, the first-pass mapping, and the routine write-back, while long-horizon work that chains many uncertain steps still falls below the reliability a production system needs. The capability is real and it is also bounded, which is the shape that lets an agent absorb the repeatable engineering while a human stays on the rest.
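To make the doubling claim concrete, here is a minimal sketch, assuming the trend continues as a steady exponential with a seven-month doubling period. The starting horizon of 60 minutes and the STRICT_BAR_FRACTION of 0.25 are hypothetical placeholders, not METR's numbers, and the function names are illustrative. Modeling the high-reliability horizon as a fixed fraction of the fifty-percent horizon is also a simplifying assumption.

```python
import math

# Assumption: the horizon keeps growing exponentially with the roughly
# seven-month doubling period METR reports. The starting horizon and the
# strict-bar fraction are hypothetical placeholders, not measured values.
DOUBLING_MONTHS = 7.0
STRICT_BAR_FRACTION = 0.25  # hypothetical: high-reliability horizon as a share of the 50% horizon


def projected_horizon(h0_minutes: float, months_ahead: float,
                      doubling_months: float = DOUBLING_MONTHS) -> float:
    """Task length (in skilled-human minutes) reachable after months_ahead."""
    return h0_minutes * 2 ** (months_ahead / doubling_months)


def months_to_reach(h0_minutes: float, target_minutes: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months for the horizon to grow from h0_minutes to target_minutes."""
    return doubling_months * math.log2(target_minutes / h0_minutes)


def strict_horizon(h50_minutes: float,
                   fraction: float = STRICT_BAR_FRACTION) -> float:
    """Horizon at a production-grade success bar, modeled as a fixed fraction."""
    return h50_minutes * fraction


if __name__ == "__main__":
    h50_now = 60.0  # hypothetical 50%-success horizon today, in minutes
    print(projected_horizon(h50_now, 14))  # two doublings: 240.0
    print(months_to_reach(h50_now, 480))   # three doublings: 21.0
    print(strict_horizon(h50_now))         # 15.0
```

Under these assumptions every doubling moves both bars, but the production-grade horizon trails the fifty-percent one by a constant factor, which is why long chained work stays with a human longer than the headline number suggests.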
The role that does not move to the agent
flowscope has argued a point that sounds like the opposite, and it needs reconciling rather than quietly dropping. In building a forward-deployed engineering org, the claim was that human judgment is the binding constraint that no amount of model capability relaxes. That claim holds, but it is narrower than it first reads, because there are two distinct constraints here. The first is the supply of people who can do the repeatable engineering of a deployment, and agent capability does relax that one. The second is judgment: deciding what a process should become, owning the consequences when an automated action is wrong, and holding the relationship that lets a customer grant access to live systems in the first place. Model capability does not relax that second constraint, and the earlier post was about it specifically. An agent can propose a redesign; it cannot be accountable for one. It can flag an exception; it cannot decide that this particular exception is the kind the business is willing to absorb. So the repeatable engineering and the observation move to agent capacity, while the exception-judgment, the accountability, and the trust relationship stay human, and they stay scarce.
How a small human core scales its judgment
The interesting consequence is what happens to the human core once the engineering is agent-run. A consulting firm scales its best judgment badly, because the best operator can sit in only one engagement at a time and the next engagement gets whoever is free. Evidence that this can change comes from how generative AI distributes skill at work. Studying customer-support agents, Brynjolfsson, Li, and Raymond found that an AI assistant raised the productivity of less-experienced workers the most, by carrying the practices of the most effective workers into the moment of the work. The same mechanism applies one level up. When the small human core encodes its judgment about what to map, what to redesign, and where to draw the line on autonomy, the agents carry that judgment into every engagement at once. One scarce role becomes a small core whose decisions are reused across many agent-run deployments, rather than a headcount line that grows with the customer count.
Why this compresses the timeline
The timeline of a deployment used to be set by how fast a person could observe, document, design, build, and verify, in sequence, in one operation. With agents doing the observation and the repeatable build, those stages overlap and run continuously rather than only within a person's working hours. flowscope made the discovery half of this case in discovery doesn't need three months, and the delivery half in why an agent does discovery and delivery better: the agent that watched the process is the same agent that writes back into it, so nothing is lost in a handoff between the person who learned the work and a different person who builds against it. The compression is not the agent working faster at a human's task. It is the removal of the sequencing and handoff costs that a human-only team cannot avoid.
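The sequencing-and-handoff argument can be made concrete with a toy model. It assumes each stage has a fixed duration, that every change of owner in a human-only team adds a fixed handoff delay, and that in the agent-run case the stages overlap completely under one owner. Stage, sequential_elapsed, overlapped_elapsed, the owners, and all durations are hypothetical; full overlap is an idealized lower bound, not a forecast.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    days: float  # hypothetical calendar days the stage takes
    owner: str   # who carries the stage


def sequential_elapsed(stages: list[Stage], handoff_days: float) -> float:
    """Human-only model: stages run back to back; each change of owner adds a handoff."""
    handoffs = sum(1 for a, b in zip(stages, stages[1:]) if a.owner != b.owner)
    return sum(s.days for s in stages) + handoffs * handoff_days


def overlapped_elapsed(stages: list[Stage]) -> float:
    """Idealized agent model: stages overlap fully under one owner, so no handoffs."""
    return max(s.days for s in stages)


if __name__ == "__main__":
    # Hypothetical deployment: the numbers are placeholders, not measurements.
    stages = [
        Stage("observe", 10, "analyst"),
        Stage("document", 5, "analyst"),
        Stage("design", 5, "architect"),
        Stage("build", 15, "engineer"),
        Stage("verify", 5, "engineer"),
    ]
    print(sequential_elapsed(stages, handoff_days=3))  # 40 days of work + 2 handoffs: 46
    print(overlapped_elapsed(stages))                  # longest single stage: 15
```

With these placeholder numbers the human-only sequence takes 46 days against an idealized 15. Verification cannot fully overlap the build in practice, so a real agent-run deployment would land somewhere between the two; the point the model isolates is that both the summing of stages and the handoff term disappear together.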
The honest counter-thesis
The strongest objection is that this understates how often judgment is needed, on the view that production operations are mostly exceptions that present themselves as routine until you look closely. If the long tail of weird cases is the real work, then the agent handles the easy fraction and the human is left handling everything that matters, with no operating leverage gained. There is something to this, and it is why the human core remains essential rather than vestigial. The reply is in the structure, not in a claim that exceptions are rare. The agent does not need to resolve the exception; it needs to surface it, with the context already assembled, to a human whose decision then becomes a rule the agent applies next time. The exception tail is real, and the human core is what handles it, which is why the model is a small human core plus agent capacity rather than an agent that replaced the engineer. What changed is the ratio of agent capacity to scarce human judgment, and that ratio is what determines how far the firm scales.
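The surface-then-codify loop described above can be sketched as a small router, assuming exceptions can be grouped by a kind label. Case, ExceptionRouter, ask_human, and the exception kinds in the example are hypothetical names; ask_human stands in for the human core, and keying rules on kind alone is a deliberate simplification of how a real decision would be matched to later cases.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Case:
    kind: str     # hypothetical label for the type of exception
    context: dict # context the agent assembles before escalating


@dataclass
class ExceptionRouter:
    """Apply a known rule if one exists; otherwise escalate once and record the decision."""
    ask_human: Callable[[Case], str]                  # stands in for the human core
    rules: dict[str, str] = field(default_factory=dict)
    handled: int = 0
    escalations: int = 0

    def handle(self, case: Case) -> str:
        self.handled += 1
        if case.kind in self.rules:       # decided before: the agent reuses the rule
            return self.rules[case.kind]
        self.escalations += 1             # new kind: surface it, with context, to a human
        decision = self.ask_human(case)
        self.rules[case.kind] = decision  # the human decision becomes a rule
        return decision

    def cases_per_escalation(self) -> float:
        """Rough proxy for the ratio of agent capacity to scarce human judgment."""
        return self.handled / self.escalations if self.escalations else float("inf")


if __name__ == "__main__":
    human_calls = {"rounding_gap": "absorb", "duplicate_invoice": "hold for review"}
    router = ExceptionRouter(ask_human=lambda case: human_calls[case.kind])
    kinds = ["rounding_gap", "duplicate_invoice", "rounding_gap",
             "rounding_gap", "duplicate_invoice"]
    for k in kinds:
        router.handle(Case(kind=k, context={"source": "ledger"}))
    print(router.escalations, router.cases_per_escalation())  # 2 2.5
```

The human answers each new kind of exception once, and every later occurrence is handled without them; cases_per_escalation is one simple way to read the ratio that, on this argument, determines how far the firm scales.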