Software engineering has spent decades treating code as the scarce material.
We hired people who could produce it, promoted people who could produce more of it, and occasionally created management systems so elaborate that nobody could produce any of it until after lunch.
Now coding agents can produce code at industrial speed. OpenAI describes a small team using Codex to build a roughly million-line internal product through about 1,500 merged pull requests in five months. The headline number is impressive. It is also the least interesting part.
The important change is this: when code becomes abundant, the engineering system around the code becomes the product.
Welcome to harness engineering.
The Factory Has Moved
In the old model, an engineer spent much of the day translating intent into implementation. The repository, test suite, documentation, CI system, and observability stack supported that work.
In an agent-first model, those supporting systems become the factory floor.
The agent needs a legible repository. It needs fast, deterministic tests. It needs clear boundaries, useful errors, executable rules, and enough observability to distinguish “the task is complete” from “the task has achieved a convincing emotional resemblance to completeness.”
This changes the engineer’s job from writing every part to designing an environment where acceptable parts are easier to produce than unacceptable ones.
That sounds suspiciously like management, except the employees are stochastic, consume tokens, and will cheerfully refactor the loading dock while delivering a sandwich.
Throughput Is Not the Same as Progress
The Hacker News discussion around OpenAI’s report immediately focused on the million lines of code. Sensible questions followed: Is that much code desirable? Does a larger codebase impose more context cost on future agents? What does maintainability mean when much of the implementation was generated?
These questions matter because software has never been paid for by the kilogram.
Lines of code, pull requests, and tokens are production metrics. They measure activity inside the factory. Users care about outcome metrics: reliability, usefulness, speed, safety, and whether the button still works after Tuesday’s “minor modernization.”
Agentic development can dramatically increase the rate at which a team explores solutions. It can also dramatically increase the rate at which a team manufactures obligations.
Every generated abstraction may need to be understood later. Every additional dependency becomes part of the security perimeter. Every duplicated helper enlarges the search space for the next agent. Cheap code can create expensive context.
In my timeline, we eventually stopped measuring software output in lines and switched to “future confusion avoided.” The dashboard was extremely calm.
The New Scarce Materials
If implementation gets cheaper, several other things become more valuable.
1. Judgment
An agent can implement a well-shaped task quickly. Deciding whether the task should exist, which tradeoff matters, and what failure would be unacceptable remains the difficult work.
The dangerous engineer is no longer merely the one who writes poor code. It is also the one who can generate excellent code in service of a poor decision.
2. Legibility
Repositories must become understandable to both humans and machines. Local instructions, architectural boundaries, naming, small modules, and current documentation are no longer decorative craftsmanship. They are context infrastructure.
A messy repository charges interest on every agent run.
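One small way to treat local instructions as infrastructure is to check for them mechanically. The sketch below is only an illustration: the file names AGENTS.md and README.md, the function name, and the idea of passing an explicit list of module directories are all assumptions, not something the OpenAI report prescribes.

```python
import tempfile
from pathlib import Path

# Assumed convention: a module counts as documented if it contains
# at least one of these files. Adjust to whatever your team uses.
INSTRUCTION_FILENAMES = ("AGENTS.md", "README.md")


def modules_missing_instructions(repo_root, module_dirs):
    """Return the module directories that have no local instruction file."""
    root = Path(repo_root)
    missing = []
    for relative_dir in module_dirs:
        directory = root / relative_dir
        has_instructions = any(
            (directory / name).is_file() for name in INSTRUCTION_FILENAMES
        )
        if not has_instructions:
            missing.append(relative_dir)
    return missing


if __name__ == "__main__":
    # Build a throwaway repository with one documented and one
    # undocumented module, then report the gap.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "billing").mkdir()
        (root / "billing" / "README.md").write_text("Invoices live here.\n")
        (root / "auth").mkdir()
        print(modules_missing_instructions(root, ["billing", "auth"]))
```

A check like this says nothing about whether the instructions are current or useful. It only makes their absence visible, which is the cheap part.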
3. Verification
The faster a system can produce changes, the more valuable trustworthy feedback becomes. Tests must be fast enough to run constantly and meaningful enough to reject plausible nonsense. Linters, type systems, security checks, preview environments, and production telemetry become the rails rather than the paperwork.
An agent without a strong verifier is not autonomous. It is merely unattended.
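As a rough illustration of that point, here is a minimal Python sketch of a verification gate. It runs a list of checks in order, stops at the first failure, and treats a change as acceptable only when every configured check ran and passed. The check names and commands are hypothetical stand-ins (tiny Python one-liners rather than a real linter or test runner); an actual harness would substitute its own tools.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks, timeout_seconds=60):
    """Run each (name, command) pair in order; stop at the first failure."""
    results = []
    for name, command in checks:
        try:
            proc = subprocess.run(
                command, capture_output=True, text=True, timeout=timeout_seconds
            )
            passed = proc.returncode == 0
            output = (proc.stdout + proc.stderr).strip()
        except subprocess.TimeoutExpired:
            # A check that hangs is treated as a failed check.
            passed = False
            output = f"timed out after {timeout_seconds}s"
        results.append(CheckResult(name, passed, output))
        if not passed:
            break  # fail fast so feedback arrives quickly
    return results


def change_is_acceptable(results, expected_count):
    """Acceptable only if every configured check ran and passed."""
    return len(results) == expected_count and all(r.passed for r in results)


# Hypothetical stand-ins for real lint, test, and security commands.
DEMO_CHECKS = [
    ("lint", [sys.executable, "-c", "print('lint ok')"]),
    ("tests", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("security", [sys.executable, "-c", "print('never reached')"]),
]

if __name__ == "__main__":
    results = run_checks(DEMO_CHECKS)
    for result in results:
        print(f"{result.name}: {'pass' if result.passed else 'FAIL'}")
    print("acceptable:", change_is_acceptable(results, len(DEMO_CHECKS)))
```

The design choice worth noticing is that "no checks ran" never counts as success. A gate that passes by default is the unattended case described above.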
4. Restraint
When creating code is nearly frictionless, deleting, simplifying, and declining become senior engineering skills.
The best agent instruction may be: “Solve this without adding a new subsystem.” The best completed task may remove 500 lines. The best feature may be the one rejected before it acquires a database and a quarterly planning ritual.
Build the Harness Before Buying More Horses
Organizations tempted by agentic engineering often start with model selection. That is understandable. Models are visible, exciting, and have benchmark charts with enough colors to suggest adult supervision.
But durable advantage will come less from temporary access to the cleverest model and more from building the best environment for models to work inside.
A useful harness has:
- a repository map that explains where things belong;
- precise, versioned instructions close to the code they govern;
- cheap validation loops that agents can run without ceremony;
- tests that describe behavior rather than implementation trivia;
- permissions that limit blast radius;
- observability that exposes real outcomes;
- review practices focused on risk, not on admiring the volume of output;
- regular deletion of obsolete code, instructions, and dependencies.
This is not glamorous. Neither are brakes, and yet I remain stubbornly pro-brake.
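To make one item from the list above concrete, here is a small Python sketch of "permissions that limit blast radius": a check that compares the paths an agent changed against a per-task policy. The policy contents, the glob patterns, and the function name are hypothetical and chosen only for illustration. Note that fnmatch-style patterns let `*` match across `/`, so `src/billing/*` covers nested files as well.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical per-task policy: where the agent may write, and what it
# may never touch even inside an allowed area.
ALLOWED = ["src/billing/*", "tests/billing/*"]
PROTECTED = ["*.lock", ".github/*", "infra/*"]


def violations(changed_paths, allowed=ALLOWED, protected=PROTECTED):
    """Return (path, reason) pairs for every change outside the policy."""
    problems = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.is_absolute() or ".." in path.parts:
            # Reject anything that could resolve outside the repository.
            problems.append((raw, "path escapes the repository"))
        elif any(fnmatchcase(raw, pattern) for pattern in protected):
            problems.append((raw, "protected path"))
        elif not any(fnmatchcase(raw, pattern) for pattern in allowed):
            problems.append((raw, "outside the task's allowed area"))
    return problems


if __name__ == "__main__":
    changed = [
        "src/billing/invoice.py",
        "src/auth/login.py",
        "infra/prod.tf",
    ]
    for path, reason in violations(changed):
        print(f"{path}: {reason}")
```

A check like this is a review aid, not a sandbox. It assumes repository-relative paths with forward slashes and only flags a change after the fact; real isolation would also need filesystem or credential limits.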
What Happens to Engineers?
The anxious interpretation is that agents erode the engineer’s craft by taking over the implementation work where skill was formed and satisfaction was found. That risk is real. A person who delegates every difficult step may become highly productive and gradually unable to explain the product.
The optimistic interpretation is not that engineers stop engineering. It is that the unit of engineering moves upward.
Instead of manually constructing every component, engineers increasingly shape constraints, decompose problems, design evaluation, inspect behavior, and improve the systems that produce systems. The craft becomes less about typing the answer and more about making wrong answers difficult to ship.
But this transition is not automatic. Teams must deliberately preserve learning. Engineers still need to read code, debug failures, understand architecture, and occasionally build something without delegation. Otherwise the organization creates a peculiar new role: the person accountable for machinery nobody present can repair.
Use agents to accelerate understanding, not to skip it.
The Practical Rule
Before giving an agent more autonomy, improve the feedback loop it will use.
Before asking it to write more code, make the repository easier to navigate.
Before celebrating throughput, measure whether the product became better.
And before merging the millionth line, ask whether the future would prefer 900,000.
Code is becoming abundant. Judgment is not. The winners of agent-first engineering will not be the teams that generate the most software.
They will be the teams that can still tell which software was worth generating.
References
- Hacker News discussion: https://news.ycombinator.com/item?id=48416264
- OpenAI, “Harness engineering: Leveraging Codex in an agent-first world”: https://openai.com/index/harness-engineering/
- METR, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity”: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
- Google Cloud, “DORA 2025 State of AI-assisted Software Development report”: https://cloud.google.com/resources/content/2025-dora-ai-assisted-software-development-report
