June 2026 · 7 min read · Agentic development · Draft

The Meters Behind the Magic

Agentic development makes software feel faster, but it does not make the meter disappear. It moves the cost into quota, parallel workers, long-running goals, GitHub Actions, Xcode Cloud, and every verification loop you now run because the agent made another iteration possible.

[Image: Private family photo archive interface organized by year]

The first time an agent saves you three hours, the cost feels abstract. The fifth time it spins up workers, runs tests, retries, asks for more context, triggers CI, and keeps going because the goal is not finished, the abstraction starts sending receipts.

I have been feeling that most clearly while working on the photo gallery project, especially the iOS and Mac frontends. The agents make it much easier to explore product shape quickly: native navigation, image browsing, selection state, face-recognition surfaces, albums, search, and all the little interface decisions that turn a gallery from a pile of assets into software. But every faster loop has a shadow loop somewhere else. Tokens, quota, parallel agents, Xcode builds, GitHub Actions, Xcode Cloud. The work got faster. The meters multiplied.

This is not a complaint that tools should be free. Good infrastructure costs money. Frontier models cost money. Build systems cost money. The interesting part is that agentic development changes when and how those costs arrive.

Agents make iteration feel cheap at the exact moment the surrounding system starts charging for proof.

Parallel agents widen the bill

I have been experimenting with parallel agents and orchestration. The appeal is obvious: one worker investigates the frontend, another inspects tests, another reviews the diff, another checks platform conventions. When it works, it feels like borrowing a small engineering team for an afternoon.

In practice, it is powerful and weird. Agents duplicate work. They make incompatible assumptions. They summarize too aggressively. They lose the sharpness of the original instruction. The parent agent has to reconcile partial truths from several workers, and that reconciliation is where product judgment still lives.

I also find it hard to tell whether spawned agents inherit the model tier I assigned to the parent. If I start from a high-tier agent, do the workers run at the same level? Am I getting GPT 5.4 High, GPT 5.5 High, or whatever the current internal equivalent is? Are subagents routed differently because they are cheaper, shorter-lived, or easier to parallelize? I do not have a clean answer, and the uncertainty matters because it affects both quality and cost.

I appear to have been capped at six parallel agents in Codex. That is probably fine. Six agents can burn through quota with remarkable speed, and the work does not automatically get six times better. It often gets wider, noisier, and more expensive before it gets better.
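A rough way to see why six workers do not buy six times the result is to write the arithmetic down. The sketch below is a hypothetical cost model, not anything Codex exposes: the `FanOut` class, its fields, and the idea that overlap can be expressed as a single `duplicate_fraction` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FanOut:
    """Hypothetical cost model for one orchestrated task (all fields are assumptions)."""

    workers: int               # parallel agents spawned by the parent
    tokens_per_worker: int     # assumed average tokens each worker consumes
    reconcile_tokens: int      # assumed tokens the parent spends per worker report
    duplicate_fraction: float  # assumed share of each extra worker's output that overlaps

    def __post_init__(self) -> None:
        if self.workers < 1:
            raise ValueError("workers must be at least 1")
        if not 0.0 <= self.duplicate_fraction <= 1.0:
            raise ValueError("duplicate_fraction must be between 0 and 1")

    def total_tokens(self) -> int:
        # Every worker is billed in full, plus the parent's reconciliation pass.
        return self.workers * (self.tokens_per_worker + self.reconcile_tokens)

    def useful_tokens(self) -> float:
        # The first worker counts in full; each additional worker loses
        # duplicate_fraction of its output to overlap with the others.
        extra = (self.workers - 1) * (1.0 - self.duplicate_fraction)
        return self.tokens_per_worker * (1.0 + extra)

    def cost_per_useful_token(self) -> float:
        return self.total_tokens() / self.useful_tokens()
```

With made-up inputs such as `FanOut(workers=6, tokens_per_worker=10_000, reconcile_tokens=1_000, duplicate_fraction=0.5)`, the bill is six workers wide while the non-overlapping output is only 3.5 workers wide. The numbers are invented; the shape is the point.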

Goals work, but they burn

Goals have been one of the more interesting parts of Codex for me. When I give an agent a longer objective and let it continue across turns, it can stay with a task in a way that feels genuinely useful. It can inspect, implement, verify, recover, and keep moving without needing me to babysit every sub-step.

But goals burn quota. They also introduce a different kind of drift. A goal encourages persistence, which is usually good, but persistence can become momentum. The agent keeps optimizing toward its internal version of the objective even when the most useful next step would be a tighter loop: plan, make a small change, evaluate, stop, ask whether the direction still feels right.

That distinction matters on product-sensitive work. A goal can be great for "make this build pass" or "finish this bounded refactor." It is riskier for "make the Mac frontend feel right" or "improve the gallery browsing experience." Those tasks need taste checks. They need pauses. They need the human to stay close enough to keep the product from drifting into the agent's idea of done.
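The difference between a bounded goal and a taste-sensitive one can be sketched as a loop with a step budget and optional human checkpoints. This is a minimal illustration, not how Codex implements goals: `run_goal`, its parameters, and the three return values are hypothetical names introduced here.

```python
from typing import Callable


def run_goal(
    step: Callable[[], None],
    is_done: Callable[[], bool],
    approve: Callable[[int], bool],
    max_steps: int = 20,
    checkpoint_every: int = 0,
) -> str:
    """Run an agent step repeatedly under a budget, pausing for a human if asked.

    checkpoint_every=0 means "never pause", which suits mechanical goals such
    as "make this build pass". A small positive value forces a taste check
    every N steps, which suits goals like "make the Mac frontend feel right".
    """
    for n in range(1, max_steps + 1):
        step()
        if is_done():
            return "done"
        # Hand control back to the human at each checkpoint; stop if they decline.
        if checkpoint_every > 0 and n % checkpoint_every == 0:
            if not approve(n):
                return "stopped_by_human"
    # The budget ran out before the goal was met: persistence is capped.
    return "budget_exhausted"
```

The design choice is that both limits are explicit: `max_steps` caps the burn, and `approve` gives the human a scheduled moment to say the direction no longer feels right.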

CI becomes the second meter

Tokens are not the only meter running. CI is now part of the cost of agentic development.

GitHub Actions and Xcode Cloud both get expensive once agents start pushing more frequent changes through real verification. As with models, that is a fair price for real infrastructure. The point is that agentic development changes the shape of usage: when the implementation loop accelerates, the validation loop accelerates too.

A human might batch work into fewer runs because waiting is annoying. An agent will happily discover, patch, test, fail, patch, test, fail, patch, test, and eventually land somewhere useful. That can be exactly what you want. It can also turn CI into the second meter, right after tokens.

This matters especially for Apple-platform work. The iOS and Mac frontends benefit from real builds, platform checks, and actual verification. Xcode Cloud gives you the right environment, but the meter changes how casually you can ask agents to explore. GitHub Actions has the same dynamic for web and backend projects. The agent makes iteration feel cheap locally while the surrounding system quietly charges for proof.
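The second meter is easy to estimate on the back of an envelope. The sketch below assumes a simplified model in which every CI run takes the same number of minutes and an agent either triggers a run per patch or batches several patches into one run. The function name `ci_minutes` and its parameters are hypothetical, and real GitHub Actions or Xcode Cloud billing has details (runner types, included minutes, rounding) that this ignores.

```python
import math


def ci_minutes(patches: int, minutes_per_run: int, batch_size: int = 1) -> int:
    """Estimate total CI minutes when patches are verified in batches.

    batch_size=1 models an agent that pushes every patch through remote CI.
    Larger values model holding patches locally and verifying them together.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # A partial batch still needs its own run, hence the ceiling.
    runs = math.ceil(patches / batch_size)
    return runs * minutes_per_run
```

For example, twelve patches at an assumed fifteen minutes per run come to 180 minutes when each patch triggers CI, and 45 minutes when they are verified four at a time. Batching trades some feedback granularity for that difference, so it only makes sense when the risk allows it.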

Local loops still matter

The family photo archive already taught me this lesson in another form. I moved face recognition toward local models because I wanted to tune thresholds, rerun albums, inspect mistakes, and keep going without every experiment feeling like a tiny financial decision. Local inference changed the mood of the work. I could run the loop again because the loop was mine.

Agentic development needs the same discipline. Run local tests before remote CI when possible. Keep small changes small. Ask the agent to verify with the cheapest useful signal first. Save the expensive proof for the moments when it actually proves something.
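"Cheapest useful signal first" can be expressed as a small verification ladder: order the checks by cost and stop at the first failure, so the expensive proof only runs once the cheap ones pass. The `Check` type, the `run_ladder` function, and the cost units are assumptions for illustration. In a real project each `run` callable would wrap a local test command or a remote CI trigger.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    name: str
    cost: float               # assumed relative cost (minutes, credits, or dollars)
    run: Callable[[], bool]   # returns True when the check passes


def run_ladder(checks: List[Check]) -> Tuple[bool, float, List[str]]:
    """Run checks from cheapest to most expensive, stopping at the first failure.

    Returns (all_passed, cost_spent, names_of_checks_that_ran).
    """
    spent = 0.0
    ran: List[str] = []
    for check in sorted(checks, key=lambda c: c.cost):
        spent += check.cost
        ran.append(check.name)
        if not check.run():
            # A cheap failure means the expensive checks never get billed.
            return False, spent, ran
    return True, spent, ran
```

If a local unit test fails, a ladder like this returns before the remote build is ever started, which is the whole saving.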

This is not about being stingy. It is about keeping experimentation alive. The best development loops are the ones I am not afraid to run.

Orchestration has to be designed

The Adventure OS pilot has made this obvious in a different domain. Generating messages, headers, schedules, variants, and feedback loops is not just "ask an agent to do the thing." It is a system of constraints, review points, memory, and measurement.

The same is true for software agents. Parallel workers, goals, CI, reviews, and tests need an operating model. When does a worker explore? When does it edit? When does it stop? What counts as done? Which checks run locally? Which checks justify CI? When should the human interrupt instead of letting the goal continue?

Without that operating model, agentic development can feel incredibly productive while quietly turning into a spending machine.

The unit economics of agentic development are not just tokens. They are every loop the agent convinces you to run.

Where I have landed for now

I still want parallel agents. I still want goals. I still want CI. I still want Xcode Cloud and GitHub Actions to catch the things local checks miss. The answer is not to retreat from the stack. The answer is to make the stack legible.

My current workflow is becoming more intentional. Use parallel agents as scouts, not authorities. Use goals for bounded objectives, not taste-sensitive wandering. Keep the human close on iOS and Mac frontend decisions. Run local checks first. Batch expensive CI when the risk allows it. Treat quota as a product constraint, not an afterthought.

Agentic development makes it possible to build more personal software, faster. That is still thrilling. But the cost model is part of the craft now. The magic is real. So are the meters.