AI State of Play - Part 1: From Suggest to Supervise

Sometime in the last twelve months, the question I get asked changed.
It used to be "how do you use AI in your workflow?" - a question with a fairly bounded answer involving autocomplete suggestions and the occasional ChatGPT consultation. Now the question is closer to "when do you intervene?" - and that is a different question entirely.
This is the first post in a five-part series on the state of agentic coding in early 2026. I'll use Claude Code as the primary lens throughout - not because it has won the market (it hasn't, and the field is moving too fast for "winning" to be a useful frame), but because it makes the shift unusually legible. Terminal-native, repo-aware, primitives rather than products - the things that matter happen at the surface, where you can actually see them.
Before any of that, though, this post is about the shift itself. What it is. Why it is a category change rather than a quantitative improvement. And what it does to the role of an engineer.
Table of contents
- The two-question test
- Three modes of using AI to write code
- What is actually new
- The role shift
- Why Claude Code as the lens
- What's coming in this series
- An honest take
The two-question test
If you want a fast diagnostic for where someone is in their relationship to AI tooling, ask them which of the following two questions sounds more relevant to their daily work:
- Did the AI suggest something useful?
- Should I let the AI take this action?
Question one is the framing of an autocomplete user. The AI is a recommender; you are the agent. You read the suggestion, decide if it fits, and either accept or discard it. Your hands are on the keyboard. The work is yours.
Question two is the framing of an agentic user. The AI is the agent; you are the supervisor. It proposes a course of action - sometimes spanning many files, multiple tool calls, and several minutes - and your job is to decide whether to authorise it, redirect it, or interrupt it. Your hands are mostly off the keyboard. The work is shared, and the verification surface is different.
The shift between these two questions did not happen all at once and is not yet complete for everyone. But over the last twelve to eighteen months, the centre of gravity has moved decisively from question one to question two for engineers using these tools heavily.
Stack Overflow's 2024 Developer Survey reported 76% of professional developers using or planning to use AI tools in their work, up from 70% the year before. The 2025 survey, published in December 2025, took that headline figure to 84% - but the more interesting number was underneath it: 51% of professional developers now use AI tools daily, while only 14.1% use AI agents daily. The headline diffusion isn't the story anymore. The frequency split is, and the agentic share within it is the early indicator of where the work is actually going.
One other 2025 number is worth flagging up front, because it sets up Part 3 of this series: only 29% of developers said they trust the accuracy of AI output - down 11 percentage points from 2024. Adoption keeps climbing while trust falls. That gap is not a paradox. It is the supervisor problem, and we'll come back to it.
Three modes of using AI to write code
It helps to be precise about what has actually changed. The last six years of AI-in-the-IDE break into three distinct modes, and they differ on more than capability.
| Dimension | Tab-completion / autocomplete | Chat-with-AI | Agentic |
|---|---|---|---|
| Representative tool | GitHub Copilot (2021-) | ChatGPT in a browser tab | Claude Code, Cursor agent, Codex |
| Who initiates | Human (typing triggers it) | Human (asks a question) | Human gives a goal; agent initiates the actions |
| Unit of work | A line or a hunk | A code block | A task spanning many files and tool calls |
| Context the AI sees | Surrounding lines + open file | Whatever you paste | The whole repo, plus tools (read, write, run, search, MCP) |
| Verification surface | Read suggestion before accepting | Read answer before applying | Read the diff, the outputs, the logs - after the fact |
| Blast radius | A few characters | A few lines | A repo, a commit, a deploy |
| Who is driving | You | You | The agent, supervised |
Look at the last row. That is the substantive change. In modes one and two, the human is the agent and the AI is a peripheral. In mode three, the AI is the agent and the human is the supervisor. The keyboard is no longer where the work happens.
(Chart: where the work happens shifts across the three modes - illustrative)
The chart above is illustrative, not measured - I have not seen a study that puts hard numbers on session-time allocation across these modes, and I'd be suspicious of one that did, because the variance between engineers is enormous. But the directional claim is uncontroversial in conversations with engineers using these tools heavily: the share of an agentic session spent reading and supervising, rather than producing keystrokes, is much larger than it was even two years ago.
What is actually new
What is genuinely new in agentic coding is not that the AI is smarter. It is that the AI is in a loop with the system.
In modes one and two, the AI was outside the system. It produced text; you were the bridge that brought that text into the world. The text could be wrong, but only you could put wrong text into the codebase. The model's blast radius was bounded by your willingness to paste.
In mode three, the AI is inside the loop. It reads files. It runs commands. It searches the codebase. It calls tools - your linter, your test runner, your database, your monitoring system - and it observes the results and adjusts. It can write a file, run the tests, see they failed, read the failure, and try again. It does this autonomously, sometimes for several minutes, occasionally (with the right framing) for several hours.
That feedback loop is what "agentic" actually means. Not "intelligent". Not "autonomous" in the science-fiction sense. Just in the loop, with tools, with state, with the ability to observe consequences and act on them.
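To make "in the loop" concrete, here is a minimal sketch of the shape every agentic tool shares. It is a sketch, not any vendor's API - ModelReply, ToolCall, call_model, and the tools dict are hypothetical stand-ins - but the structure is the point: a model that runs tools in a loop until it decides the goal is met.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical shapes - illustrative stand-ins, not any vendor's actual API.
@dataclass
class ToolCall:
    name: str        # e.g. "run_tests", "read_file"
    arguments: dict

@dataclass
class ModelReply:
    text: str
    tool_call: Optional[ToolCall] = None  # None means the model is done

def agentic_loop(
    goal: str,
    call_model: Callable[[list], ModelReply],  # the LLM, abstracted away
    tools: dict[str, Callable[..., str]],      # read, write, run, search...
    max_steps: int = 50,
) -> str:
    history: list = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        reply = call_model(history)
        if reply.tool_call is None:
            return reply.text  # the model decided the task is finished

        # The model asked to act: read a file, run the tests, grep the repo.
        result = tools[reply.tool_call.name](**reply.tool_call.arguments)

        # Feed the observed result back in. This is the feedback loop:
        # a failing test lands here, and the next iteration reacts to it.
        history.append({"role": "assistant", "content": reply.text})
        history.append({"role": "tool", "content": result})
    return "Step budget exhausted - over to the supervisor."
```

Autocomplete and chat are a single model call with no loop. Most of what makes mode three different falls out of those few lines: the loop, the tools, and the history that carries observed consequences forward.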
Three concrete consequences of being in the loop:
1. The AI can verify its own work. Run the tests. Type-check the file. Hit the endpoint. This is mundane and enormously consequential. Half the time autocomplete and chat were wrong, the wrongness was the kind a test would catch in a second - but autocomplete couldn't run the test. Agents can, and the difference compounds.
2. The AI can do tasks rather than productions. "Suggest a function" is a production. "Add caching to this endpoint" is a task - it requires reading the existing code, deciding where caching fits, picking a strategy, writing the code, updating the tests, possibly updating the docs. That's not a longer suggestion. It's a different category of work.
3. The AI can make decisions you don't see. This is the part to take seriously. When an agent writes 15 files in five minutes, dozens of small decisions get made along the way: how to name a variable, which abstraction to reach for, whether to refactor an adjacent function while it's there. You can read the diff afterwards, but you weren't in the room when those decisions were made. That is unfamiliar territory for most engineers, and it is exactly what we'll spend Part 3 of this series on.
The third consequence is the one I'd most like readers to sit with. It is the source of most of the genuine difficulty - and most of the genuine power - of working this way.
The role shift
If the work is no longer "write the code" but "decide what code is acceptable, then verify it lands cleanly", the day-to-day skills of engineering shift in subtle but real ways.
The skills that mattered most when I started programming professionally - typing speed, fluency in a particular language's idioms, quick recall of APIs - matter less, in a measurable way, than they did. The skills that matter more are different: the ability to specify a task precisely, to read a diff with adversarial energy, to recognise when an agent is confidently wrong, to know which tests will catch which classes of bug, to keep a feedback loop tight.
This is not "engineering becomes management". Managing a human engineer is different from supervising an agent. Humans push back, ask questions, take ownership, develop judgment over years. Agents do none of that yet. Supervising an agent is closer to operating a powerful but uncalibrated tool - more like supervising a CNC machine on a complicated cut than supervising a junior engineer on a feature.
The closest analogy I have found, after a year of working this way, is to remind myself that the agent is a very fast, very widely read, very confident contributor with no skin in the game. It will not tell you that the change you asked for is a bad idea. It will not push back when you specify the task badly. It will produce a plausible-looking output for almost anything you ask. The work of being a good supervisor is the work of providing the friction the agent itself does not generate.
Some of the engineers I've watched adapt to this shift fastest are not the ones with the most years on their CV. They are the ones with strong opinions about code, the patience to write things down, and the discipline to read carefully. Those skills were always valuable. They are now disproportionately valuable.
Why Claude Code as the lens
Claude Code isn't the only agentic coding tool, and this series isn't a vendor pitch. But I'll spend the next four posts using it as the primary lens for two specific reasons.
The first is transparency of surface area. Claude Code is unusual in that the things it does are visible at the level of files and configuration. Skills are markdown files. Hooks are shell commands. MCP servers are processes. Plan mode is a flag. Subagents are configurations. Most of what you can do, you can read. This makes it possible to write about the surface area honestly - and for readers to verify what I say independently, in their own terminals.
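One example of what that readability means in practice: a hook is just a shell command registered against a tool event in a settings file. The snippet below follows the shape of the hooks schema in the Claude Code documentation at the time of writing - treat it as illustrative rather than authoritative, and note the lint script path is a made-up stand-in.

```jsonc
// .claude/settings.json (excerpt) - illustrative; check the Claude Code hooks
// docs for the current schema. The script path is a hypothetical example.
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "./scripts/lint-changed.sh" }
        ]
      }
    ]
  }
}
```

Nothing there is hidden behind a UI: you can read it, diff it, and commit it like anything else in the repo.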
The second is primitives over products. Cursor, Copilot, Codex, Windsurf, and others have built rich product experiences on top of similar underlying capabilities. They are excellent in their own ways. But polished product UI tends to obscure what is happening underneath. When the goal is to understand the shift - to understand what an agentic system actually is and how to use one well - having a tool whose primitives are exposed makes the writing easier and the reading more durable. Claude Code's design choice to expose primitives is not the same thing as it being the best tool for everyone, and I'll be careful not to conflate the two.
If you use a different tool, almost everything in this series should translate. The vocabulary differs (skills / rules / instructions; hooks / scripts / automations; MCP / extensions; plan mode / preview mode), but the shape is the same. Where the shape diverges, I'll say so.
What's coming in this series
Five posts, published every few days:
- From Suggest to Supervise (this post) - the shift itself, and why it's a category change rather than better autocomplete.
- The Surface Area - skills, hooks, MCP, plan mode, subagents - what they are and how they actually get used in real workflows. Not a feature tour. Three real workflows, with the primitives surfacing where they matter.
- Failure Modes and Trust Calibration - where agents go wrong, why, and how to build a verification surface that catches it before it ships. The honesty post.
- Docs as the Source of Truth - PRDs, ADRs, design docs, and tests as the substrate that makes agentic engineering work. A single source of truth feeding both the implementation agent and the QA agent.
- Toward Autonomy - observability MCPs, the detect-fix-resolve-document loop, and an honest assessment of where we actually are versus where the demos suggest we are.
The newsletter form at the bottom of the blog will let you know when each one goes live.
An honest take
The hype around AI in software development is enormous and substantially wrong. It is wrong in both directions.
The maximalist version - the one that says senior engineers will be obsolete by 2027, that whole departments will be replaced, that the only sensible career move is to leave the industry - is wrong because it consistently underestimates how much of software engineering is not writing code. It is talking to product managers, understanding what the business actually needs, navigating ambiguous requirements, deciding what not to build, pushing back on bad ideas, mentoring junior engineers, being on call at 3am with the right context to resolve an outage. Agents are nowhere near doing those things, and the gap is not closing as fast as the demos suggest.
The minimalist version - the one that says nothing has really changed, that LLMs are just very good autocomplete, that you can ignore this and continue as you were - is wrong because something has really changed. The mode of work is different. Engineers who do not adjust to it will, on a long enough horizon, be at a meaningful disadvantage to those who do. Not because they will be replaced by the agent, but because they will be out-shipped by the engineers who supervise one well.
The honest take, somewhere between those two, is that we are in the middle of a real shift in the craft. It is bigger than version control was, smaller than the move from punched cards to the screen. It rewards the engineers who pay attention to it carefully, work with the tools enough to develop intuition for their failure modes, and resist both the cult and the dismissal.
The next four posts are an attempt to pay that attention out loud.
Sources and further reading:
- Stack Overflow Developer Survey 2024 - AI - 76% of developers using or planning to use AI tools; 62% currently using.
- Stack Overflow press release: 2024 - the AI use vs. trust gap
- Stack Overflow Developer Survey 2025 - AI - 84% using or planning to use AI; 51% daily AI use; 14.1% daily AI agent use; 29% trust AI output (down 11pp from 2024).
- Stack Overflow blog: 2025 results - "willing but reluctant"
- Stack Overflow press release: 2025 - trust in AI at an all-time low
- Anthropic - Claude Code product page and Claude Code documentation.
- Simon Willison's Weblog:
  - "I think 'agent' may finally have a widely enough agreed upon definition to be useful jargon now" - the cleanest working definition of an LLM agent I've seen: a model that runs tools in a loop to achieve a goal.
  - "Agentic Coding: The Future of Software Development with Agents" (Jun 2025).
  - "Designing agentic loops" (Sep 2025).
- GitHub Octoverse 2025 and the headline blog post - 80% of new GitHub developers use Copilot in their first week; the Copilot coding agent authored over 1 million pull requests in five months; 1.1M public repositories now use an LLM SDK (+178% YoY).