Switching Capacity: A New Kind of Tired

Last updated: May 2026

I lose my patience with AI agents far more often than I do with my toddler. See if you can relate to a typical day: Claude rewrote half the file when I asked for a small change. I corrected it. On the next pass it touched three lines and declared the job done. I corrected that. It overshot the other way. I corrected again. And I was doing this across four other threads at the same time, each running its own version of the same problem.

Somewhere in the middle of that, I realized I couldn’t think anymore. I wasn’t tired in the normal sense. It was something more specific. I had been hitting a particular wall for weeks, and I didn’t have a name for it. Recently, one of our engineering leaders put it into words for me in a team meeting: “I’ve never maxed out my switching capacity until now.” He had been running five to seven AI agent threads at the same time across different parts of the business, and he kept hitting the same wall of mental exhaustion I had been hitting. Keep in mind, this is a very accomplished engineer with a long career. This was not a skill issue.

That phrase, switching capacity, names something the existing literature on knowledge work hasn’t caught up to yet. We’ve all hit the wall of intellectual capacity, which is how hard you can think about a single problem. Hitting it feels like staring at a paragraph and realizing the words have stopped registering. Context-switching capacity, which is how many threads of work you can hold in your head and move between without dropping any of them, is the muscle AI agents are hammering, and almost none of us have lived close enough to its limit to know where it sits.

Where Deep Work runs out

Cal Newport opens Deep Work by naming two abilities he thinks are most valuable in the modern economy: the ability to quickly master hard things, and the ability to produce at an elite level in both quality and speed. The premise of the book, and most of his writing, is that the path to both abilities runs through deep work. Defend a block of time, minimize interruptions, let your brain go deep on one problem, and you produce better output. That is still good advice, and I still try to follow it.

I don’t think Newport is wrong about which abilities matter. They are still the right two. The disconnect is that the current zeitgeist has decided you can get to both of them through a completely different path. You tokenmax. You run agents in parallel. You produce volume by orchestrating breadth instead of going deep on any one thing. This is the bet most of the engineers and PMs I know are running right now, myself included on a lot of days. It is not a stupid bet. The speed half of Newport’s second ability is real and visible. You can in fact produce more output by running five threads than by going deep on one for the same number of hours.

What is less clear is the quality half. Newport’s argument was that quality and speed both come from depth, and he puts it in equation form in Deep Work:

High-Quality Work Produced = (Time Spent) × (Intensity of Focus)

Agents seem to deliver the speed without the depth, and I think we are all collectively about to find out whether the quality holds up at that pace. Will model improvements raise the quality bar enough? Maybe, maybe not.

Even if the quality holds up, the cognitive cost question is still open. Running five agents in parallel is closer to conducting (or, let’s be honest, middle management) than to deep work. You spend most of your time tracking state across all of them, returning decisions one after another, holding multiple lines of reasoning open at once. The brain handles depth reasonably well, and we have decades of research on that. The brain handles parallel orientation considerably less well, and AI agents are now pushing all of us into territory none of us has prior experience with.

Newport followed Deep Work up in 2024 with Slow Productivity, which makes the case more pointedly. His three principles are: do fewer things, work at a natural pace, and obsess over quality. The book is essentially a manifesto against the agentic, tokenmaxxing pattern that has taken over since GPT-5.1 and Opus 4.5 came out. I don’t think Newport is wrong about any of it. I think we are all currently inside an experiment running on the opposite premise, and most of us did not consciously sign up for the experiment.

An extension without rhythm

A different book has been more useful to me for understanding what is actually happening. Annie Murphy Paul’s The Extended Mind starts from a different premise than most cognitive science. “We think best,” writes Paul, “when we think with our bodies, our spaces, and our relationships.” The brain evolved to offload cognition rather than to do everything internally. Walking helps you think. Writing things down helps you think. Talking through a problem with someone helps you think. The brain leans on extensions constantly, and most of what we call cognitive performance is really about how well those extensions are composed.

If you accept that framing, AI agents are a new kind of extension, and a powerful one. Most prior extensions, though, had natural pacing built into them. A notebook waits for you to write the next thing. A walk takes as long as it takes. A conversation runs at the other person’s processing speed. The cognitive load comes back to you at a rhythm your nervous system was built for.

Agents have no such rhythm. You hand off a task, and the response comes back in two minutes, or thirty seconds, or however long the model takes. The response is rarely small. It is usually a wall of output you now have to evaluate, decide on, and route. If you have five threads going, five walls might come back at once. The extension is asymmetric. You give a small input and get back a large output that you now have to supervise.

Contrast that with managing a human employee. You’ll delegate a task and maybe get peppered with a few questions in the first couple of hours, and then they’ll go off and work for a day or two. They also (probably) won’t accidentally delete your hard drive or talk to you in insufferable slop patterns (if I read the word “clean” one more time, so help me).

Now run that pattern across five or seven concurrent threads and your coherence gives out before your individual thinking does. The work itself you can still do. The extension just demands more processing from you than it offloads. We are not used to extensions that bill us back at machine speed. Paul puts it well: “Instead of heedlessly driving the brain like a machine, we’ll think more intelligently when we treat it as the context-sensitive organ it is.” This resonated deeply with me when I read the book several years ago, and it’s even more important now.

We have no precedent for this. Every prior extension of the mind, all the way back to writing and tally marks, came with a human-paced rhythm. A book waits for you to turn the page. A whiteboard does not call after you when you walk away from it. Human employees, to return to management, go to lunch. They have weekends. The extension paces itself.

There is no equivalent throttle on agents. There is no cultural memory of what it feels like to extend your mind into something that operates at machine speed across multiple threads at the same time. We do not know what the actual limit is. We do not know what the long-term cost is. We do not even have language yet for the specific kind of tired this produces.

Attention residue, multiplied

Some of what is going on here does have a name. Sophie Leroy coined the term “attention residue” in a 2009 paper to describe the cognitive cost of switching tasks. When you stop working on Task A to start Task B, part of your attention stays stuck on Task A. That residue degrades how well you perform on Task B, sometimes for a long stretch after the switch.

That research was done in a world where the “tasks” were things like emails, reports, and meetings. The residue accumulated, but slowly. You could have a bad afternoon. You could be foggy after a stretch of meetings. The cost was real, but the scale was human.

Agents change both the speed of the switches and the density of the tasks. You jump from a coding thread to a planning thread to a research thread, and each one has live state. Each has an ongoing line of reasoning you were just inside. Each will pull you back the moment the agent returns output. There is no clearing the residue, because the next task pings you before you have finished with the last one. The residue compounds in a way Leroy probably never had reason to study.

For the first time in my career, I am the bottleneck, not the tools. My individual thinking still works fine. The supervision quality across all my threads, on the other hand, collapses faster than any single thread does on its own. Once that supervision goes, everything downstream slows down with it. The throughput problem has shifted from “how fast can I do the work” to “how many open loops can I hold before my judgment starts slipping.”

The texture of exhaustion

It’s hard to describe this problem to anyone who hasn’t yet lived it. People think of it as a volume problem, like running too many tabs in your brain at once. The volume matters, but the texture is worse. Each interaction with an agent is a small judgment call. Was the response good enough? Did it actually do what I asked, or just claim to? Is the explanation it gave accurate, or polished and wrong? You make that call dozens of times an hour, and the agent is confident either way. The confidence is part of what makes it tiring. You can’t use the agent’s tone as a signal, because the tone is steady whether the work is right or not. You have to read every response on the merits.

That back-and-forth feels like trying to explain something obvious to someone for the third time, except the someone in question has access to your codebase, is faster than you, and just confidently committed work in three places you did not ask it to touch. The mix of “thank you” and “what are you doing” is hard to hold in your head at the same time, and you end up holding it many times a day.

Cal Newport, summarizing Roy Baumeister’s research in Deep Work: “Your will is not a manifestation of your character that you can deploy without limit; it’s instead like a muscle that tires.”

Multiply that by five threads and you start to see why the supervision capacity runs out before the work does. Patience is what depletes through the day. Each small disappointment takes a chip out of it, and by mid-afternoon there isn’t much left to draw on.

Tokenmaxxing addiction

Aside from its patience-draining nature, there’s another insidious element to this agent-driven knowledge work loop. A different colleague recently told me that he had run out of tokens for two nights in a row that week and that it felt like withdrawal. He was not being dramatic. He was angry and frustrated and short-tempered, and the thing he was angry about was that he could not load up another thread to work on. He said he wanted to live a healthier life than that, and he wanted to flag for the rest of us that this stuff is more addictive than we are giving it credit for.

The reward loop with AI agents is very tight. You ship a prompt, you get back work, you feel productive, you ship another prompt. The dopamine hit is fast and the cost is delayed. That is the standard shape of every compulsive behavior we have a name for, which means we probably should not assume we are immune to it just because the thing we are hooked on happens to be work.

Newport names the structural force underneath this in Deep Work. He calls it the Principle of Least Resistance: “In a business setting, without clear feedback on the impact of various behaviors to the bottom line, we will tend toward behaviors that are easiest in the moment.” Tokenmaxxing is exactly this. The cost is invisible. The behavior is easy. The feedback loop that would correct it runs on a slower clock than the loop that rewards it.

What makes this worse is how the addictive quality compounds the switching-capacity problem. The same parallel-thread habit that exhausts your supervision capacity also feels good while it is happening. You are getting things done. You are seeing output. The harm is invisible until it isn’t. By the time you notice the cost, you have already paid a lot of it.

The 75% rule

After discussing this struggle in our team meeting, our engineering leader offered a framework, a field-tested intuition from someone who has been pushing on his own limits for months. His advice was: find what your maximum is. Three quarters of that should be your steady mean.

The framework is useful because we already accept the same pattern with working hours. There is the 80-hour week you can pull when something is on fire, and there is the 40-hour week you can actually sustain. We know that the 80-hour pace is borrowed time. We know that if you live there for too long you break. Nobody argues about this anymore except Silicon Valley bros who haven’t had enough life experience to know better.

The same shape applies to parallel-thread count. Newport cites an aphorism in Deep Work that fits here: “The more you try to do, the less you actually accomplish.” If your max is seven, your steady mean should be five, maybe four. The constraint to optimize for is sustainability. Peak capacity is a different question, and it is worth pulling when the situation calls for it, just not every day.
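The arithmetic behind that is simple enough to sketch. The snippet below is only an illustration: the function name steady_thread_count, the 0.75 default, and the choice to round down are assumptions made for the example, not part of the advice as it was given.

```python
import math


def steady_thread_count(peak_threads: int, fraction: float = 0.75) -> int:
    """Return an everyday thread count as a fraction of your observed peak.

    Rounding down is an assumption for this sketch: when in doubt, it
    errs toward the sustainable side. The result never drops below 1.
    """
    if peak_threads < 1:
        raise ValueError("peak_threads must be at least 1")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return max(1, math.floor(peak_threads * fraction))


# A peak of seven threads works out to 5.25, which rounds down to five.
for peak in (4, 5, 7):
    print(f"peak {peak} -> steady {steady_thread_count(peak)}")
```

Rounding down instead of to the nearest whole thread is a deliberate bias toward sustainability. Whether four or five is the right everyday number for a peak of seven is still a judgment call the formula cannot make for you.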

The temptation with AI agents is to run at peak constantly. The tools reward it. You can in fact get more done at seven threads than at four. The output is real. But the cost is delayed, and the cost is paid by your future self, who at some point is not going to be able to think anymore.

Where the work divides

Newport’s framework isn’t broken in the agentic era, just incomplete. He defines two categories: deep work, which he describes as “professional activities performed in a state of distraction-free concentration that push your cognitive capabilities to their limit,” and shallow work, which he describes as “noncognitively demanding, logistical-style tasks, often performed while distracted.”

Almost everything I do in a day fits one of those two buckets. What changes in an agentic workflow is who does which part.

The deep work, in this new shape, is figuring out what to build: defining the problem, choosing the constraints, deciding what good looks like, anticipating where the answer is going to be wrong before you have it. That part is still mine. It still requires the kind of single-task focus Newport was talking about. If anything, agents make this part more valuable, because the further upstream a misframed problem starts, the more parallel work you end up wasting on it. A bad spec spawns five agents executing the bad spec, which requires five threads of supervision, and the cost multiplies.

The shallow work is the execution: touching files, writing boilerplate, plumbing config, running tests, fixing the lint errors, writing the obvious version of the function before the interesting version becomes clear. Most of what an agent does for me, when it works, is shallow work performed at speed. That is the appropriate use of the tool.

The mistake I keep watching myself and others make is using agents for the deep half: trying to get an agent to figure out what we should be building, trying to get an agent to choose the constraints, trying to skip the actual thinking by asking the model to do it for us. That part does not delegate well, partly because the model is happy to produce confident-sounding wrong answers, and partly because the model does not carry the context that would let it know what good looks like in your specific situation.

If the work breaks down this way, the 75% rule from earlier applies on the delegation side, not the defining side. The constraint is not how much deep work you can do but how many parallel shallow-work threads you can supervise before your judgment collapses. Run agents at 75% of your supervision ceiling, and protect your deep-work time as the thing that decides what those agents should be working on in the first place.

What I’m going to try

I don’t have a tidy prescription for any of this. I am still figuring out where my own limit sits, and I am suspicious of anyone who claims to have this fully worked out after barely eighteen months (maybe less depending on when you start counting) of having current high-powered agentic models and harnesses available.

The one piece of borrowed advice I am running with is the 75% rule from earlier. Whatever your peak parallel thread count is, treat three quarters of that as your steady state, and reserve peak for the days that actually warrant it. That much I think holds up across most people.

Beyond that, I have a list of things I am going to try, all of them guesses. I do not know if any will hold up.

The most important one, I think, is sharpening the saw on my skills and CLAUDE.md files. Annie Murphy Paul writes in The Extended Mind that “we extend beyond our limits, not by revving our brains like a machine or bulking them up like a muscle—but by strewing our world with rich materials, and by weaving them into our thoughts.” Skills and CLAUDE.md files are how I fill my world with rich materials for the agents I work with. Most of the back-and-forth I described above is downstream of context the agent should have had in the first place. Every time I find myself correcting the same kind of overshoot for the fifth time, the right move is almost certainly to pause, write that lesson into a skill or a project instruction, and run the next iteration with that loaded in. The temptation is to push through, because each individual correction feels small and the infrastructure work feels like a tax. But let enough of those small corrections pile up and you never find time to write the skill that would prevent the next twenty.

The other things I want to try:

Batching agent work into fewer, larger sessions instead of running threads continuously throughout the day.
Building deliberate transitions between threads instead of switching cold.
Adding some kind of physical reset between batches, since Annie Murphy Paul would say the body has more to do with this than I have been giving it credit for.

If you are reading this and you have been figuring out your own way through, I want to hear what you have tried. We are all writing the first draft of this together. There is no manual. There is no body of research to point at. There is only the practical knowledge that colleagues and friends are accumulating one experiment at a time. We are going to need each other’s notes.