There's definitely a way to use Claude Code that is token-conscious.
I've tried throwing unsupervised agentic software factory workflows against the wall, and they burned through my tokens like nobody's business but didn't produce much.
A supervised, human-in-the-loop process, on the other hand, is much more productive and doesn't consume nearly as many tokens. Maybe that's why everyone's pushing agentic approaches so much.
The current thinking is that automated agents are what turn this from an industry in the tens of billions into a multi-trillion-dollar one. So yes, you are right on the money: agents stimulate demand for this thing they've built.
So then it's kind of like the government. The more incompetent and inefficient it is, the solution is always paying more taxes to hire more of the same incompetent and inefficient government.
There is always a quantity of lubricant that can get any machine moving. Just add so much that you create an all consuming river of lube and watch your thing sail away.
At the enterprise level, though, it's going to be hard to want to use a service in which costs are not predictable, and keeping those costs under control requires employee training.
To be fair, the cost of software development has always been fairly unpredictable. What may be different is that the cost used to be roughly proportional to man-hours spent, while now the number of agents running in parallel may be less predictable.
> To be fair, the cost of software development has always been fairly unpredictable.
Yes, but in an "oops this is gonna take another two months to finish" kind of way, not the "oops this is the 12th time this month 8 developers have burned $2K in tokens in a single day and no one really knows how it happened" kind of way.
We’re all being given belt-loaded machine guns and tossed on to Planet K. We used to pay for the salaries of soldiers, now we have an Ammo Budget.
There's no fucking training to mitigate a slot machine.
I get 98.6% cache hits on Claude Code. Short of drastic arch changes it’s hard to imagine it getting much better.
98.6% cache hits doesn't distinguish an efficient workflow from an overly chatty linear agent repeatedly reusing the same context. Plus, it says nothing directly about whether the process makes useful progress per token.
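To make that concrete, here is a minimal sketch with made-up numbers (the `Session` class and both sessions are hypothetical, not real Claude Code telemetry). Two sessions share the same 98.6% cache hit rate, yet one spends ten times as many tokens per finished task:

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-session token counters (illustrative only)."""
    cached_input_tokens: int
    uncached_input_tokens: int
    output_tokens: int
    tasks_completed: int

    def cache_hit_rate(self) -> float:
        # Share of input tokens that were served from the cache.
        total_input = self.cached_input_tokens + self.uncached_input_tokens
        return self.cached_input_tokens / total_input

    def tokens_per_task(self) -> float:
        # Total tokens processed per completed task; infinite if nothing shipped.
        total = (self.cached_input_tokens
                 + self.uncached_input_tokens
                 + self.output_tokens)
        if self.tasks_completed == 0:
            return float("inf")
        return total / self.tasks_completed


# Same hit rate, very different cost per unit of useful work.
focused = Session(986_000, 14_000, 20_000, tasks_completed=5)
chatty = Session(9_860_000, 140_000, 200_000, tasks_completed=5)

for name, s in (("focused", focused), ("chatty", chatty)):
    print(f"{name}: hit rate {s.cache_hit_rate():.1%}, "
          f"{s.tokens_per_task():,.0f} tokens/task")
```

The hit rate only describes how the input was served, so it needs to sit next to some measure of output (tasks, merged changes, whatever you trust) before it says anything about efficiency.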
We are all going to be graded by (tickets closed / tokens burned) soon enough.
My experience as well... I've only hit Anthropic's 5hr threshold a few times, and two of them were within a half hour of the window. Also, all three times I'd already accomplished a LOT.
I tend to work with the agent, and observe what's going on as well as review/test and work through results/changes. I spend a lot more time planning tasks/features than the execution, even using the agent as part of planning and pre-documentation. It works really well. I don't think people burning through the 5hr allotment in under an hour are actually reviewing/QC/QA the results of what they're doing in any meaningful way, and likely producing as much garbage as good (slop).
I'm really curious as to HOW the MS employees were using the agents as much as what they were doing.
I suspect subscription limits are quite a bit higher than the equivalent tokens their dollar cost could purchase. I similarly feel like I can get a lot done with a $20/mo Claude Pro subscription, but also can easily spend $10-20/day at API pricing with similar usage.
Yep. I get $6k - $8k worth of tokens (at api rates) using the $200 max subscription.
Can verify that I've gotten about $400 worth of tokens from my $20 sub.
Now that sounds like a business I’d like to invest in! When’s that Anthropic IPO anyway?
I don't understand why people are using the API pricing instead of the Pro/Max subscriptions? What am I missing?
Enterprise customers don't get that option. And if you want a fully custom harness, you don't get that option either.
Anthropic is forcing large enterprises onto api billing instead of subscriptions.
yeah, by using codex
Feels about right.
I launched an internal demo of Claude Code and Deepseek on the same day, and we burned through our monthly allowance for Claude in just over a week, with more than half of that budget being spent in one day. With DS, people are unable to go through that same amount of money in a month, not even close.
With that, Claude feels like an expensive toy, while DS is a shovel, purely because developers do not feel like they are eating into a precious resource while using it. Also, it does not feel like there is much of a difference in capability between Claude and DS-pro. DS-pro and flash do feel like Sonnet/Opus and Haiku, but flash is still very, very capable.
Considered Gemini?
My experience is that Claude Code burns way more tokens than other agents, probably to ensure high levels of perceived quality, which is, most of the time, not worth the bloat for the user. The bloat works for Anthropic as an advertisement at the cost of your tokens.
This does kind of beg the question: If developers are being laid off because AI is better/faster/cheaper or makes all their people 10x or whatever fig leaf, what happens if the required tooling ends up being more expensive? From the investor’s point of view, is the drag of employee costs better or worse than a ballooning expense item?
I suppose if it all works out it'll end up way more expensive than the employees the models displaced ever were. These kinds of technologies usually end up as an oligopoly at best, and those players will have a wide moat by then, and the things these models build will be tweaked such that no other model or human being can realistically work on them anymore, and then they can price gouge everyone to the brink of unprofitability.
At least the models don’t need health insurance, office space, a cafeteria, or have a threat of unionizing.
I suspect AI would have to get drastically more expensive before it starts looking worse than payroll. If one developer using Claude Code can effectively substitute for 2 developers, you are already coming out ahead at current API pricing. Assuming very heavy usage, your cost is going to be ~1.5x a developer (factoring in what goes beyond salary: benefits, PTO, and the other overhead that comes with having employees).
So you're getting 2 for the price of 1.5. Scale that up to 500 devs at a big company and it's a big chunk of change saved on payroll.
For keeping your headcount or hiring humans to be the cheaper option instead, AI would have to cost upwards of $15k/month/developer. You're looking at about 4 billion tokens per month before humans break even or become cheaper.
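A rough sketch of the arithmetic above, using only the figures quoted in this subthread (2x output per developer, AI spend of about half a fully loaded developer, $15k/month, 4 billion tokens). The function names are made up for illustration, and the 2x multiplier is the poster's assumption rather than a measured number:

```python
def blended_rate_per_million(monthly_budget_usd: float,
                             tokens_per_month: float) -> float:
    """Blended $/1M tokens implied by a monthly budget and a token volume."""
    return monthly_budget_usd / (tokens_per_month / 1_000_000)


def team_cost_in_dev_units(baseline_devs: int,
                           productivity_multiplier: float,
                           ai_cost_fraction: float) -> float:
    """Cost of the AI-assisted team, in units of one fully loaded developer.

    productivity_multiplier: how many baseline devs one assisted dev replaces.
    ai_cost_fraction: AI spend per assisted dev, as a fraction of that dev's cost.
    """
    devs_needed = baseline_devs / productivity_multiplier
    return devs_needed * (1 + ai_cost_fraction)


def break_even_multiplier(ai_cost_fraction: float) -> float:
    """Productivity multiplier at which the assisted team costs the same."""
    return 1 + ai_cost_fraction


rate = blended_rate_per_million(15_000, 4e9)
cost = team_cost_in_dev_units(500, productivity_multiplier=2.0,
                              ai_cost_fraction=0.5)

print(f"implied blended rate: ${rate:.2f} per 1M tokens")
print(f"500-dev workload costs {cost:.0f} dev-equivalents with AI")
print(f"break-even multiplier: {break_even_multiplier(0.5):.1f}x")
```

The last line is the sensitive part: under these assumptions the savings only exist if the real multiplier stays above 1.5x, so the whole case rests on how close to 2x the actual productivity gain is.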
You're starting from the assumption that it's a 2x benefit. That's a massive leap.
There is no profit, expense, revenue. Those don't matter. Only thing that matters is stock price goes up, and laying off makes stock price go up. When laying off make stock price go down, then laying off stop.
I imagine layoffs are also very much "this quarter and next quarter" with regards to investor visibility.
While LLM Opex is "some future quarter" and very easy to co-mingle with other expenses.
Microsoft should host DeepseekV4 internally for its developers. And you're welcome.
This is the smartest solution: self-host the model on premises.
Lots of these places measure employee token use with managers having dashboards. It seems like performative code production rather than making anything useful.
Speed without judgement always compounds badly.
Well, that's the inevitable outcome of token-maxxing :shrugs:
Cancellation effective June 30. This was a _pilot_ launched in December that accidentally consumed their 2026 yearly target spend on AI!
I expect the r/LocalLLaMA guys to be going nuts about this news.
From the article
> It was part of an effort to get project managers, designers, and other employees to experiment with coding for the first time.
I suspect they weren't as efficient as they could be with token use either. Sounds like they were trying to encourage non-developers to vibe code stuff.
I'd argue you have a lot more to worry about with developers as far as token usage goes because they're the ones who know how to rig up these wild workflows where tens of agents simulate an entire software development team. The non-developers are probably going to be sticking more in the realm of iterating via chat.
That's very interesting to reconcile with the fact that, not too far away, Amazon employees feel incentivized to use as many tokens as possible.
"incentivize to use as many tokens as possible" = "Upper management knows people don't like change so we are forcing them to come up with ways to use this thing". It does not mean that management will encourage wastefulness in the future, and it also doesn't mean that token usage from now won't be reviewed in the future. What's to stop them from dinging your performance in November because you wasted a hundred thousand on tokens with nothing to show for it?
I think what's funny is that employees were most likely already covering the cost for these tools because they are useful. Companies didn't believe employees were using these tools and now have forced their usage and no longer have the costs subsidized.
Similarly companies seem to reward high token usage as a sign of someone willing to play ball with AI and again have forced higher costs on themselves for people reward hacking or using tokens out of spite.
There is no world where I can put my company’s data through an external site without their express consent and security sign off. I suspect at most companies there’s zero path for people to have been paying for it themselves.
The way coding agents work is fantastically wasteful. All the megabytes of code are processed over and over and over, sometimes within just one session.
There are papers describing KV cache precomputation for commonly used documents (e.g. KVLink), but, of course, it's not a priority for model providers: they'd rather sell you more tokens, also they would rather get to AGI/ASI first than optimize usage of existing models...
Claude code gets >98% KV cache hits. It’s not reprocessing unless you let the cache go cold (5 minutes, which is annoyingly short).
Microsoft poorly manages token use of the most expensive models in a pilot. Then they use that failure to advertise/position their own GitHub Copilot agents to procurement teams, over the now widely validated Claude Code-based agents.
At least Codex is trying to win validation on merit.
Surely a company as large as Microsoft is actively attempting to build their own models. They couldn't possibly have expected to stake the future of their software development on the conditions of a third party company?
Okay, but what if you're not Microsoft's size and don't have an R&D budget large enough to fund development of your own models and tools?
This is a warning to any company not building its own AI that AI-assisted development could become really expensive really fast and most likely won't pay off. What Microsoft is suggesting is that the current price is too high, but it's still not high enough for e.g. Anthropic to be profitable, or AI coding tools are only as good as the developers using them. So you can't meaningfully do layoffs by replacing the developers with AIs, because the cost is too high.
How does Microsoft plan to fix Copilot so that the cost will be so much lower than Claude that budget overruns won't be a problem for their own customers?
I expect in the next year or so, we'll stop seeing headlines like "Anthropic buys $15b of compute from SpaceX" and we'll start seeing headlines like "Uber's AI department licenses GPT 6.2 as the foundation for their internal model," or something like that.
Smaller companies will have departments that distill larger models into something more specifically manageable and useful for them. At least, that's my personal prediction :)
Curb Your Enthusiasm theme starts playing.
> attempting to build their own models.
At one point there were rumours that they'd do that. They also have the rights to OpenAI models for a few more years still, so they could always use those, but apparently they're also compute starved (like everyone else).
I think tech companies are doing layoffs partly because they need to cover AI operating expenses.
I switched from Anthropic to OpenAI after spending ~$40K in equivalent token costs using Claude over 3 months.
I found Opus 4.7 to be slow and wasteful with token usage. It's shocking how inefficient it is with tasks like bash tool usage and web searching, delegating them to a dozen subagents only to get stuck and never return until you esc and intervene. That, in addition to all of the broken tooling Anthropic built in to limit token usage like the broken monitoring tool made managing Claude a chore. I was happy to pay $200/month for Opus 4.5 when they had more capacity, but 4.7 felt like a huge step back and no longer worth the price and inconvenience.
I remember an OpenAI employee comment on the GPT5.5 release post about how they specifically geared it towards long-horizon tasks, and it's been a breath of fresh air in that regard. I have five two-week long sessions going right now and there's been no degradation in performance or efficiency. It's much better at carrying rules/learnings forward even in long-running sessions and grounding/refreshing itself in verified facts when it loses context.
It's funny because in two weeks I've gotten way more done with GPT5.5 with way fewer tokens and way less handholding. I think this goes to show how important tooling and the harness are, and how a capable model like Opus 4.7 can be severely handicapped by bad product decisions.
Being able to manage context over long-running sessions is a function of the harness, not the model. Are you using Claude Code with GPT5.5? Codex? piclaw? They’ll all have different context management strategies to let you keep going when you would otherwise have filled up context and be forced to stop.
This is an AI generated summary of a blog post (https://www.thelowdownblog.com/2026/05/microsoft-cancels-int...) which is a summary of an AI generated article (https://blazetrends.com/microsoft-cancels-claude-code-pilot-...) which is a summary of another AI generated article (https://www.themodelwire.com/article/microsoft-starts-cancel...) which is a summary of an article from The Verge (https://www.theverge.com/tech/930447/microsoft-claude-code-d...). I guess it would be better to link the Verge article instead.
The absolute state of the Hacker News main page in 2026. Thank you for taking your time to put it all together.
Man, maybe it's time for me to give The Verge a subscription. They're the only ones actually doing any journalism here, with a bunch of AI blogs skimming off the top.
2nd link doesn't work. That would be a neat tool, to find the original article and see how many levels of AI summary it has gone through, a game of AI telephone!
I had thought about creating something like that for finding comments for articles. For a given article, display links to comments for HN, lobsters, reddit, etc. However, I feel I already waste too much time reading comments. I shouldn't make it easier and more tempting.
My bad. I had trouble finding the original source when I googled for it and grabbed a link. I was originally shown a screenshot of an x.com post.
I emailed dang to politely ask to make the link point to the Verge article since I can't update it.
https://archive.is/WfCta
boy i'm leaving the internet. sun is shining. was a good time here while it lasted.
The artificial centipede.
i swear i'm going to start an amish community and internet where we forbid any technological development past 2019
call me a luddite, i'll be wearing it as a badge of honor
Welp, this is the future we live in now
AI slop ruined a story about AI? This thread is a story about itself.