What I can’t get over is that there have been exactly zero software breakthroughs since vibe coding started, other than vibe coding itself.
Claude is amazing, that’s true.
But if it was as amazing as this article implies, I’d expect some breakthrough outside of AI itself.
Rewriting a Zig program in unsafe Rust? Not a breakthrough. Finding a bunch of security vulns? Maybe that’s sort of a breakthrough, though it’s underwhelming and possibly just a net negative. But, like, if I rolled back to using software from 2023, life would be ok.
Maybe we just need to give it time, and sometime real soon we will all be amazed by such a breakthrough? Who knows.
Okay, so Anthropic has amazing AI which supposedly writes most of their code and can continuously improve... meanwhile they have outages on a regular basis, and any kind of long-running work will now consistently hit 'API Error: Server is temporarily limiting requests'. Not sure if this is intentional to force a reduction in token usage, but at this point I need to build around these throttling limits and outages with my own tools to restart/resume sessions. In my experience over the last 2 weeks, literally 100% of non-trivial Claude sessions/work have been blocked on these issues and required manual intervention.
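One common way to "build around" throttling like this is a retry wrapper with capped exponential backoff and jitter, so a long-running job waits and resumes instead of dying. A minimal sketch, assuming your own client wrapper turns the "limiting requests" response into an exception; `RateLimitedError` and `call_with_backoff` are hypothetical names, not part of any Anthropic SDK.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical: raised by your own client wrapper on 'temporarily limiting requests'."""


def call_with_backoff(fn, max_attempts=6, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitedError with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # give up and surface the error to an outer resume loop or a human
            # Wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Pass a closure that resumes the session as `fn`; the `sleep` parameter exists only so the waiting can be stubbed out in tests. Jitter keeps many clients that were throttled at the same moment from all retrying at the same moment.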
One of my focuses now is my own model-agnostic harness and workflow orchestration (I know everyone is building these), baselining on Opus and aiming to transition to Chinese models like DeepSeek in the short term and, hopefully, open self-hosted models in the future (which I plan to open source).
The nonstop marketing fluff from Anthropic while their service quality and availability noticeably degrade... just continues to destroy my trust in the company.
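The heart of a model-agnostic harness is a narrow interface that every provider adapter implements, so the orchestration logic never depends on one vendor's SDK and can fall back when the preferred model is down. A minimal sketch; `ModelBackend`, `BackendUnavailable` and `run_with_fallback` are hypothetical names, and real adapters (Opus, DeepSeek, a self-hosted model) would each wrap their own vendor client.

```python
from typing import Protocol, Sequence, Tuple


class ModelBackend(Protocol):
    """Hypothetical provider-neutral interface; each vendor gets a small adapter."""
    name: str

    def complete(self, prompt: str) -> str: ...


class BackendUnavailable(Exception):
    """Raised by an adapter when its provider is down or throttling."""


def run_with_fallback(backends: Sequence[ModelBackend], prompt: str) -> Tuple[str, str]:
    """Try backends in preference order (baseline model first); return (backend name, output)."""
    errors = []
    for backend in backends:
        try:
            return backend.name, backend.complete(prompt)
        except BackendUnavailable as exc:
            errors.append(f"{backend.name}: {exc}")
    raise BackendUnavailable("all backends failed: " + "; ".join(errors))
```

Swapping the baseline for a different model then means reordering the list, not touching the workflow code.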
This is a decades-old design pattern for when CPU >> IO. Emacs has been doing just that since the '80s, when people were complaining about "Eight Megs And Constantly Swapping". See "redisplay" [1].
This minimizes screen flash. You can't rely on terminals doing double-buffering.
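Since the terminal won't double-buffer for you, the usual workaround is to compose the entire frame in memory and emit it with a single write, so the terminal never shows a half-drawn screen. A minimal sketch; the cursor-home and erase sequences are standard ANSI/VT100, while wrapping the frame in "synchronized output" (DEC private mode 2026) is an assumption about terminal support, not something the comment mentions. Terminals that don't know that mode ignore it.

```python
import sys

# Assumption: the terminal supports synchronized output (DEC private mode 2026).
BEGIN_SYNC = "\x1b[?2026h"
END_SYNC = "\x1b[?2026l"
HOME = "\x1b[H"         # move cursor to row 1, column 1
CLEAR_EOL = "\x1b[K"    # erase from cursor to end of line
CLEAR_BELOW = "\x1b[J"  # erase from cursor to end of screen


def compose_frame(lines):
    """Build the whole frame as one string: overwrite in place, then clear leftovers."""
    parts = [BEGIN_SYNC, HOME]
    for i, line in enumerate(lines):
        if i:
            parts.append("\r\n")
        parts.append(line + CLEAR_EOL)  # wipe the tail of a longer previous line
    parts.append(CLEAR_BELOW)           # wipe rows left over from a taller previous frame
    parts.append(END_SYNC)
    return "".join(parts)


def present(lines, out=sys.stdout):
    out.write(compose_frame(lines))  # one write instead of many small ones
    out.flush()
```

No newline follows the last line, which avoids scrolling the screen when the frame fills the whole terminal.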
It's not recognizing that they are just one building block that should do one thing well, like tmux.
You don't need a computer display on your fridge for the same reason, but Anthropic thinks you do. You should see virtual ice getting created, and it should correspond to the actual ice behind the door - think of how amazing that is!
And it's not even a completely bad idea. If they made it a separate claude-code-react-beauty, or gave us some way to turn it off, it would be far more palatable.
> We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.
It looks like a video frame: a full framebuffer, generated and parsed at 60fps. It surprises me they haven't introduced GPU shaders, 16x oversampling and raytracing. Maybe in the next release.
Also you forgot "render to a framebuffer, then parse the framebuffer back to chars".
Anyway, I'm off to construct the new `ls` command. It will render the list of files to a mesh of billions of polygons in a GPU with advanced shaders, 16x oversampling, HDR and all the graphic acronyms I don't understand, then read the resulting image, find the nearest character in the ANSI charset and use that one.
~ "it's not a TUI! <describes an outrageously overengineered TUI> and my dad works at Nintendo"
curses, bud. curses.
It's genuinely difficult to tell how much of this is true. The post is obviously 100% posturing, but some of the words describe things that could be done.
Very few game engines do anything I'd describe as rasterisation. That's kind of the point of a GPU. Well, it used to be.
I suppose "small game engines" might be more likely on average to include a rasteriser. The typical reason is that the author wanted to write one.
Whereas big engine make triangle give hardware go brrr.
So I assume here 'rasterize' means 'printf'.
And diffing screens means diffing 50..150 lines of text.
And "generating ANSI sequences to draw" means 'printf' with some ANSI sequences interpolated in.
Then there's the frame budget. You have to understand they are operating within a strict frame budget -- they're not messing around, OK. They have a 16 ms frame budget, so they burned 11 ms and now have a (roughly) ~5 ms approx. budget for the final 'printf' in the chain???
It doesn’t need to be that complex, but it can be that complex without being slow. Claude Code’s interface is extremely simple. It has tons and tons of headroom to tack on performance overhead without it being noticeable at all. You just have to not do dumb things like redraw the entire UI every time a spinner spins.
One possible cause is that they forgot to add 'make it as simple as possible' to the prompt.
On a more serious note, using a React-like lib for the TUI in the hope of sharing the codebase with the web version is a more likely explanation. Still not the best idea.
React isn't so stupid as to re-render in a loop at 60fps; it waits for changes to happen before re-rendering. It even batches changes and such.
I can't help but think it's their engineers and PMs making these decisions, since I know that if you asked Claude to write a TUI, there is no world in which it would recommend whatever the frontend architecture of Claude Code is.
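The "only redraw when something changed" approach can be sketched with a dirty flag: state updates mark the UI as stale, and the frame timer renders at most once per tick, and only if something actually changed, so several updates within one tick collapse into one render. A minimal sketch; `Renderer` is a hypothetical class and not how React or Ink schedule work internally.

```python
class Renderer:
    """Re-render only when state changed since the last frame, batching updates per tick."""

    def __init__(self, render_fn):
        self.render_fn = render_fn  # pure function: state dict -> list of screen lines
        self.state = {}
        self.dirty = True           # the first tick always renders
        self.renders = 0

    def set_state(self, **changes):
        # Setting a key to its current value does not mark the frame dirty.
        for key, value in changes.items():
            if self.state.get(key) != value:
                self.state[key] = value
                self.dirty = True

    def tick(self):
        """Called by the frame timer; returns new lines, or None when nothing changed."""
        if not self.dirty:
            return None
        self.dirty = False
        self.renders += 1
        return self.render_fn(self.state)
```

Combined with a screen diff, a spinner advancing one frame then costs one small render and a few bytes of output rather than a full redraw.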
> For each frame our pipeline constructs a scene graph with React then
-> layouts elements
-> rasterizes them to a 2d screen
-> diffs that against the previous screen
-> finally uses the diff to generate ANSI sequences to draw
So I’m wondering what ‘rasterizing’ literally means in this case. I imagine it’s just creating a 2D map of elements at a very low (probably character) resolution, then diffing that against the last generated map to come up with an optimal ANSI sequence to send to the terminal, would that be right?
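Reading "rasterize" as "paint into a grid of character cells", the quoted pipeline reduces to something like the following. A minimal sketch under that assumption: elements arrive already laid out as `(row, col, text)` tuples, both frames have the same dimensions, and only changed runs of cells are rewritten using the standard `CSI row;col H` cursor-position sequence. The function names are hypothetical and this is not Claude Code's actual implementation.

```python
def rasterize(elements, width, height):
    """Paint laid-out text elements onto a width x height character grid."""
    grid = [[" "] * width for _ in range(height)]
    for row, col, text in elements:  # each element: (row, col, text), already laid out
        for i, ch in enumerate(text):
            if 0 <= row < height and 0 <= col + i < width:  # clip to the screen
                grid[row][col + i] = ch
    return ["".join(r) for r in grid]


def diff_to_ansi(prev, curr):
    """Emit a cursor move plus text only for runs of cells that changed since the last frame."""
    out = []
    for row, (old, new) in enumerate(zip(prev, curr)):
        col = 0
        while col < len(new):
            if col < len(old) and old[col] == new[col]:
                col += 1
                continue
            start = col
            while col < len(new) and (col >= len(old) or old[col] != new[col]):
                col += 1
            # CSI row;col H takes 1-based coordinates.
            out.append(f"\x1b[{row + 1};{start + 1}H{new[start:col]}")
    return "".join(out)
```

For a screen of a few dozen lines this is cheap, which is the point several commenters make about the reported frame budget.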
Seems like a cool puzzle to solve. I wonder what the engineering and organisational tradeoffs were that led to it. Does it let them reuse a bunch of existing code?
I wrote a TUI library back in the day for Turbo Pascal — it was essentially taking an immediate-mode approach (which in this context is just a fancy way of saying it was procedural haha).
"Rasterizing" means just one thing in this context: to transform a data structure into an array of pixels. It seems absurd to do this, given that the next step must be to convert back from pixels to text data, but maybe they have some way to generate predictable sequences of pixels (e.g. the character "t" is always rendered as the same pattern of pixels), such that they're cheap to convert back.
If they're doing anything else, the word "rasterizing" is being misused.
> which eats 1GB+ of RAM. Meanwhile, my editor only consumes 80MB of RAM
And why are you comparing Claude Code to your editor?
> They can't even improve Claude Code
That depends on how you define "improve". They've added a ton of features to it over time. Who said minimizing RAM usage was something they are prioritizing right now?
> why are you comparing Claude Code to your editor?
Because the editor does more. All the compute-intensive parts of the agent are in the cloud. Zero reason for an agent harness to require anything beyond a potato to run.
I don't think they need to optimize their infrastructure (at least not from their perspective). They have high-end PCs with 64GB of RAM, so 1GB doesn't matter to them. By contrast, I have 8GB of RAM, and I make my apps very performant. Honestly, I probably wouldn't bother if I had 16GB+ of RAM.
You mean on some days it goes faster and some other days slower?
That is by design. It depends on how much other people are using their services right now and they do communicate it somewhere in the TOS that they do this. Otherwise they could give us a fixed amount of tokens - but they don't because it is not fixed.
Personally at my own job self-writing code is letting us tackle big, long-deferred refactoring projects (like the article mentions), but any sort of refactoring introduces new bugs.
Their outages are probably not due to their code though. It’s probably their infrastructure that can’t keep up. So seeing failures of infrastructure doesn’t really tell you anything about how good or bad Anthropic makes use of their models.
The whole thing is actually powered by a shitton of hamsters inside a bunch of 4U rack-mount cases, running on spinning wheels at high speed. Somehow at scale this works.
Sometimes they all happen to randomly take a nap at the same time - hence the outages.
That seems like an assumption based on basically nothing. There is a lot of code at the infra layer, and based on the stack choices for Claude code and based on how buggy and unreliable ~everything from anthropic is, it seems pretty bizarre to claim these issues are not related to their code.
There are other indications, however, like Anthropic paying through the nose for compute just months after Dario told Dwarkesh how hard it is to predict demand, or ChatGPT and Codex not quite having the same issues after Altman spent much-publicized years scrounging for trillions of dollars of capacity.
While I'm very bullish on Anthropic, I'm a bit wary about their IPO because it seems to me that they're filing now while their financials look good and before other trends like the decline of tokenmaxxing and their compute bills catch up.
Whoa, first name basis with Dario but not Sam. Ouch. [I actually have no idea who Dwarkesh is and it sounds like a first name to me but that's not a particularly reliable indicator so I won't comment on your relationship with Dwarkesh.]
Oh, are they filing now? I think their financials look somewhere in between devastating and criminal, so I'm really looking forward to the IPO!
Look, I've never been someone who mindlessly hypes AI companies; as a matter of fact, I think they have serious leadership problems across the board. But you people are straw-manning them so badly it actually makes me sympathize with them.
They aren't saying they have fully automated luxury AGI, they specifically list the ways models fall short of that bar and caution against people taking the 8x figure as the actual uplift number. At the same time they recognize that 80% of new code is now AI-authored, when two years ago those models were little more than toys. And frankly that checks out: if two years ago you told me we'd have something like Opus 4.8/GPT 5.5 I would have rolled to disbelieve.
> At the same time they recognize that 80% of new code is now AI-authored
I can set up a loop that will write a trillion lines of code automatically; how much of it is actually useful? Or are we back to counting LoC because there's no other metric for these systems that anyone can rely on?
I could write a bash script that copies a codebase repeatedly in the pre-AI past as well, but I didn't do that because I wasn't stupid. More than 80% of my code is now AI-generated, and trust me I'm still not stupid. It was 0% only a year ago.
Who says LoC is the only metric we should rely on? A software product should first and foremost meet user requirements for functionality and performance. Judging from the sensational rise of Anthropic's user base and revenue, I think we can safely say they're in that ballpark.
I fail to see how pursuing recursive self-improvement at full speed is compatible with Anthropic's stated goal of AI Safety. If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?
I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.
The thing about nukes is you can at least make an argument for why it'd be important to be the first country to have them. With AI, you create super intelligence and you're probably just the first one it takes out. There's no reason to think a super intelligence would be totally fine being a slave to apes.
Cynicism with these companies is highly warranted though. It's not doomerism to look at their actions and conclude they're deeply untrustworthy.
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.
Nor am I. I think they believe that AI poses a grave danger, and they are playing the prisoner's dilemma as an unvirtuous actor.
1. If anyone builds strong AI, it may be catastrophically bad.
2. If anyone builds strong AI, it will be better for the builder than for anyone who does not. Either because it won't be catastrophically bad so the builder will get to enjoy all the spoils indefinitely or because it will and at least the builder will be rich for a while.
To complete the analogy, it's like nukes, except we don't have the slightest idea how to calculate the odds of it igniting the atmosphere. (And note that in reality, while the Trinity test "ignite the atmosphere" calculations were correct, we failed to correctly calculate the fallout of the Castle Bravo test with lethal consequences).
Is the idea to keep the world in balance via MAD? I could see that, though it's a dangerous gamble.
From Richard Rhodes's "The Making of the Atomic Bomb", I got the impression that most of the scientists involved thought they could maintain a US or UN monopoly on nukes after the war. General Groves attempted to buy up all of the world's uranium ore. Unfortunately, only high-grade ore is rare; many countries have low-grade ore.
I honestly don’t know how Iran can conclude anything after this war other than to go all-in on nukes. The US has proven any deal is worthless if it can just change its mind and renege on it whenever it wants.
Again, quite arguable, but this is the real-life scenario we’re living in. Nukes have made it hard, if not impossible, for the major powers to go into direct conflict with each other.
No, but in peacetime it's a lot easier to convince someone not to use nukes than in a war, when the party that has nukes has its back against the wall.
In this world we've had an inoculation event against the use of nukes. Two were dropped, people saw how abhorrent their use is, and collectively decided that they shouldn't be used.
If Japan had also had nukes (and delivery systems for them) in WW2, they'd probably have retaliated in kind, the US wouldn't have let that slide either, and it would have continued for some time.
If WW2 Japan had also had nukes, the US would never have dropped those two. That's the whole idea behind MAD. Probably the only thing that stopped an open conflict between the US and the USSR was that both were nuclear powers and both sides were scared of what would happen if push came to shove.
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.
It's not cynicism if it's an appraisal of reality that's backed up by evidence.
Remember how social media - that first baby of this current generation of tech entrepreneurs - was supposed to "bring the world together" and "let us express ourselves"? As it turns out there's a lot more money to be made by fostering division to drive engagement and feeding people an endless stream of ads instead of their friends' content. And money is what matters. You can't write down good vibes on a quarterly figures report. You can absolutely write down the number of eyes that your ragebait brought to a product's marketing efforts and the conversion rate to sales.
The same will be done with GenAI. We're being promised "AI Safety" because otherwise this whole thing gets killed dead by anyone who knows about James Cameron's directing career. There's no real enforcement mechanism for AI safety, though. Safety is a good vibe, same as harmony in online communities. You can't measure it. What you can measure is training costs and the cost of mistakes by AI that need to be trained to avoid those mistakes. Since AI generates more output than humans can conceivably QA no matter what your budget is, and since AI is seen by the market as a potential endless font of value, the tradeoff will be made to have AI make some potentially awful decisions while training itself over slowing down and re-appraising what is being done.
There's an almost religious reverence for AI in SV. Not everyone sees it as "making the godhead" but some certainly do. They're not going to moderate themselves too much on this.
The folks I met who were talking about AI Safety in 2018 were certainly sincere, and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.
I expect that Anthropic will eventually behave as you describe, like any other public corporation. However, my impression is that its current leaders are still more sincere than greedy.
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.
It doesn't really have to be dishonest, he could really believe it. I do believe, however, that it is incredibly wrong and is functioning as marketing hype.
This kind of immediate-mode rendering is quite standard for TUIs. And immediate-mode rendering tends to be significantly simpler and use less memory than retained-mode rendering, at the cost of some redundant computation. So I am not sure this is the reason for the bloat.
It’s possible that it doesn’t play well with JS garbage collection, since it recreates the whole UI structure every frame (which tends not to be an issue in the languages where immediate mode is usually employed).
But yes, it’s a bit more akin to game rendering than web rendering, which can be totally fine if done well.
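For readers who haven't seen the pattern: in immediate mode the application calls widget functions every frame, the widgets draw directly into the output, and interaction is reported by the widget call itself, with no persistent widget tree to update. A minimal sketch with hypothetical names (`frame`, `TextUI`); note that every frame allocates fresh strings, which is exactly the per-frame garbage a GC'd runtime has to clean up.

```python
def frame(ui, state):
    """Immediate mode: the whole UI is re-declared from app state every frame; nothing is retained."""
    ui.label(f"Tokens used: {state['tokens']}")
    if ui.button("Stop", focused=state["focus"] == "stop"):
        state["running"] = False
    ui.label("Working..." if state["running"] else "Idle")


class TextUI:
    """Hypothetical minimal backend: widgets draw straight into a list of lines."""

    def __init__(self, clicked=None):
        self.lines = []
        self.clicked = clicked  # label of the button the user activated this frame, if any

    def label(self, text):
        self.lines.append(text)

    def button(self, text, focused=False):
        self.lines.append(f"[{'>' if focused else ' '}{text}]")
        return self.clicked == text  # the widget call reports its own interaction
```

A retained-mode toolkit would instead build the label and button objects once and mutate them; immediate mode trades that bookkeeping for recomputation.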
I haven't tried to make a TUI, admittedly, but double buffering is the oldest technique on the planet. A TUI doesn't even need to pay the cost of a lot of pixels, since its effective resolution is much lower.
How on earth are you spending more than 50us on a UI like this from start to finish? What the actual hell? 11ms to construct a scenegraph of this complexity? I don't even know what to say to that.
If I saw our UI show up in the profiler eating 5ms of CPU time every frame, I'd send whoever was responsible to QA hell until they find some way to redeem themselves. Not even fancy animated 3D UIs, like what you get in Death Stranding, eat up these kinds of resources. Not even remotely close.
Developers can develop leaner applications, but they're usually not incentivized to.
Frankly, I love efficiency too, but I've had to learn the hard way that what the market wants is features. Or at the very least, that's what the executive team wants.
Their whole argument is that AI's added efficiency means they don't need to set aside valuable human time anymore. Why can't they just point Claude at Claude Code and ask it to reduce memory usage by 90%?
You can do that. But I'm telling you, in tech (and enterprise shops I've worked at too) they don't care.
I'm using the internal Google tools and it's helping me write code much faster too, but it still takes time. I could make the CLI tool I work on faster, but no one cares except the end users, and their minor concerns have no impact on our internal politics.
At the end of the day you have to do what you're paid to do, unfortunately.
I came here just to write: pretty please, let it churn for a few nights and redo Claude Code in Rust. The harness is very, very good, as are their models, but that Node thing is a hog for no good reason at all.
They obviously don't care, aren't making any attempt whatsoever to do this, and 99% of users don't care either.
If you want to pollute your own priors with weird artificial litmus tests, it's a free country, but the artificial world-model you build in your head does not affect the real world around you.
Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?
I've always been fascinated (obsessed?) by robots that build robots, or even things like this that can contribute a lot to making the next version of themselves:
https://buildyourcnc.com/products/cnc-machine-blacktoe-v4-2x...
(cnc router that cuts plywood, and is made out of cnc-router cut plywood)
This is my own effort at an AI assisted coding environment optimized for building itself:
https://recursi.dev/
(Just launching it, hope it's ok to mention it; it is free/open source... here is the HN link that has gotten no love yet: https://news.ycombinator.com/item?id=48401022 )
Personally I think harnesses are as important as the AI itself, and I have this crazy theory that even if the models stopped improving today, we could still see massive advances from the harnesses alone.
By that interpretation, neither the harness nor the LLM is the AI. The computer (or system of computers) taken as a whole is the AI. You can't remove any piece and still have an intelligent system.
Yes? The future for any verifiable task is that the model attempts to verify the initial state and a goal, then decomposes its tasks into ever-smaller verifiable subtasks, with /memory providing persistence between runs and /dreaming over those memory files + run data to introduce new ideas.
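The decompose-and-verify loop described here can be sketched as a task tree where every node has a check, leaves also have an action, and verified results persist to a JSON "memory" file so a later run skips work already proven done. A minimal sketch; `Task`, `run`, `load_memory`/`save_memory` and the memory-file format are hypothetical, and the /dreaming step is not modeled.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    name: str
    verify: Callable[[], bool]                    # is this (sub)goal met?
    attempt: Optional[Callable[[], None]] = None  # try to achieve it (leaf tasks)
    subtasks: List["Task"] = field(default_factory=list)


def run(task, memory):
    """Depth-first: solve subtasks, then attempt and verify this task; record the outcome."""
    if memory.get(task.name) is True:
        return True  # already verified in a previous run
    ok = all(run(sub, memory) for sub in task.subtasks)
    if ok and task.attempt is not None:
        task.attempt()
    ok = ok and task.verify()
    memory[task.name] = ok
    return ok


def load_memory(path):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_memory(path, memory):
    with open(path, "w") as f:
        json.dump(memory, f, indent=2)
```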
I think that's the path to async AGI these labs are imagining. The only limits are the sensor data you have on the world or your system, how long you're willing to wait, and how much you're willing to spend to parallelize it.
Maybe once you start building out these verified workflows, you can feed them back into training, and the model starts to get a feel for the world to the point that it can intuit things, since it has these sub-paths built.
My personal AGI test: can a model trained on video of someone knocking on a door and then opening it encounter a microwave for the first time and open it when the food's done, without knocking?
I used to think that, but ended up going the other direction, partly because I don't have the wherewithal to build a model, but then I realized that with existing models that can take more than a tiny amount of context, you can just let any model bootstrap itself with a good prompt sent by the system.
There are a ton of other tricks to it, but mostly it's keeping the protocol simple for the AI so it can concentrate on coding logic and not stuff like managing BS boilerplate, dependencies, etc. (For instance, I make extensive use of an abstract syntax tree library to help with surgical edits from the LLM.)
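The comment doesn't name the AST library used; as an illustration of the general idea with Python's standard `ast` module, node line ranges (`lineno`/`end_lineno`, Python 3.8+) let a harness replace exactly the one function the model rewrote instead of applying a fragile text diff. A minimal sketch; `replace_function` is a hypothetical helper that handles only top-level functions.

```python
import ast


def replace_function(source: str, name: str, new_def: str) -> str:
    """Swap out one top-level function by name, leaving every other line untouched."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            lines = source.splitlines(keepends=True)
            # Include decorators in the replaced span; line numbers are 1-based.
            start = min([d.lineno for d in node.decorator_list] + [node.lineno]) - 1
            end = node.end_lineno
            replacement = new_def if new_def.endswith("\n") else new_def + "\n"
            ast.parse(replacement)  # reject edits that aren't valid Python before splicing
            return "".join(lines[:start]) + replacement + "".join(lines[end:])
    raise KeyError(f"no top-level function named {name!r}")
```

Because the model only emits the new function body, it never has to reproduce surrounding code, which is where whole-file rewrites tend to drift.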
That said, I would be very open to collaborating with someone who builds such small models, I don't think the system strictly needs it, but it also could have some extra power if it had it.
Start off with my video!!! You can also try it with zero setup (you can code right there on the static web page; it saves your edits in the browser's IndexedDB and hot-patches them back into the code before running it... you can also grant the browser permission to read/write a local directory).
recursi.dev
Seriously, I'm looking for collaborators.
There are upwards of 80,000 lines of code in the editor system, a lot of it there to make sure that even newbies don't get stuck... so that's kind of proof the system works, since it doesn't break down when the codebase grows large.
I'm aware we're not there yet, but think of something like https://chatjimmy.ai/ ; at some point, you're going to be able to dynamically build the harness so it creates the necessary consistency & dynamism at a speed unheard of.
But yes, I'm aware no one's gotten anywhere near there, mostly because most of the focus is on exploding the context and parameters. I'm saying that phase is done.
I'm not sure what I am looking at with chatjimmy.... what is special about it? Speed?
I'm also not sure what you mean by "we aren't there yet." Where?
Sorry, not trying to be difficult or dense, I'm just not sure what you are referring to.
> mostly because most of the focus is on exploding the context and parameters.
Large context allows a surprising amount of "learning" to happen at inference time rather than training time. I think that is relatively unexplored. As long as the model itself has passed a certain threshold of smarts, and the context is large enough (Gemini and its million token context being WAY past that point) you are not really limited by the model, you are only limited by how good the stuff you feed into that context is.
That's what happened when, nearly a year ago, I saw a major leap in capabilities that happened entirely on my end... not in the AI, but in code written by the AI. I found it genuinely frightening, to be honest. I think OpenClaw tapped into something similar, which seemed to surprise a lot of people. There were latent capabilities in the AI that were unknown until brought out by a clever harness.
>A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.
What about the hypothesis that AI is generating more verbose code? I just see the text pretending to acknowledge "LOC != Productivity" and then using it as a metric anyway.
One of my co-workers just asked me to review his pull request that was all AI generated. 600 files were touched, over 40k lines of code added.
I'm sure he thought that was a crowning achievement, proof that AI can enable 10X developers, after all, what engineer could write 40k lines of code in a week?
I declined to review it, stating that I couldn't possibly vet 40k lines of code and wouldn't put my reputation on the line to stamp the work as good. The PR nagged me from my todo list for 2 weeks and then disappeared. I don't know if he found another dev to get an approval from, or if the PR was abandoned. But I know for sure that he and I are on two totally separate islands when it comes to the value of LLMs.
Same here. A co-worker touched a few hundred files in a PR and asked us to review. They merged it directly to main when nobody approved it. (The repo was not set up to enforce PR approval.)
I don't personally use that feature, and I couldn't care less at this point. If our customers are frustrated by the bugs, at least my name is not on it.
That's a process problem at your company - no developer should be proposing branches over 1k loc (or whatever your agreed tolerance threshold is) without a very good reason, vibe coded or not.
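A tolerance like that is easy to enforce mechanically in CI instead of relying on reviewers to push back. A minimal sketch, assuming a git checkout with the base branch fetched; `changed_lines` and `check_pr_size` are hypothetical names, `origin/main` is an assumed base ref, and the 1000-line default just mirrors the comment. Counting added plus deleted lines from `git diff --numstat` is one choice among several (you might exclude generated or lock files).

```python
import subprocess


def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output; binary files show '-' and are skipped."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":
            continue
        total += int(parts[0]) + int(parts[1])
    return total


def check_pr_size(base: str = "origin/main", limit: int = 1000) -> bool:
    """Return False (fail the check) when the branch changes more than `limit` lines vs `base`."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return changed_lines(out) <= limit
```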
So the more rigorous studies about AI-assisted coding productivity addressed this by keeping in place all other software development processes, including the same code review and quality standards, and only measuring throughput (PRs, LoC) before and after AI was allowed.
Hence the interpretation of this 8x number depends on whether (or how much) Anthropic engineers have changed their quality standards and development processes. They don't tell us, and I am not aware of any other indications we could use to make a judgment.
However, we can still do some theorycrafting! I'm convinced that to fully realize the potential of AI-assisted coding we need to revamp all the dev processes, especially how we validate code, and it would be foolish of Anthropic not to do so (unless they were conducting a rigorous study, which they don't claim to have done.)
My hypothesis on the future of software validation is nothing fancy, we simply want much, much more automation for tests, observability and other bespoke verification methods than we traditionally had. But then validation code will also contribute to the LoC! My observation so far of personal as well as some "vibe-coded" open-source projects is O(LoC production code) ~= O(LoC test code). So as a SWAG the upper bound could be something like a 3 - 4x speedup, which is still remarkable.
All bets are off if code quality standards are not the same.
Exactly. If AI is going to start being graded on how many LoC it generates - oh, I'm sorry, how much it "accelerates" - then guess what newer models will start doing more of?
AI generates code that mimics the existing code. If your code is terse and comment-free, then the agent’s code is too. The times I’ve seen Claude drift into a default “house style” it generated like 1 comment for every 10 LOC or so. It’s a far cry from the GPT-3 days that littered every line with the journals of Captain Obvious.
So, regardless of whether or not Anthropic CAN create a self improving AI.. does anyone else feel like they shouldn't be allowed to? Or it at least needs to be strictly supervised..? Like, I don't actually think Anthropic can make the singularity any time soon, but I think even AI boosters have to admit doing this is creating a society-wide danger for the benefit of a very very small number of already-rich people.
I dunno, I find it extremely unbelievable that we will get self-improving AGI which chooses to become a slave to humanity at all, ultra rich or otherwise.
It's more like if the horse was lazily moving in the general direction of the wide open barn door and we are all sitting around discussing if we should close the door or just gamble that it's just going to lay down on the hay pile.
Only if you think LLMs are the horse (I don't think they are). If they're not, then we should be building a brick wall in front of that door and hiring a full time security guard to watch it.
I realize he's saying it for hype, but if the CEO of the company goes around talking about how scared he is of what they're creating, hey, let's just take Dario at his word and put in some strict regulation. He won't mind if they're really about safety. (They're not.)
Besides, yes, the knowledge of how to build these systems is out there, but the cost of doing it is staggeringly high (i.e., you can't run a frontier AI lab in your garage). There are only a limited number of known entities that need to be managed, and you can stop "progress" in its tracks by cutting off the money firehose.
Right now the S&P 500 is going wild due to the promise of AI automating everything.
Who is the "we" who is going to shut it down? Certainly not the US government. Nor the Chinese government w.r.t. their tech industry. Are you going to start the insurgency? Is there going to be an equivalent one in every developed part of the world?
Look at how many data center projects are getting shut down by grassroots movements, or how the approval rating for AI is, like, worse than Congress's and it's getting booed at commencement speeches. I don't need to start an insurgency; people are already pissed off and the volume is growing.
The stock boost is, as most will note, a bubble. It will enrich a lot of bad people and leave average people holding the bag, but it's not going to go on forever.
Self improving AI is pure dystopia. Anthropic won't build the singularity, AI itself will build it through self-iterations. Read Yudkowsky's book "If Anyone Builds It, Everyone Dies".
> "A caveat: Lines of code is an imperfect measure"
I'm pleased they at least included this. However, they address the caveat by 'rounding down' the estimated multiple of the gain. I'm not sure that is the correct adjustment, especially once we understand the range isn't limited to positive numbers.
There's strong evidence the range of code productivity denominated in "lines of code" should include negative numbers, especially in the highest-quality sphere. Perhaps the earliest and most legendary example: https://www.folklore.org/Negative_2000_Lines_Of_Code.html
I have been doing more experiments with what I have now been calling agentic iterative optimization: telling the LLM to optimize code such that it speeds up all real-world-representative benchmarks by X% without cheating or causing regressions in both tests and performance metrics (e.g. MSE for statistical algorithms or file size in the case of something such as image compression). This is done using Rust where there are more low-level levers to tweak for performance than something like Python.
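The acceptance rule is the part of this loop worth pinning down in code, since it's what keeps the model from "cheating". A minimal sketch in Python for brevity (the comment's projects are in Rust); `Result`, `accept` and the thresholds are hypothetical, "speeds up by X%" is read as "runtime drops by at least that fraction on every benchmark", and every quality metric is assumed to be lower-is-better and positive (e.g. MSE, file size).

```python
from dataclasses import dataclass


@dataclass
class Result:
    tests_pass: bool
    runtimes: dict  # benchmark name -> seconds
    quality: dict   # metric name -> value, lower is better


def accept(baseline: Result, candidate: Result,
           min_speedup: float = 0.10, quality_tol: float = 0.0) -> bool:
    """Accept only if tests pass, every benchmark's runtime drops by at least min_speedup,
    and no quality metric worsens by more than quality_tol (relative)."""
    if not candidate.tests_pass:
        return False
    for bench, base_t in baseline.runtimes.items():
        if candidate.runtimes[bench] > base_t * (1 - min_speedup):
            return False
    for metric, base_q in baseline.quality.items():
        if candidate.quality[metric] > base_q * (1 + quality_tol):
            return False
    return True
```

Requiring the gain on every representative benchmark, rather than on an average, is what stops an optimization that wins big on one input and quietly regresses another.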
Opus 4.6/4.7 was consistently successful at getting 2-3x speed improvement with just one pass. It can also do the inverse: improve the performance metrics for better quality without causing a significant regression in speed. Then GPT-5.5 turned out to be much better at this workflow, often getting a multiplicative 1.5x-2x improvement above what Opus could do.
I now have quite a few GPT-5.5-optimized projects in various domains that are feature complete and are substantially more performant than existing SOTA implementations that I plan to open source as soon as possible: the bottleneck is polish as usual.
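Here is a minimal sketch of the accept/reject gate such a loop needs. It assumes each candidate build reports a pass/fail test result, per-benchmark wall-clock times, and lower-is-better quality metrics (MSE, file size). "Speeds up by X%" is read here as "wall-clock time drops by at least X%". The `RunResult` and `accept_candidate` names and the 10% default are hypothetical, not the commenter's actual setup:

```python
from dataclasses import dataclass, field


@dataclass
class RunResult:
    """Measurements from one build (hypothetical structure)."""
    tests_passed: bool
    bench_seconds: dict[str, float] = field(default_factory=dict)  # benchmark -> seconds
    quality: dict[str, float] = field(default_factory=dict)        # metric -> value, lower is better


def accept_candidate(baseline: RunResult, candidate: RunResult,
                     target_speedup: float = 0.10,
                     quality_tolerance: float = 0.0) -> bool:
    """Accept only if tests pass, every baseline benchmark got at least
    `target_speedup` faster (as a fraction of wall-clock time), and no quality
    metric regressed by more than `quality_tolerance` (relative)."""
    if not candidate.tests_passed:
        return False
    for name, base_s in baseline.bench_seconds.items():
        cand_s = candidate.bench_seconds.get(name)
        # A benchmark that silently disappeared counts as cheating, not a win.
        if cand_s is None or cand_s > base_s * (1.0 - target_speedup):
            return False
    for name, base_q in baseline.quality.items():
        cand_q = candidate.quality.get(name)
        if cand_q is None or cand_q > base_q * (1.0 + quality_tolerance):
            return False
    return True
```

The inverse workflow mentioned above would swap the roles: require a quality gain and allow only a small relative tolerance on time.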
We've had self-improving AIs before, and they tended to get lost after a while.
That's going to be a problem. LLMs are stable because they return to a ground state with no history for a new job. Systems with persistent state have a problem with that state not being sane.
Remember Microsoft's 2016 chatbot that learned from Twitter? [1]
You might be interested in this graph [1], which suggests that the amount of time that AIs can run on their own has been increasing. Perhaps it will hit diminishing returns, but that seems difficult to predict.
You can retrain a model and have a ground state as reference. It's not trivial, but Microsoft's attempt was 10 years ago and significantly less complex than what's being built now.
Interesting, what are some other self-improving AI implementations? Any that actually achieved interesting results? Obviously continuous training has been tried before, but I've never heard of anything that could turn around and actually contribute code toward its own next-generation version.
Anthropic is looking to IPO here soon.
A key aspect of this is to prove profitability.
Shifting their focus from training new models to serving inference would greatly reduce their spend. In fact, this is reportedly something they are already doing, and it is the reason for their first-ever profitable quarter.
It's awfully convenient that the company which has greatly reduced its spend on training is now asking for a slowdown in this area.
I mean, if they've consumed all of human knowledge, what's left for them to train on? This pivot isn't only because it's cheaper and a way to juice the numbers for an IPO; it's survival, because they can't improve more.
Honest question: Is anyone here looking to put their own money into the Anthropic, OpenAI or SpaceX IPOs?
Maybe it is my poverty mindset that is holding me back, however, I can't imagine becoming an investor in any of the AI 'startups'.
There are plenty of pundits able to advise others on where to put their money, and sometimes there is everyone and their dog advising you to get into Bitcoin, gold or some other scheme. With alt-coins there were lots of people saying that you should get in, and plenty of naysayers. Yet I am not hearing anyone that uses AI professionally try to convince others to get into the AI IPOs coming up. Maybe the overall economic situation precludes it.
Hence my question, is anyone here planning to put their own hard-earned money into Anthropic (or the other AI 'start ups')?
How are these animations being made? I'd love to get a blog post on them. If it's AI, I'd love to know the workflow, but something tells me there is a lot of human creative input.
Putting faith into the claim that recursive self-improvement is close to happening, or that they will coordinate with other companies / the government when the time comes?
So in the latest L. Ron Hubbard encyclical Anthropic informs its flock that recursive self-improvement does not work yet but that their engineers burn more tokens.
The Claude code quality and operational security of Anthropic have already been analyzed by the public.
If you compare the output of (purportedly) trillion dollar corporations to Bell Labs or even Microsoft Research it is embarrassing. But the output is a fixture on any discussion board.
But the real bottleneck is the hardware efficiency and not even Karpathy can set up a loop that overcomes that in software. We need the truly compute-in-memory hardware paradigms to be matured and scaled. So it's like recursive hardware improvement, which is 100x slower and at least ten times more difficult.
So I am looking at like Mythic AI or the wurtzite ferroelectric breakthrough from the University of Michigan, or memristors, etc. to provide the 100x efficiency boost needed at this point.
I would also argue that it's a good thing we are limited by the hardware and very questionable to seriously try to move into RSI for hardware. If you want to ensure the human era continues for at least one or two more generations, we should probably not do that.
it's vital for them to have self-validation for exponential RSI, and this distillation of human-in-the-loop debugging of AI models is needed even though they have judge models handling parallel speculative execution.
labs have parallel speculative execution. they spawn hundreds of agent branches, validate them internally with AI judges and only show the user the successful result.
free users are using sequential single-turn generation. the model requires and waits for the human to debug, fix and re-prompt.
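To make the "spawn branches, validate with a judge, show only the winner" mechanism concrete, here is a toy best-of-n sketch. It assumes a `generate(seed)` function for one branch and a `judge(output)` scoring function; both are hypothetical stand-ins, and nothing here reflects how any lab actually implements this:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def speculative_best(generate: Callable[[int], str],
                     judge: Callable[[str], float],
                     branches: int = 8,
                     threshold: float = 0.5) -> Optional[str]:
    """Run `branches` independent generations in parallel, score each with a
    judge, and return only the best one that clears `threshold`.
    The user never sees the rejected branches; returns None if all fail."""
    with ThreadPoolExecutor(max_workers=branches) as pool:
        candidates = list(pool.map(generate, range(branches)))
        scores = list(pool.map(judge, candidates))
    passing = [(s, c) for s, c in zip(scores, candidates) if s >= threshold]
    if not passing:
        return None
    # max() keeps the first candidate on ties, so results are deterministic.
    return max(passing, key=lambda sc: sc[0])[1]
```

In the sequential free-tier case described above, the human plays the role of `judge`, one branch at a time.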
by forcing a human to act as validator, they are capturing high-value correction trajectories (Bad Output --> Human fix). They are using your cognitive labour to train judge models and validator agents needed to automate the internal verification step, eventually closing the loop for fully autonomous recursive self-improvement.
human-in-the-loop debugging isn't a bug; it's the necessary training signal for the self-validating agents required for exponential recursive self-improvement. With new 'distilled judge' models landing in 2026, this article means that they might have gathered enough data. we might be in the final phase..
Quite aligned with my own experience from harness engineering and winning an AI4Science hackathon. During the hackathon I was working as a human optimizer, moving the feedback from the test harness running on Claude Code back to my local Claude Code for the analysis-hypothesis-proposal cycle. In that moment I realized that two Claudes talking to each other could actually scale much better.
> We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require.
Interesting - they're committing to kick off policy conventions to organize a world-slowdown of frontier LLM building. If they actually are able to crack it, this will give a much-needed breather IMO. As exciting as the last ~6 months have been, there are some bigger questions to go answer now.
We should be skeptical of any major player that advocates for regulating their own industry. In practice, this just means increasing barriers to entry and making it harder to compete with them.
In my mind we should be trying to push AI along the Linux trajectory. You have a free and open source product, developed by a decentralized team with a strong code of ethics, running on commodity hardware. There can still be trillion dollar industries built on top of it, but the core technology is democratized and available to everybody. I don't see how we get there if we allow a handful of companies to dictate where development of the technology goes.
The regulation that is being argued for here is against pushing the frontier. Entering the market with, say, a new speech-to-text model is not subject to such regulation. What's needed is something qualitatively different from entry barriers, and of the frontier model companies, at least Anthropic and DeepMind seem to have enough self-awareness to speak about it. They are finding themselves in a race with a possibly catastrophic outcome for humanity and would like to stop, but it needs international cooperation on a level that no single company can provide.
the actual race is to keep having revenue, since everyone is still willing to pay more for the best model.
we as consumers of LLM models lose out by the arms race ending by the creation of a cartel
what happens if they get this regulatory capture is that all the frontier labs put effort into making inference cheaper, and become extraordinarily profitable, at the expense of us consumers, who really want better models, at a subsidized price
Wouldn’t this align with their financial interests? In theory the thing that’s keeping them from being profitable (or one of the big things) is the periodic capex expenditures of building new frontier models.
I don't think there's anything inherently bad about Anthropic making a profit. Red Hat makes a profit off of Linux. I'm interested in the democratization of the underlying technology.
I read this differently: they are actually seeing that it's hard to keep advancing frontier models, and now are moving the goal posts so that when they start getting evaluated more harshly, they can point to something like this.
> organize a world-slowdown of frontier LLM building
i don't want to be a negative nancy but i'm sure this "slowdown" will only be in effect until the infrastructure buildout is done or largely done. If they weren't hardware constrained there'd be no slowdown at all. Whoever gets there first wins everything ("there" being defined as AGI or a similar scale leap in capability).
I would assume that shortly after, the solar system will be hyper-optimized as well, then the Milky Way, then the local cluster, and so on. Everything will be close to optimal afterwards, and I sure hope we will have specified the target function for that optimization correctly in the single attempt that we will have had.
This often-repeated meme has no bearing on reality.
The orthogonality thesis sounds like a fun gotcha but if you give it some thought you realise how strange it sounds and the opposite thesis - collinearity thesis is actually correct.
1. Intelligence transfers and compounds
2. Goals of agents are not arbitrary
3. Our goals and agent goals are more likely to be aligned at the deeper level
To anyone who works at Anthropic: I recently downgraded from Max to Pro out of frustration. Over the last few weeks my token (usage) burn was just too fast, and I couldn't explain it because my actual usage was lower than in the previous few months. I ended up thinking it's probably a bug that you guys shipped. The above article makes me think that it's probably Claude who shipped the bug and your humans missed it in their review.
> To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.
Do you have another example?
Engineers don't ship [period] for no reason.
So, either:
- Those aren't engineers, or
- they are literally dying of shame & embarrassment right now, or
- you measured something that indicated that this was a useful thing to do and have elected to share an overtly, catastrophically flawed metric instead.
Go look at open job listings at anthropic and the interview process. You aren’t allowed to use AI during coding assessments[0], or knowledge assessments, which suggests they very much do need and value hard skills and this is fluff.
I'm responding to the article they wrote and published.
If I worked there I would be embarrassed to have it publicised that I have been committing 8 times as much code as I used to without even attempting to justify it.
The point I am making is the article you are responding to is marketing hype and that they are lying. Their engineers, I am fairly sure, are doing engineering. At least to the point Anthropic's interview process is trying to filter for people with engineering skills, not "how do you best leverage AI to make more AI" skills like this seems to imply.
You seem to have taken offense on behalf of the people working there. But I'm not attacking them nor seriously questioning their abilities/qualifications/performance. The slight reference I made to such was purely a rhetorical device -- the situation presented cannot be, unless something unbelievable is occurring.
It's the organisation, its culture, the greater culture surrounding it, and the marketing that I have a problem with.
> A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.
I simultaneously think the AI revolution is making real revolutionary gains and am mystified by the lying.
An accurate translation seems to be “we made this shit up, but it feels right”.
Until the moment we start bragging about how many lines of code LLMs are saving us, we're walking in the wrong direction. Your programs, designs and architectures are supposed to get better, not accumulate even more boilerplate just because you can produce it faster...
I guess the claim is simply that AI written code is verbose and there’s lots of it being created but I agree, these systems seem to be able to create lots of low quality software, so until FreeCAD has feature parity with Solidworks I’m bearish on the singularity.
As usual, I find the AI-related discussion here to be hopelessly hysterical and conspiratorial. I get the impression that a large chunk of people have only read the title and assumed Anthropic is referring to recursive self-improvement in the runaway singularity sense.
One of the examples they provide, of giving Claude the task of training a small AI model, then asking it to improve certain benchmarks, is essentially Karpathy's AutoResearch. This is already known to work. While calling it "self-improvement" is perhaps a stretch, it is describing a capability current gen AI has, that anyone can test and I have been using to great effect.
I disagree with their conclusion, I think this kind of self-improvement will hit an asymptote, where every subsequent model can only make smaller and smaller improvements.
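One way to formalize that asymptote, as an illustrative assumption rather than anything from the article: if each model generation's gain g_k is a fixed fraction r < 1 of the previous generation's gain, the cumulative gain is a convergent geometric series.

```latex
g_k = g_0\, r^{k}, \quad 0 < r < 1
\qquad\Longrightarrow\qquad
\sum_{k=0}^{\infty} g_k = \frac{g_0}{1 - r}
```

With r = 0.5, for example, the total gain is bounded by twice the first generation's gain; only r >= 1 gives the runaway case.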
I'd use number of commits as a metric versus lines of code. A commit is generally a unit of work - regardless of the lines of code added/removed. It'd be interesting to see the metrics in terms of commits. I'm sure it's still an order of magnitude jump. Personally I'm flying with my own projects with AI, lots of commits, but I really try to minimize lines of code added. If I can remove and simplify existing code so the balance of lines added on commit are minimal - that's the path to a better quality app overall.
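If you want to try the commit-based view on your own repo, here is a rough sketch that counts commits alongside net lines (added minus deleted) from `git log --numstat`. The `@@` prefix in `--format` is just an arbitrary separator chosen for this sketch, and the function names are hypothetical:

```python
import subprocess


def read_git_log(repo: str = ".") -> str:
    """Raw `git log --numstat` output; '@@' marks the start of each commit."""
    return subprocess.run(
        ["git", "log", "--numstat", "--format=@@%H"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout


def summarize_numstat(log_text: str) -> dict[str, float]:
    """Count commits and added/deleted/net lines from numstat output."""
    commits = added = deleted = 0
    for line in log_text.splitlines():
        if line.startswith("@@"):
            commits += 1
        elif line.count("\t") >= 2:
            a, d, _path = line.split("\t", 2)
            # Binary files report "-" instead of line counts; count them as 0.
            added += int(a) if a.isdigit() else 0
            deleted += int(d) if d.isdigit() else 0
    net = added - deleted
    return {
        "commits": commits,
        "added": added,
        "deleted": deleted,
        "net_lines": net,
        "net_lines_per_commit": net / commits if commits else 0.0,
    }
```

Usage: `summarize_numstat(read_git_log("path/to/repo"))`. A healthy simplification streak shows up as many commits with a small or even negative `net_lines`.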
the tooling has quite a ways to go to catch up to the LLM engines that drive the real value. I have encountered various Codex bugs (I know, not Anthropic) which tell me that these billion-dollar companies, even if they are eating their own dog food, can still release buggy crap software.
Broadly agree with this position - I think there are some people skeptical that Anthropic is doing this for regulatory capture - but I think they are being honest about what they are seeing and how regulation should catch up.
I, for one, believe that we should pause all work on AI for the foreseeable future. This is almost impossible to orchestrate, but we should still try nevertheless. Maybe we are not able to pause, but we are able to slow down. That might give us more room to be able to pause in the future. But going ahead is too dangerous.
And it's not just Anthropic saying this. Even Geoffrey Hinton has said the same thing. If there is a non-zero chance that AI can kill all of humanity, and both Geoffrey and Anthropic hold the same position, then it makes sense for us to be a hundred percent sure before we move ahead. Dario/Anthropic have already made their money from AI; maybe they are just being honest about what they think lies ahead.
the end of humanity has a strong case for banning all burning of fossil fuels immediately
the end of humanity as a sales tactic to increase your stock price does not
these are companies working on their IPO to make sure they can get the best price, not people being honest about what they think lies ahead.
if they were being honest about what lies ahead, they'd unilaterally stop training, and put all of their money into FPV drone bombs to destroy datacenters being used for training or inference
if you actually believe the thing is gonna kill everyone, you're not gonna worry about how you stop it, and certainly not keep building and operating the thing
that they aren't buying anti-tank mines to drop on data centers says they aren't in the slightest serious about it
The same bozo who claimed radiologists would be out of a job by now.
The data does not support what you or others say. Jesus Christ. Can't believe people are this dumb. Have LLMs infested the minds of people to the extent they can't critically analyse what's happening in front of their eyes?
> Taken far enough, and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor. This is called recursive self-improvement.
'“Good code” means two things: it works, and it is written in a manner that allows another engineer to understand it and build upon it.'
I disagree with this. Good code is easy to change, which is much harder to accomplish than code that can be added to.
"If technical trends in advancing capabilities continue, and AI systems are able to develop the capabilities inherent to transformative human ingenuity, then it is plausible that AI systems could design and refine themselves."
I find the first premise weak and implausible, and the second one is obviously false. To me it comes across as an insult to the reader.
Isn't this like a perpetual motion machine? Or wouldn't entropy start kicking in and the quality of the system begin to degrade over time? (philosophically I don't believe AGI is an achievable thing)
>Or wouldn't entropy start kicking in and the quality of the system begin to degrade over time? (philosophically I don't believe AGI is an achievable thing)
It already has. Training models on AI-generated data leads to degradation and model collapse. The concept of the "technological singularity", whereby AI experiences infinite and exponential self-improvement and recursively bootstraps itself to godhood, is a religion-adjacent sci-fi concept, but in real life, TANSTAAFL.
Anthropic is the most self-hyped company I've seen, to the point that I'm wondering what would happen to its employees if they held a different opinion. Do they just... keep it to themselves? For instance, if some Anthropic employees held the completely rational opinion that all of this isn't going to lead to AGI, I'd expect to hear it, but I never hear that from them.
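A toy illustration of that degradation (a one-dimensional Gaussian, not an LLM, so treat it as an analogy only): each generation is fit solely to samples drawn from the previous generation's fit, and the estimated spread drifts toward zero, i.e. the tails get lost. The generation count, sample size and seed below are arbitrary choices:

```python
import random
import statistics


def collapse_demo(generations: int = 200, n: int = 10, seed: int = 0) -> list[float]:
    """Fit a Gaussian to n samples, then draw the next generation's
    'training data' only from that fit. Returns the fitted std per generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # generation 0: the "real" data distribution
    history = [sigma]
    for _ in range(generations):
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data, mu)   # maximum-likelihood estimate
        history.append(sigma)
    return history
```

The maximum-likelihood variance estimate is biased low by a factor of (n - 1)/n in expectation, and those losses compound across generations, so after 200 rounds the fitted spread is a tiny fraction of the original.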
The metric being tracked, code commits, is hilariously one sided. Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts, for instance:
Instead of thinking about edge cases with brain and whiteboard, you can simply have the LLMs generate most of the possibilities, including tests for them, because that is cheaper. There are probably 50x more commits, of which 40 will be revert pairs, but we are only twice as fast. And in reality nothing changed, because the outcome remains the same. I can't see how it is necessarily different in the LLM space.
> Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts
I've been struggling to capture this sentiment for myself in a way that hits. If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code. It just makes no sense. I can't seem to get off this hill. Company-wide AI mandates and 100 fleet Agent orchestration Rube Goldberg machines... it's getting wild out there.
Meanwhile my Claude Pro ($200/year) does force me to smooth out my usage and plan more (Sonnet/Opus advisor split). But other than that, I can't imagine what I'd be doing with 20x (200x?) the compute to code sling. I think I'd lose my mind.
Because code used to be correlated with progress, it became almost a stand-in measurement for it. But realistically, the code is meaningless if it doesn't accomplish something, and that should remain the true bar of progress.
For instance, if I churned out 20x more code, threw away 19x of it through rewrites, reverts and discards, and accomplished the same project to the same standard 70% faster, would I do it? Yes. The part that matters is not the 20x code, it is the 70% faster.
Code is both the final product and a tool to achieve it. We used to have a much harder time realizing the "tool" part, but now we are here. This also means any measurement centered on code being the final product is going to cease being effective or realistic.
You're right, my gripe is specifically with code slinging that hits production end users. My background is in product, so to your point, it's very unnerving to see a straight line from developers to customer-facing product outcomes being enthusiastically optimized.
This is contentious because I'm not exactly advocating for arbitrary gatekeepers. The nuance is that building usable stuff is hard, and not a matter of shipping more code. I take your point to mean: well, it depends on what that code is doing. If 20x more code goes into a meta-harness of simulation and such to arrive at the leading candidate for what hits production, well, then you've got my attention.
Forget about the danger of a dev to customer pipeline with no product people in between, some of us are living with the reality of product to customer pipeline with no developers in between, and that's much more disturbing. Our CEO is now the top contributor to our codebase, and he's completely non-technical.
>If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code.
I wonder how much of current engineering practices can be traced to what's pushed to company leaders on LinkedIn.
Every company is shitting bricks pushing for faster development and speed, gotta go fast to nowhere in particular, and I'm convinced it's tied to the constant bombardment of the idea that they're going to be left out or obsolete if they don't get on the ship NOW.
Month 3 - Okay maybe only the SWEs, programming is solved
Month 4 - Announce model that is too dangerous to release
Month 5 - Releases dangerous model
Month 6 - This is it! We will replace AIs with more AIs (*secretly files for IPO)
AI is here to stay, like it or not, but it is not the solution to everything. If it is, what is Anthropic's moat? A better model? I don't see any ecosystem being built by them, as MCP is almost obsolete except for some very niche use cases. And they're doing stuff that a non-profit version of OpenAI would do. Can we trust a for-profit company to stand against their investors during a conflict of interest? Because running a company for maximum profit and being ethical are two different ends of the spectrum.
Anthropic is providing agentic intelligence as a service. OpenAI and Google deepmind also are in this business.
The problem is, if you’re any sort of knowledge worker, you’re essentially providing the same thing: you’re an intelligence with agency.
MCP is irrelevant. The moat is the quality of intelligence the service providers sell, including you. Tokens aren’t fungible between providers until you measure that they are for your use case, that’s kinda sorta the goal of job interviews.
Thus the moat will be that they’re providing the best models for the things people need other intelligent people for, but we should expect there will be limits on how much share they can economically take assuming competitors are optimizing for slightly different targets (but there’s still significant overlap in capability). This will disappear, but it’s always a question of when. The path matters as much as the destination.
Note that implications for you and me are exactly what the article says they are: nobody knows, but it’ll be a dramatic shift.
i'm waiting for the AI giants to realize that they are burning cash to run their consumer-facing chatbots and that they should kill those products to focus on their enterprise tools.
free chatgpt doesn't need to exist anymore. its job was to build hype/interest and it did.
but take it away and you solve many social problems and annoyances caused by AI with no loss to the upside of AI. no more cheating students in school. no more shitty linkedin posts. no more dangerous "therapy sessions" that give bad advice.
The world has been recursively self-improving for millennia. Similar to Scientology, this is a cult pushing sci-fi nonsense. They are just coupled to an LLM lab to give their stories an air of seriousness. Imagine Scientology starting to make laptops.
I have a claw that is instructed to make at least 500 PRs per day. It uses Claude, Gemini and OpenAI and runs basically every few minutes. I use online forums as input for the claw: Moltbook, Reddit, etc. It's quite funny how it tries to improve itself. But to say it really creates a new Skynet? Nah. Not at all. It's more a clutter of useless features or incomprehensible code restructuring.
This more or less agrees with my assessment of recent changes in Claude Code where a lot of new features are either:
- Half-baked or half-done.
- Or have significant overlap with existing features, and aren’t clearly an improvement.
More code is not better. More features are not better. It would be lovely to see more intentional design than just more.
I know they’re dog fooding this. I have to believe they have some people with taste. So it makes me wonder if anyone has the time to think or if they’re just shoveling prompts as fast as possible.
It's like the AI created a method add(a, b) that returns a+a+a+a-b-b-b-b.
But then much bigger and complex features. Totally useless nothing methods. But still interesting to see occasional exceptions that are better.
Their statement is that they regard lines of code shipped as indicative of self-improvement. So, while a well-written coding agent might be a few thousand LOC, Anthropic's is bloated like a decomposing whale at over 500K LOC! What more proof do you need?
> A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.
And later:
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.
they explicitly mention in the article that just the frontier stopping isn't enough, because then others will catch up; they want to be the leaders of a global organization/cartel that bans everyone except themselves. Particularly important given Anthropic attacks China and open source every chance they get. https://www.anthropic.com/news/detecting-and-preventing-dist...
After several months with their top engineers and state-of-the-art AI on the job, Anthropic managed to "reduce flickering by 85%" on their TUI Claude Code client, which is built in fucking React and rendered by drawing the entire chat conversation each time (hence the flicker). I think they've since eliminated it completely by slapping some double-buffering around it (since "our client is actually a real-time game engine" after all). Meanwhile for decades Emacs and Vim have had an optimizer built into their display cores that solves for the minimum set of terminal escape commands it takes to transform the screen from a given old state to a desired new state.
You will forgive me when, between muted snickers, I express considerable doubt that Anthropic will be able to bring its AI to a point of "self-improving" any time soon.
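For contrast, here is a greedy sketch of the "only send what changed" idea: compare the old and new screen buffers and emit a cursor move plus the changed text for each run of differing cells. It is a simplification under stated assumptions (equal row counts, one character per cell, no wide or combining characters); the real Emacs and Vim display code also weighs cheaper alternatives such as scrolling or inserting/deleting lines, which this does not attempt. The `redraw` name is hypothetical:

```python
def redraw(old: list[str], new: list[str]) -> str:
    """Return ANSI output that turns screen `old` into `new` by rewriting only
    the runs of cells that changed. Assumes equal row counts, one character per
    cell, and no wide or combining characters."""
    out = []
    for row, (old_line, new_line) in enumerate(zip(old, new)):
        width = max(len(old_line), len(new_line))
        # Pad so shrinking lines get overwritten with spaces.
        old_line, new_line = old_line.ljust(width), new_line.ljust(width)
        col = 0
        while col < width:
            if old_line[col] == new_line[col]:
                col += 1
                continue
            start = col
            while col < width and old_line[col] != new_line[col]:
                col += 1
            # ESC [ row ; col H moves the cursor (1-based), then print the changed run.
            out.append(f"\x1b[{row + 1};{start + 1}H{new_line[start:col]}")
    return "".join(out)
```

An unchanged frame produces no output at all, so there is nothing to flicker; redrawing the whole conversation every frame is the opposite extreme.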
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation.
If they wanted to they could have convened an international forum with commercial and political stakeholders years ago. Less talk, more do.
When AI is a more effective capital allocator than NI it will drive capital into the accounts of whoever controls the AI, gaining them increasing decision making power over the economy and culture. Maybe those controllers will be human at first.
Hierarchies exist for a reason, take away the reason and the house of cards eventually collapses — but the house of cards is still a house. When it’s gone, we’re back to laws of the jungle.
I think certain types of people with power, i.e. access to capital, will lose relevance. The world will become more meritocratic with AI as leverage for the individual.
It’s exactly the opposite I’m afraid. Capital already has more access to AI, both quantitatively (tokens for dollars) and qualitatively (biggest players got Mythos first). Expect this trend to continue.
Anthropic has finally come around to what others realized far sooner. Little time left now. Notice how shallow the arguments of the AGI naysayers have been, and how consistently wrong, year after year.
> If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing
Even Anthropic wants to Pause AI now. There must really be not much time left for "edging". Please write to your lawmakers, no matter whether you are in the US, Europe, China, or elsewhere. Only an international agreement between governments can enforce an AI-Pause and eliminate the necessity to dangerously push the frontier.
Or agree on finding ways to promote peaceful use of nuclear energy. This has been done, there are thousands of people working on it around the globe and 180+ member states of the IAEA. It's not easy, there have been close calls.
And cooperating internationally to buy ourselves time to find ways to develop this "last invention" in a way that does good for humanity seems to be on a similar level.
What I can’t get over is that there have been exactly zero software breakthroughs since vibe coding started, other than vibe coding itself.
Claude is amazing, that’s true.
But if it was as amazing as this article implies, I’d expect some breakthrough outside of AI itself.
Rewriting a Zig program in unsafe Rust? Not a breakthrough. Finding a bunch of security vulns? Maybe that’s sort of a breakthrough though it’s underwhelming and possibly just a net negative. But like if I rolled back to using software from 2023 then life would be ok.
Maybe we just need to give it time, and sometime real soon, we will all be amazed by such a breakthrough? Who knows
Maybe I'm looking through rose colored glasses, but software that writes itself seems like a pretty big breakthrough to me.
Strictly speaking, it's modifying itself. Although it would be a fun challenge - can an llm create a new llm from scratch?
That goes straight to my point: then why hasn’t the miracle of automated coding led to breakthroughs outside of automated coding?
If the only breakthrough is automated coding with no outside consequence then it’s just masturbation
What does a breakthrough look like?
Some examples:
- The first web browser
- the first web browser with images
- typescript
- react
- rust
- Fil-C
- doom
- quake
- the Animorphic VM, and its follow-ups like HotSpot, and even competitors/copycats like J9, V8, JSC, etc
- Fortnite battle royale
- Roblox
- thefacebook
- ChatGPT
- Claude code
I know that’s quite a range and that’s intentional.
Anyway, I think we’ll know it when we see it.
Massive productivity gains.
Okay, so Anthropic has amazing AI which supposedly writes most of their code and can continuously improve... meanwhile they have outages on a regular basis, and any kind of long-running work will now consistently hit 'API Error: Server is temporarily limiting requests'. Not sure if this is intentional to force a reduction in token usage, but at this point I need to build around these throttling limits and outages with my own tools to restart/resume sessions. In my experience over the last 2 weeks, literally 100% of non-trivial Claude sessions/work get blocked on these issues, requiring manual intervention.
One of my focuses now is my own model-agnostic harness and workflow orchestration (I know everyone is building these), baselining on Opus, and aiming to transition to Chinese models like DeepSeek in the short term and, hopefully, open self-hosted models in the future (which I plan to open source).
The nonstop marketing fluff from Anthropic while their service quality and availability noticeably degrade... just continues to destroy my trust in the company.
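A minimal sketch of the kind of restart/resume tooling described above, under stated assumptions: retry a call with capped exponential backoff and "full jitter" whenever the server reports it is limiting requests. `RateLimitedError` and `call_with_backoff` are hypothetical names, not part of any Anthropic SDK or Claude Code API; a real harness would map whatever rate-limit error its client raises onto this exception.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for a 'Server is temporarily limiting requests' failure."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitedError with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error for manual intervention
            # Full jitter: sleep a random time between 0 and the capped exponential delay,
            # so many stalled sessions don't all retry at the same instant.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

The `sleep` parameter is injectable so the retry logic can be tested without actually waiting; in a real session runner it stays `time.sleep`.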
Infrastructure is a much harder problem. They can't even improve Claude Code, which eats 1GB+ of RAM. Meanwhile, my editor only consumes 80MB of RAM.
This might explain it, in the opposite way it was meant to:
https://fxtwitter.com/trq212/status/2014051501786931427
> Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
> For each frame our pipeline constructs a scene graph with React then
> -> layouts elements
> -> rasterizes them to a 2d screen
> -> diffs that against the previous screen
> -> finally uses the diff to generate ANSI sequences to draw
Yup. Overengineering.
This is a decades-old design pattern when CPU >> IO. Emacs has been doing just that since the 80s, when people were complaining about "Eight Megs And Constantly Swapping". See "redisplay" [1]
This minimizes screen flash. You can't rely on terminals doing double-buffering.
[1] https://github.com/emacs-mirror/emacs/blob/c29071587c64efb30... or a more user-friendly overview, Daniel Colascione's seminal "Buttery Smooth Emacs", snapshotted at e.g. https://gist.github.com/ghosty141/c93f21d6cd476417d4a9814eb7...
It's like the Citrix of AI :-D
Care to explain how you'd engineer it instead?
It's product bloat.
It's not recognizing that they are just one building block that should do one thing well, like tmux.
You don't need a computer display on your fridge for the same reason, but Anthropic think you do. You should see virtual ice getting created and they should correspond to the actual ice behind the door - think of how amazing that is!
And it's not even a completely bad idea. Make it claude-code-react-beauty, or offer some way to turn it off, and it would be far more palatable.
What is "frame" in this context? Video frame, or something else?
> -> rasterizes them to a 2d screen
> We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.
It looks like video frame, full framebuffer, generated and parsed at 60fps. It surprises me they haven't introduced GPU shaders, 16x oversampling and raytracing. Maybe for next release.
The contents of the terminal screen at any given point in time.
As someone who maintains a roguelike with a terminal-like UI that:
1. Maintains an internal representation of what the game thinks is on screen.
2. Runs the game for one frame which updates that representation.
3. Generates a diff to see how that differs from what's actually on screen.
4. Executes the minimum set of draw calls to get the screen to match the internal representation.
It's really not that hard. It's a few hundred lines of code.
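A minimal sketch of steps 3 and 4, assuming both screens are equal-sized lists of strings (one per row) and ignoring colors and text attributes: compare the previous and next screens cell by cell and emit a cursor-position escape plus the new text only for the runs that changed. `diff_to_ansi` is a hypothetical name, not taken from any particular game or library.

```python
def diff_to_ansi(prev, curr):
    """Return ANSI output that turns screen `prev` into screen `curr`.

    Assumes both screens are lists of equal-length strings, one per row,
    with no color or style attributes. Only changed runs of cells are redrawn.
    """
    out = []
    for row, (old, new) in enumerate(zip(prev, curr)):
        col = 0
        while col < len(new):
            if old[col] == new[col]:
                col += 1
                continue
            start = col
            while col < len(new) and old[col] != new[col]:
                col += 1
            # CUP escape: ESC [ row ; col H, with 1-based coordinates.
            out.append(f"\x1b[{row + 1};{start + 1}H{new[start:col]}")
    return "".join(out)
```

In practice the whole returned string would be sent with a single `sys.stdout.write` per frame, which helps reduce flicker. Each cursor move costs a few bytes, so real implementations often merge runs separated by short unchanged gaps; this sketch doesn't.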
Sure. For a videogame.
> -> rasterizes them to a 2d screen
Also you forgot "render to a framebuffer, then parse the framebuffer back to chars".
Anyway, I'm off to construct the new `ls` command. It will render the list of files to a mesh of billions of polygons in a GPU with advanced shaders, 16x oversampling, HDR and all the graphic acronyms I don't understand, then read the resulting image, find the nearest character in the ANSI charset and use that one.
It will be _glorious_ (and profoundly stupid)
Could be improved. Encode the image to webp with high compression settings and then to handle the ASCII mapping, spin up a local LLM to do OCR on it.
I hadn't seen that quote before, what an embarrassing thing to go on the internet and write...
~ "it's not a TUI! <describes an outrageously overengineered TUI> and my dad works at Nintendo"
curses, bud. curses.
It's genuinely difficult to tell how much of this is true. The post is obviously 100% posturing, but some of the words describe things that could be done.
Very few game engines do anything I'd describe as rasterisation. That's kind of the point of a GPU. Well, it used to be. I suppose "small game engines" might be more likely on average to include a rasteriser. The typical reason for this is because the author wanted to write it. Whereas big engine make triangle give hardware go brrr.
So I assume here 'rasterize' means 'printf'. And diffing screens means diffing 50..150 lines of text. And "generating ANSI sequences to draw" means 'printf' with some ANSI sequences interpolated in.
Then there's the frame budget. You have to understand they are operating within a strict frame budget -- they're not messing around, OK. They have a 16 ms frame budget, so they burned 11 ms and now have a (roughly) ~5 ms approx. budget for the final 'printf' in the chain???
Why the hell does it need to be so complex? People have been making TUIs for decades. Did we need a small game engine to run claude code?
It doesn’t need to be that complex, but it can be that complex without being slow. Claude Code’s interface is extremely simple. It has tons and tons of headroom to tack on performance overhead without it being noticeable at all. You just have to not do dumb things like redraw the entire UI every time a spinner spins.
One possible cause: they forgot to add 'make it as simple as possible' to the prompt.
On a more serious note using a react-like lib for TUI in the hope you'll share the codebase with the web version is a more likely explanation. Still not the best idea.
React is not so stupid as to re-render in a loop at 60fps; instead it waits for changes to happen before re-rendering. It even batches changes and stuff.
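A rough sketch of the "only re-render when something changed" idea, assuming a frame loop that ticks every ~16 ms: state updates only mark the frame dirty (so several updates between ticks collapse into one render), and idle ticks cost almost nothing. `Renderer` is a hypothetical class illustrating the general principle, not React's actual scheduler.

```python
class Renderer:
    """Hypothetical frame loop that re-renders only when state actually changed."""

    def __init__(self, render_fn):
        self.render_fn = render_fn  # function: state dict -> screen output
        self.state = {}
        self.dirty = True           # force one draw on the first tick
        self.renders = 0

    def set_state(self, **changes):
        # Only mark dirty if some value really differs; several updates
        # between two ticks collapse into a single render.
        if any(self.state.get(k) != v for k, v in changes.items()):
            self.state.update(changes)
            self.dirty = True

    def tick(self):
        """Called once per frame; returns new output, or None when nothing changed."""
        if not self.dirty:
            return None
        self.dirty = False
        self.renders += 1
        return self.render_fn(self.state)
```

With this shape, a spinner that advances a few times per second triggers a few renders per second, not sixty.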
Must have 120 fps for answers arriving in [buffering] 30 seconds.
I can't help but think it's their engineers and PMs making these decisions, since I know that if you asked Claude to write a TUI, there is no world in which it would recommend whatever the frontend architecture of Claude Code is.
This allows for comfortable ergonomics, IMO.
Not that it couldn't be leaner, for sure, but I get the reasoning behind the TUI rendering layer.
If they used an actual game engine to render a 3D UI from scratch it would be more efficient
> For each frame our pipeline constructs a scene graph with React then -> layouts elements -> rasterizes them to a 2d screen -> diffs that against the previous screen -> finally uses the diff to generate ANSI sequences to draw
That’s rather sickening.
So I’m wondering what ‘rasterizing’ literally means in this case. I imagine it’s just creating a 2D map of elements at a very low (probably character) resolution, then diffing that against the last generated map to come up with an optimal ANSI sequence to send to the terminal, would that be right?
Seems like a cool puzzle to solve. I wonder what engineering and organisational tradeoffs led to it. Does it let them reuse a bunch of existing code?
I wrote a TUI library back in the day for Turbo Pascal — it was essentially taking an immediate-mode approach (which in this context is just a fancy way of saying it was procedural haha).
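A sketch of that reading of "rasterize", assuming layout has already assigned each element a row and column: paint each element's text into a rows × cols grid of characters, clipping at the edges, with later elements drawn on top. `Text` and `rasterize` are hypothetical; the quoted post doesn't confirm that Claude Code's renderer works this way.

```python
from dataclasses import dataclass


@dataclass
class Text:
    """A hypothetical laid-out element: a string placed at (row, col)."""
    row: int
    col: int
    text: str


def rasterize(elements, rows, cols):
    """Paint elements onto a rows x cols character grid; later elements win."""
    grid = [[" "] * cols for _ in range(rows)]
    for el in elements:
        if not 0 <= el.row < rows:
            continue  # element is entirely off-screen vertically
        for i, ch in enumerate(el.text):
            c = el.col + i
            if 0 <= c < cols:
                grid[el.row][c] = ch  # characters past the edges are clipped
    return ["".join(line) for line in grid]
```

The result is a list of strings, one per row, which is exactly the kind of "previous screen" the next diffing step would compare against. Wide characters (CJK, most emoji) occupy two terminal cells and are deliberately ignored here.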
"Rasterizing" means just one thing in this context: to transform a data structure into an array of pixels. It seems absurd to do this, given that the next step must be to convert back from pixels to text data, but maybe they have some way to generate predictable sequences of pixels (e.g. the character "t" is always rendered as the same pattern of pixels), such that they're cheap to convert back.
If they're doing anything else, the word "rasterizing" is being misused.
Somebody read/watched too much Casey Muratori.
No, somebody didn't read/watch enough Casey Muratori.
Well, it runs on something they didn't design (Electron) using a GUI library they didn't design (React).
For a company with that much AI, you'd think that if it was actually good, doing that part in a fast and performant way would be "easy".
And yet, nobody who writes game engines would do it this way, because game engines need to be efficient.
> which eats 1GB+ of RAM. Meanwhile, my editor only consumes 80MB of RAM
And why are you comparing Claude Code to your editor?
> They can't even improve Claude Code
That depends on how you define "improve". They've added a ton of features to it over time. Who said minimizing RAM usage was something they are prioritizing right now?
> why are you comparing Claude Code to your editor?
Because the editor does more. All the compute-intensive parts of the agent are in the cloud. Zero reason for an agent harness to require anything beyond a potato to run.
The purpose of RAM is to be used.
Try 64K! https://en.wikipedia.org/wiki/Turbo_Pascal
Also remember when XP was super bloated cause it needed 64MB?
I loved Turbo Pascal....
I loved XP. My laptop had 256MB of RAM.
I don't think they need to optimize their infrastructure (at least not from their perspective). They have high-end PCs with 64GB of RAM, so 1GB doesn't matter to them. For example, I have 8GB of RAM, and I make my apps very performant. Honestly, I probably wouldn't bother if I had 16GB+ of RAM.
They also don't have... a login page with authentication. To access the console you get an email link. No passkeys, passwords, or 2FA, just an email.
Not necessarily the parent's fault, but the energy of this thread is not my favourite...
And don't forget that they have BILLIONS of dollars and can't figure out how to get a decent support or public communications system setup.
They can't even seem to get their usage metering consistent.
You mean on some days it goes faster and some other days slower?
That is by design. It depends on how much other people are using their services right now and they do communicate it somewhere in the TOS that they do this. Otherwise they could give us a fixed amount of tokens - but they don't because it is not fixed.
Don't confuse things. It's not "can't figure out", it's "don't care to figure out". They're not dumb. They just don't care about support.
Couldn't they just have background agents "figure it out"?
If agents can just figure it out, isn't that AGI?
NPCs can’t appreciate that.
Have you considered just... using OpenAI? They are more reliable, models are just as good, and their subscriptions provide more requests per dollar.
Personally at my own job self-writing code is letting us tackle big, long-deferred refactoring projects (like the article mentions), but any sort of refactoring introduces new bugs.
Their outages are probably not due to their code though. It’s probably their infrastructure that can’t keep up. So seeing failures of infrastructure doesn’t really tell you anything about how good or bad Anthropic makes use of their models.
The messed up scrolling behavior I keep getting in Claude Code is definitely due to their code.
There is a setting that fixes this, I can't remember what it's called off the top of my head
This concept is so funny to me. Would love a toggle switch...
"Oh yeah, just go to Settings > Bugs Enabled and turn OFF text display errors"
The whole thing is actually powered by a shitton of hamsters inside a bunch of 4u rack mount cases running on spinning wheels at high speed. Somehow at scale this works.
Sometimes they all happen to randomly take a nap at the same time - hence the outages
We all saw their code...
That seems like an assumption based on basically nothing. There is a lot of code at the infra layer, and based on the stack choices for Claude code and based on how buggy and unreliable ~everything from anthropic is, it seems pretty bizarre to claim these issues are not related to their code.
There are other indications, however, like Anthropic paying through the nose for compute just months after Dario told Dwarkesh how hard it is to predict demand, or ChatGPT and Codex not quite having the same issues after Altman spent much-publicized years scrounging for trillions of dollars of capacity.
While I'm very bullish on Anthropic, I'm a bit wary about their IPO because it seems to me that they're filing now while their financials look good and before other trends like the decline of tokenmaxxing and their compute bills catch up.
Whoa, first name basis with Dario but not Sam. Ouch. [I actually have no idea who Dwarkesh is and it sounds like a first name to me but that's not a particularly reliable indicator so I won't comment on your relationship with Dwarkesh.]
Oh, are they filing now? I think their financials look somewhere in between devastating and criminal, so I'm really looking forward to the IPO!
Just as you expected, I'm throwing in my harness. Please support: https://github.com/rush86999/atom
Look, I've never been someone who mindlessly hypes AI companies, as a matter of fact I think they have serious leadership problems across the board, but you people are straw-manning them so badly it actually makes me sympathize with them.
They aren't saying they have fully automated luxury AGI, they specifically list the ways models fall short of that bar and caution against people taking the 8x figure as the actual uplift number. At the same time they recognize that 80% of new code is now AI-authored, when two years ago those models were little more than toys. And frankly that checks out: if two years ago you told me we'd have something like Opus 4.8/GPT 5.5 I would have rolled to disbelieve.
> At the same time they recognize that 80% of new code is now AI-authored
I can set up a loop that will write a trillion lines of code automatically; how much of it is actually useful? Or are we back to counting LoC because there's no other metric for these systems that anyone can rely on?
I could write a bash script that copies a codebase repeatedly in the pre-AI past as well, but I didn't do that because I wasn't stupid. More than 80% of my code is now AI-generated, and trust me I'm still not stupid. It was 0% only a year ago.
Who says LoC is the only metric we should rely on? A software product should first and foremost meet user requirements for functionality and performance. Judging from the sensational rise of Anthropic's user base and revenue, I think we can safely say they're in that ballpark.
you're conflating a compute problem with a code quality problem.
Indeed... why is Anthropic even employing people at all if this AI magic story is true?
You still need wizards to cast the spells..
Not if your spells cast their own spells.
Those are results of the humans only, not the AI. AI is perfect /s
I fail to see how pursuing recursive self-improvement at full speed is compatible with Anthropic's stated goal of AI Safety. If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?
I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.
The thing about nukes is you can at least make an argument for why it'd be important to be the first country to have them. With AI, you create super intelligence and you're probably just the first one it takes out. There's no reason to think a super intelligence would be totally fine being a slave to apes.
Cynicism with these companies is highly warranted though. It's not doomerism to look at their actions and conclude they're deeply untrustworthy.
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.
Nor am I. I think they believe that AI poses a grave danger, and they are playing the prisoner's dilemma as an unvirtuous actor.
1. If anyone builds strong AI, it may be catastrophically bad.
2. If anyone builds strong AI, it will be better for the builder than for anyone who does not. Either because it won't be catastrophically bad so the builder will get to enjoy all the spoils indefinitely or because it will and at least the builder will be rich for a while.
Maybe we're just misinterpreting the meaning of "AI Safety"?
Maybe they mean the AI needs to be safe from us? Can't have the grubby meat flappers touching the delicate bits!
To complete the analogy, it's like nukes, except we don't have the slightest idea how to calculate the odds of it igniting the atmosphere. (And note that in reality, while the Trinity test "ignite the atmosphere" calculations were correct, we failed to correctly calculate the fallout of the Castle Bravo test with lethal consequences).
a better analogy with Castle Bravo is that the yield was 2.5x more than expected due to "unforeseen additional reactions" from the design.
https://en.wikipedia.org/wiki/Castle_Bravo
Anthropic's goal is regulatory capture.
Sorry for nitpicking, but:
> If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?
Arguably, yes.
Is the idea to keep the world in balance via MAD? I could see that, though it's a dangerous gamble.
From Richard Rhodes's "The Making of the Atomic Bomb", I got the impression that most scientists involved thought they could manage a US or UN monopoly on nukes after the war. General Groves attempted to buy up all of the world's uranium ore. Unfortunately, it is only high-grade ore that is rare; many countries have low-grade ore.
I honestly don’t know how Iran can conclude anything after this war other than to go all-in on nukes. The US has proven any deal is worthless if it can just change its mind and renege on it whenever it wants.
Who’s invading North Korea? No-one.
Again quite arguable, but this is the real life scenario we’re living in. Nukes have made it hard to impossible for super major powers to go in direct conflict with each other.
No, but in a peace time, it's a lot easier to convince someone not to use nukes than in a war when the party who has nukes has its back against the wall.
Wouldn't deliberately going from a world without nuclear weapons to a world with MAD involve giving the tech to build nukes to your worst enemy?
If only the US or UN had nukes, we wouldn't have MAD. We mostly got here through espionage.
In this world we've had an inoculation event against the use of nukes. Two were dropped, people saw how abhorrent their use is, and collectively decided that they shouldn't be used.
If WW2 Japan had also had nukes (and delivery systems for them), they'd probably have retaliated in kind, the US wouldn't have let that slide either, and it would have continued for some time.
If WW2 Japan had also had nukes, the US would never have dropped those two. That's the whole idea behind MAD. Probably the only thing that stopped an open conflict between the US and USSR was that both were nuclear powers, and both sides were scared that push would eventually come to shove.
With the US showing that it will elect mentally disabled people such as Trump, this doesn't seem such a wise decision.
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.
It's not cynicism if it's an appraisal of reality that's backed up by evidence.
Remember how social media - that first baby of this current generation of tech entrepreneurs - was supposed to "bring the world together" and "let us express ourselves"? As it turns out there's a lot more money to be made by fostering division to drive engagement and feeding people an endless stream of ads instead of their friends' content. And money is what matters. You can't write down good vibes on a quarterly figures report. You can absolutely write down the number of eyes that your ragebait brought to a product's marketing efforts and the conversion rate to sales.
The same will be done with GenAI. We're being promised "AI Safety" because otherwise this whole thing gets killed dead by anyone who knows about James Cameron's directing career. There's no real enforcement mechanism for AI safety, though. Safety is a good vibe, same as harmony in online communities. You can't measure it. What you can measure is training costs and the cost of mistakes by AI that need to be trained to avoid those mistakes. Since AI generates more output than humans can conceivably QA no matter what your budget is, and since AI is seen by the market as a potential endless font of value, the tradeoff will be made to have AI make some potentially awful decisions while training itself over slowing down and re-appraising what is being done.
There's an almost religious reverence for AI in SV. Not everyone sees it as "making the godhead" but some certainly do. They're not going to moderate themselves too much on this.
The folks I met who were talking about AI Safety in 2018 were certainly sincere, and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.
I expect that Anthropic will eventually behave as you describe, like any other public corporation. However, my impression is that its current leaders are still more sincere than greedy.
Such a massively valued company. And doubting them is cynicism? It’s rational(ism).
So either they lie or they are AI Zealots. Interesting times.
Edit:
> > and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.
There are three types of people. Pedestrians, investors, and “I know some of them, they wouldn’t lie”.
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.
It doesn't really have to be dishonest, he could really believe it. I do believe, however, that it is incredibly wrong and is functioning as marketing hype.
I find any and all claims like this ridiculous from a company who can't build a terminal application that uses less than a gigabyte of RAM.
For some reason, idling Claude Code needs 100% of my CPU.
Well, they could very easily if they wanted. There is just no economic value in it.
I have iterm2 open right now with Claude in a long session and it's only using 500MB of memory.
Only 500MB!
you are confirming their point even as you contradict the specifics
Maybe that gigabyte is occupied by useful information: traces/memory?
A gigabyte is a lot of memory. Even the largest context windows are a small fraction of that with any sane engineering discipline.
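A back-of-envelope check of that claim, under loudly stated assumptions: roughly 4 bytes of text per token (a common rule of thumb for English, not a property of any particular tokenizer) and a 1,000,000-token window. The model's internal state (e.g. the KV cache) lives on the provider's servers, so the local client only needs to hold the text itself.

```python
# Back-of-envelope estimate; both inputs are assumptions, not measurements.
BYTES_PER_TOKEN = 4             # rule of thumb: ~4 characters of English per token
CONTEXT_TOKENS = 1_000_000      # a very large (million-token) context window
GIB = 1024 ** 3

context_bytes = CONTEXT_TOKENS * BYTES_PER_TOKEN   # 4,000,000 bytes
fraction_of_gib = context_bytes / GIB              # roughly 0.0037

print(f"{context_bytes / 1024 ** 2:.1f} MiB of text, {fraction_of_gib:.2%} of 1 GiB")
# prints: 3.8 MiB of text, 0.37% of 1 GiB
```

Even doubling that for UTF-16 string storage, or keeping a second copy, stays under 1% of a gigabyte.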
For each LLM interaction they likely have a bunch of thought traces, tool calls, etc., which don't go into the context but can still be retrieved.
But I obviously don't know for sure.
Nope. Used to render on the terminal like a game engine.
https://x.com/trq212/status/2014051501786931427
This kind of immediate-mode rendering is quite standard for TUIs. That said, immediate-mode rendering tends to be significantly simpler and use less memory than retained-mode rendering, at the cost of some redundant computation, so I am not sure it is the reason for the bloat.
It's possible that it doesn't play well with JS garbage collection, since it recreates the whole UI structure every frame (which tends not to be an issue in the languages where immediate mode is usually employed).
But yes it’s a bit more akin to game renderings than web rendering. Which can be totally fine if done well.
I haven't tried to make a TUI admittedly, but double buffering is the oldest technique on the planet. A TUI doesn't even need to pay the cost of a lot of pixels since its effective resolution is much lower
Do game engines constantly have buffer issues?
Depends on if they're written with Claude
How on earth are you spending more than 50us on a UI like this from start to finish? What the actual hell? 11ms to construct a scenegraph of this complexity? I don't even know what to say to that.
I sorta remember Quake console running on an 486dx2 ..
Frankly that's an insult to gamedev. Literally every game engine I can think of could do better. Probably even Unreal Engine could do better.
If I saw our UI show up in the profiler eating 5ms of CPU time every frame, I'd send whoever was responsible to QA hell until they find some way to redeem themselves. Not even fancy animated 3D UIs, like what you get in Death Stranding, eat up these kinds of resources. Not even remotely close.
Developers can develop leaner applications, but they're usually not incentivized to.
Frankly, I love efficiency too, but I've had to learn the hard way that what the market wants is features. Or at the very least, that's what the executive team wants.
Their whole argument is that AI's added efficiency means they don't need to set aside valuable human time anymore. Why can't they just point Claude at Claude Code and ask it to reduce memory usage by 90%?
You can do that. But I'm telling you, in tech (and enterprise shops I've worked at too) they don't care.
I'm using the internal Google tools and it's helping me write code much faster too, but it still takes time. I could make the CLI tool I work on faster, but no one cares except the end users, and their minor concerns have no impact on our internal politics.
At the end of the day you have to do what you're paid to do, unfortunately.
In other words, performance is almost always an afterthought.
Sure
I came here just to write: pretty please, let it churn for a few nights and redo Claude Code in Rust. The harness is very, very good, as are their models, but that Node thing is a hog for no good reason at all.
Incoming rust rewrite branch ready to merge: +1,009,257 -4,024
People already rebuilt Claude Code in Rust after the Claude Code leak, it's on github as claw code (and other variants)
Really? Let me explain how bigger companies work:
They have different teams for different departments with different type of people.
So the team or teams responsible for writing the terminal application are different people than the researchers doing the learning.
This can lead to detrimental quality aspects.
They obviously don't care, aren't making any attempt whatsoever to do this, and 99% of users don't care either.
If you want to pollute your own priors with weird artificial litmus tests, it's a free country, but the artificial world-model you build in your head does not affect the real world around you.
Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?
I always was fascinated (obsessed?) by robots that build robots, or even things like this that can contribute a lot to making the next version of itself: https://buildyourcnc.com/products/cnc-machine-blacktoe-v4-2x... (cnc router that cuts plywood, and is made out of cnc-router cut plywood)
This is my own effort at an AI assisted coding environment optimized for building itself: https://recursi.dev/ (just launching it, hope its ok to mention it, it is free/open source.... here is the HN link that has gotten no love yet: https://news.ycombinator.com/item?id=48401022 )
Personally, I think harnesses are as important as the AI itself, and I have this crazy theory that even if the models stopped improving today, we could still have massive advances in the harnesses alone.
I think harnesses would count, AI != LLMs. Any piece of code that helps the computer reason for itself is AI, the harnesses are AI in a sense.
People are specifically talking about the engine itself and not the tools used.
We wouldn't call humans creating a calculator "recursive self improvement".
By that interpretation, neither the harness nor the LLM is the AI. The computer (or system of computers) taken as a whole is the AI. You can't remove any piece and still have an intelligent system.
Yes? The future for any verifiable task is that the model attempts to verify an initial state and a goal, then decomposes its tasks into ever-smaller verifiable subtasks, with /memory being the persistence between runs and /dreaming on the results of those memory files plus run data to introduce new ideas.
I think that's the path to async AGI these labs are imagining. The only limits are the sensor data you have on the world or your system, how long you're willing to wait, and how much you're willing to spend to parallelize it.
Maybe once you start building out these verified workflows, you can feed that back into training, and the model starts to get a feel for the world to the point that it can intuit things, since it has these sub-paths built.
My personal AGI test: can a model trained on video of someone knocking on a door and then opening it encounter a microwave for the first time and open it when the food's done, without knocking?
You need the AI eventually building another AI for the name to apply. This page is just bullshit. They vibe-code their harnesses, and yes, it shows.
Anyway, what does recursive self-improvement even mean for neural-network-based AIs? It's not clear it's possible at all.
> Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?
Shhh just let the marketing slop wash over you.
If you want to get out ahead of what's coming, it'll be small models that bootstrap the harness rather than anything else.
I used to think that, but ended up going the other direction, partly because I don't have the wherewithal to build a model. Then I realized that with existing models that can take more than a tiny amount of context, you can just let any model bootstrap itself with a good prompt sent by the system.
There are a ton of other tricks to it, but mostly it's about keeping the protocol simple for the AI so it can concentrate on coding logic and not on stuff like managing BS boilerplate, dependencies, etc. (for instance, I make extensive use of things like an abstract syntax tree library to help with surgical edits from the LLM)
That said, I would be very open to collaborating with someone who builds such small models, I don't think the system strictly needs it, but it also could have some extra power if it had it.
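A minimal sketch of what an AST-assisted "surgical edit" can look like, using Python's built-in `ast` module (which may or may not resemble the library the commenter uses): locate one top-level function by name and splice in a replacement, leaving every other line byte-for-byte intact. `replace_function` is a hypothetical helper.

```python
import ast


def replace_function(source: str, name: str, new_def: str) -> str:
    """Swap out one top-level function's source, leaving every other line untouched.

    `new_def` is the complete replacement `def ...` block (e.g. produced by an LLM).
    Raises KeyError if no top-level function called `name` exists.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            lines = source.splitlines(keepends=True)
            # Include decorators in the span; lineno/end_lineno are 1-based (Python 3.8+).
            start = min([node.lineno] + [d.lineno for d in node.decorator_list]) - 1
            end = node.end_lineno
            replacement = new_def if new_def.endswith("\n") else new_def + "\n"
            ast.parse(replacement)  # raises SyntaxError if the replacement isn't valid Python
            return "".join(lines[:start]) + replacement + "".join(lines[end:])
    raise KeyError(name)
```

Because the edit is addressed by name rather than by line numbers or a fuzzy text match, the model only has to emit the new function body, and a syntactically broken replacement is rejected before it touches the file.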
> mine also makes extensive use of things like abstract syntax tree library to help with surgical edits from the LLM
Tell me more! This takes me way back. I did one like this in the GPT-4 days! (8k context window)
Start off with my video!!! You can also try it with zero setup (you can code right there on the static web page; it will save your edits in the browser's IndexedDB and hotpatch them back into the code before it runs it... you can also grant the browser permission to read/write a local directory)
recursi.dev
Seriously, I'm looking for collaborators.
There's upwards of 80,000 lines of code in the editor system, a lot to it to make sure that even newbies don't get stuck.... so that's kind of proof the system works since it doesn't break down when the codebase grows large.
I'm aware we're not there yet, but think of something like https://chatjimmy.ai/ ; at some point, you're going to be able to dynamically build the harness so it creates the necessary consistency & dynamicism at a speed unheard of.
But yes, I'm aware no one's gotten anywhere near there, mostly because most of the focus is on exploding the context and parameters. I'm saying that phase is done.
I'm not sure what I am looking at with chatjimmy.... what is special about it? Speed?
I'm also not sure what you mean by "we aren't there yet." Where?
Sorry, not trying to be difficult or dense, I'm just not sure what you are referring to.
> mostly because most of the focus is on exploding the context and parameters.
Large context allows a surprising amount of "learning" to happen at inference time rather than training time. I think that is relatively unexplored. As long as the model itself has passed a certain threshold of smarts, and the context is large enough (Gemini and its million token context being WAY past that point) you are not really limited by the model, you are only limited by how good the stuff you feed into that context is.
That's what happened nearly a year ago, when I saw a major leap in capabilities that happened entirely on my end... not in the AI, but in code written by the AI. I found it genuinely frightening, to be honest. I think OpenClaw tapped into something similar, which seemed to surprise a lot of people. There were latent capabilities in the AI that were unknown until brought out by a clever harness.
Imagine a streamlined model whose only job is to build and then execute the harness at the speed you're seeing in chatjimmy.
>A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.
What about the hypothesis that AI is generating more verbose code? I just see the text pretending to acknowledge "LOC != Productivity" and then using it as a metric anyway.
One of my co-workers just asked me to review his pull request that was all AI generated. 600 files were touched, over 40k lines of code added.
I'm sure he thought that was a crowning achievement, proof that AI can enable 10X developers, after all, what engineer could write 40k lines of code in a week?
I declined to review it, stating that I couldn't possibly vet 40k lines of code, and wouldn't put my reputation on the line to stamp the work as good. The PR nagged me from my todo list for 2 weeks and then disappeared. I don't know if he found another dev to get an approval from, or if the PR was abandoned. But I know for sure that he and I are on two totally separate islands regarding the value of LLMs.
Same here. A co-worker touched a few hundred files in a PR and asked us to review. They merged it directly to main when nobody approved it. (The repo was not set up to enforce PR approval.)
I don't personally use that feature, and I couldn't care less at this point. If our customers are frustrated by the bugs, at least my name is not on it.
That's a process problem at your company - no developer should be proposing branches over 1k loc (or whatever your agreed tolerance threshold is) without a very good reason, vibe coded or not.
> I declined to review it, stating that I couldn't possibly vet 40k lines of code
Gee, that sounds like a job for Claude if there ever was one.
You're absolutely right!
So the more rigorous studies about AI-assisted coding productivity addressed this by keeping in place all other software development processes, including the same code review and quality standards, and only measuring throughput (PRs, LoC) before and after AI was allowed.
Hence the interpretation of this 8x number depends on whether (or how much) Anthropic engineers have changed their quality standards and development processes. They don't tell us, and I am not aware of any other indications we could use to make a judgment.
However, we can still do some theorycrafting! I'm convinced that to fully realize the potential of AI-assisted coding we need to revamp all the dev processes, especially how we validate code, and it would be foolish of Anthropic not to do so (unless they were conducting a rigorous study, which they don't claim to have done.)
My hypothesis on the future of software validation is nothing fancy: we simply want much, much more automation for tests, observability and other bespoke verification methods than we traditionally had. But then validation code will also contribute to the LoC! My observation so far, from personal projects as well as some "vibe-coded" open-source ones, is that O(LoC production code) ~= O(LoC test code). So as a SWAG, the upper bound could be something like a 3-4x speedup, which is still remarkable.
All bets are off if code quality standards are not the same.
Exactly. If AI is going to start being graded on how many LoC it generates (oh, I'm sorry, how much it "accelerates"), then guess what newer models will start doing more of?
Yeah, they assume that "productivity = k * LOC" where k > 1
very flawed
AI generates code that mimics the existing code. If your code is terse and comment-free, then the agent’s code is too. The times I’ve seen Claude drift into a default “house style” it generated like 1 comment for every 10 LOC or so. It’s a far cry from the GPT-3 days that littered every line with the journals of Captain Obvious.
So, regardless of whether or not Anthropic CAN create a self improving AI.. does anyone else feel like they shouldn't be allowed to? Or it at least needs to be strictly supervised..? Like, I don't actually think Anthropic can make the singularity any time soon, but I think even AI boosters have to admit doing this is creating a society-wide danger for the benefit of a very very small number of already-rich people.
"does anyone else feel like they shouldn't be allowed to?"
No. Technical limitations aside, I doubt it could be contained; it will be leaked soon, so it won't profit just a small number of the ultra-rich.
I dunno, I find it extremely unbelievable that we will get self-improving AGI which chooses to become a slave to humanity at all, ultra rich or otherwise.
Step 1: Wait for scary doomsday AI to be leaked, Step 2: ???, Step 3: Profit!!
I think that's a valid point. You could very well be right.
But we're discussing whether we should close the barn door while the horse is three miles down the road.
It's more like if the horse was lazily moving in the general direction of the wide open barn door and we are all sitting around discussing if we should close the door or just gamble that it's just going to lay down on the hay pile.
Only if you think LLMs are the horse (I don't think they are). If they're not, then we should be building a brick wall in front of that door and hiring a full time security guard to watch it.
I realize he's saying it for hype, but if the CEO of the company goes around talking about how scared he is of what they're creating, hey, let's just take Dario at his word and put in some strict regulation. He won't mind if they're really about safety. (they're not)
Besides, yes, the knowledge of how to build these systems is out there, but the cost of doing it is staggeringly high (i.e., you can't run a frontier AI lab in your garage). There's only a limited number of known entities that need to be managed, and you can stop "progress" in its tracks by cutting off the money firehose.
Right now the S&P 500 is going wild due to the promise of AI automating everything.
Who is the "we" who is going to shut it down? Certainly not the US government. Nor the Chinese government w.r.t. their tech industry. Are you going to start the insurgency? Is there going to be an equivalent one in every developed part of the world?
Look at how many data center projects are getting shut down by grassroots movements, or how the approval rating for AI is like worse than Congress's and it's getting booed at commencement speeches. I don't need to start an insurgency; people are already pissed off and the volume is growing.
The stock boost is, as most will note, a bubble. It will enrich a lot of bad people and leave average people holding the bag, but it's not going to go on forever.
Skynet is 30 years late!
Maybe John Connor succeeded after all
Absolutely! Yes. This rhetoric of inevitability only benefits these AI companies.
Too late for that.
In any case firms that get too powerful can be nationalised.
Self improving AI is pure dystopia. Anthropic won't build the singularity, AI itself will build it through self-iterations. Read Yudkowsky's book "If Anyone Builds It, Everyone Dies".
> AI that can build itself would be a major development in the history of technology—one that could bring enormous good for the world
I really can't stand these guys anymore...
> "A caveat: Lines of code is an imperfect measure"
I'm pleased they at least included this. However, they address the caveat by 'rounding down' the estimated multiple of the gain. I'm not sure that is the correct adjustment, especially once we understand the range isn't limited to positive numbers.
There's strong evidence the range of code productivity denominated in "lines of code" should include negative numbers, especially in the highest-quality sphere. Perhaps the earliest and most legendary example: https://www.folklore.org/Negative_2000_Lines_Of_Code.html
AFAIK, the only correlation with LoC that's got solid evidence is this: the number of bugs correlates with LoC.
Lmao I bloody love that.
I have been doing more experiments with what I now call agentic iterative optimization: telling the LLM to optimize code such that it speeds up all real-world-representative benchmarks by X% without cheating or causing regressions in either tests or performance metrics (e.g. MSE for statistical algorithms, or file size in the case of something such as image compression). This is done in Rust, where there are more low-level levers to tweak for performance than in something like Python.
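The acceptance rule described above (every representative benchmark must get at least X% faster, tests must still pass, and quality metrics must not regress) can be made mechanical so the agent can't talk its way past it. Here is a minimal sketch under a few assumptions. Each evaluation run is summarized into the hypothetical `RunResult` record below. "X% faster" is taken to mean a baseline/candidate time ratio of at least 1 + X/100. Every quality metric is treated as lower-is-better, as MSE and file size are. The comment doesn't say how the commenter actually enforces this, and it would be written in Rust in their setup.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical summary of one evaluation run of a build."""
    tests_passed: bool
    timings: dict[str, float]  # benchmark name -> seconds (lower is better)
    quality: dict[str, float]  # metric name -> value (lower is better, e.g. MSE, bytes)


def accept(baseline: RunResult, candidate: RunResult,
           min_speedup_pct: float, quality_tolerance: float = 0.0) -> bool:
    """Accept a candidate only if it passes tests, is at least
    `min_speedup_pct` percent faster on *every* benchmark, and worsens
    no quality metric by more than `quality_tolerance` (relative)."""
    if not candidate.tests_passed:
        return False
    # Dropping or renaming a benchmark or metric counts as cheating.
    if candidate.timings.keys() != baseline.timings.keys():
        return False
    if candidate.quality.keys() != baseline.quality.keys():
        return False
    required_ratio = 1.0 + min_speedup_pct / 100.0
    for name, base_time in baseline.timings.items():
        if base_time / candidate.timings[name] < required_ratio:
            return False
    for name, base_value in baseline.quality.items():
        if candidate.quality[name] > base_value * (1.0 + quality_tolerance):
            return False
    return True


baseline = RunResult(True, {"encode_4k": 2.0, "decode_4k": 4.0}, {"file_size": 1000.0})
faster = RunResult(True, {"encode_4k": 1.0, "decode_4k": 2.0}, {"file_size": 1000.0})
bigger_files = RunResult(True, {"encode_4k": 1.0, "decode_4k": 2.0}, {"file_size": 1100.0})
print(accept(baseline, faster, min_speedup_pct=50))        # True: 2x on both benchmarks
print(accept(baseline, bigger_files, min_speedup_pct=50))  # False: file size regressed
```

Requiring the gain on every benchmark, rather than on an average, is one way to stop the agent from trading a big win on one case for a quiet regression on another. The benchmark names here are made up for illustration.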
Opus 4.6/4.7 was consistently successful at getting 2-3x speed improvement with just one pass. It can also do the inverse: improve the performance metrics for better quality without causing a significant regression in speed. Then GPT-5.5 turned out to be much better at this workflow, often getting a multiplicative 1.5x-2x improvement above what Opus could do.
I now have quite a few GPT-5.5-optimized projects in various domains that are feature-complete and substantially more performant than existing SOTA implementations. I plan to open source them as soon as possible; the bottleneck is polish, as usual.
This is the lowest quality discussion I've seen on HN in ages.
We've had self-improving AIs before, and they tended to get lost after a while. That's going to be a problem. LLMs are stable because they return to a ground state with no history for a new job. Systems with persistent state have a problem with that state not being sane. Remember Microsoft's 2016 chatbot that learned from Twitter? [1]
[1] https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-...
You might be interested in this graph [1], which suggests that the amount of time AIs can run on their own has been increasing. Perhaps it will hit diminishing returns, but that seems difficult to predict.
[1] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
You can retrain a model and have a ground state as reference, it's not trivial but Microsoft's attempt was 10 years ago and significantly less complex than what's being built now.
Interesting, what are some other self-improving AI implementations? Any that actually achieved interesting results? Obviously continuous training has been tried before, but I've never heard of anything that could turn around and actually contribute code toward its own next-generation version.
Anthropic is looking to IPO here soon. A key aspect of this is to prove profitability.
Shifting their focus from training new models to serving inference instead would greatly reduce their spend. In fact, this is reportedly something they are already doing, and the reason for their first-ever profitable quarter.
It's awfully convenient that the company which has greatly reduced its spend on training is now asking for a slowdown in this area.
I mean, if they've consumed all of human knowledge. What's left for them to train on? This pivot isn't only because it's cheaper and a way to juice the numbers for an IPO, it's survival because they can't improve more.
It did sound to me like they feel some sort of wall coming.
Honest question: Is anyone here looking to put their own money into the Anthropic, OpenAI or SpaceX IPOs?
Maybe it is my poverty mindset that is holding me back, however, I can't imagine becoming an investor in any of the AI 'startups'.
There are plenty of pundits able to advise others on where to put their money, and sometimes there is everyone and their dog advising you to get into Bitcoin, gold or some other scheme. With alt-coins there were lots of people saying that you should get in, and plenty of naysayers. Yet I am not hearing anyone that uses AI professionally try to convince others to get into the AI IPOs coming up. Maybe the overall economic situation precludes it.
Hence my question, is anyone here planning to put their own hard-earned money into Anthropic (or the other AI 'start ups')?
How are these animations being made? I'd love to get a blog post on them. If it's AI, I'd love to know the workflow, but something tells me there is a lot of human creative input.
I'm having a hard time putting much faith into posts like these, especially as they near IPO.
Putting faith into the claim that recursive self-improvement is close to happening, or that they will coordinate with other companies / the government when the time comes?
Both.
So in the latest L. Ron Hubbard encyclical Anthropic informs its flock that recursive self-improvement does not work yet but that their engineers burn more tokens.
The Claude code quality and operational security of Anthropic have already been analyzed by the public.
If you compare the output of (purportedly) trillion dollar corporations to Bell Labs or even Microsoft Research it is embarrassing. But the output is a fixture on any discussion board.
Seems ironic that Claude isn't listed as a contributor to this article.
If it was used in writing the article, why not list it? If it wasn't used, that seems to go against Anthropic's whole message.
Obviously readers value human-written content more, but isn't it in their interest to destigmatize LLM output as much as possible?
But the real bottleneck is hardware efficiency, and not even Karpathy can set up a loop that overcomes that in software. We need truly compute-in-memory hardware paradigms to be matured and scaled. So it's more like recursive hardware improvement, which is 100x slower and at least ten times more difficult.
So I am looking at like Mythic AI or the wurtzite ferroelectric breakthrough from University of Michigan, or memristors, etc. to provide the 100 times efficiency boost needed at this point.
I would also argue that it's a good thing we are limited by the hardware and very questionable to seriously try to move into RSI for hardware. If you want to ensure the human era continues for at least one or two more generations, we should probably not do that.
it's vital for them to have self-validation for exponential rsi.. and this distillation of human-in-the-loop debugging into ai models is needed even though they have judge models handling parallel speculative execution.
labs have parallel speculative execution. they spawn hundreds of agent branches, validate them internally with AI judges and only show the user the successful result.
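Whatever the labs actually run internally (the comment above is the commenter's speculation, not something the article describes), the general pattern it names is usually called best-of-n sampling: sample several candidates in parallel, score them with a judge, and surface only the best one. Here is a minimal sketch. `generate` and `judge` are hypothetical stand-ins for model calls, and the stubs at the bottom exist only to make it runnable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def best_of_n(prompt: str,
              generate: Callable[[str, int], str],
              judge: Callable[[str, str], float],
              n: int = 8,
              threshold: float = 0.5) -> Optional[str]:
    """Run n candidate branches in parallel, score each with a judge,
    and return the highest-scoring candidate that clears `threshold`
    (or None if every branch fails)."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda seed: generate(prompt, seed), range(n)))
        scores = list(pool.map(lambda c: judge(prompt, c), candidates))
    best_score, best = max(zip(scores, candidates), key=lambda pair: pair[0])
    return best if best_score >= threshold else None


# Hypothetical stubs standing in for real model calls.
def fake_generate(prompt: str, seed: int) -> str:
    return f"{prompt}-candidate-{seed}"


def fake_judge(prompt: str, candidate: str) -> float:
    # Pretend only the branch with seed 5 produced a correct answer.
    return 1.0 if candidate.endswith("-5") else 0.0


result = best_of_n("fix-bug", fake_generate, fake_judge, n=8)
print(result)
```

Only the winning branch is returned, so a user of such a system never sees the failed attempts. That is the contrast the commenter draws with free users, who do the validation themselves.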
free users are using sequential single-turn generation. the model requires and waits for the human to debug, fix and re-prompt.
by forcing a human to act as validator, they are capturing high value correction trajectories (Bad Output --> Human fix). They are using your cognitive labour to train judge models and validator agents needed to automate the internal verification step, eventually closing the loop for fully autonomous recursive self-improvement.
human in the loop debugging isn't a bug; it's the necessary training signal for the self-validating agents required for exponential recursive self improvement. With new 'distilled judge' models landing in 2026, this article means that they might have gathered enough data. we might be in the final phase..
Quite aligned with my own experience from harness engineering and winning an AI4Science hackathon. During the hackathon I was working as a human optimizer, moving the feedback from the test harness running on Claude Code back to my local Claude Code for the analysis-hypothesis-proposal cycle. And in that moment I realized that 2 Claudes talking to each other could actually scale much better.
> We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require.
Interesting - they're committing to kick off policy conventions to organize a worldwide slowdown of frontier LLM building. If they are actually able to crack it, this will give a much-needed breather IMO. As exciting as the last ~6 months have been, there are some bigger questions to go answer now.
We should be skeptical of any major player that advocates for regulating their own industry. In practice, this just means increasing barriers to entry and making it harder to compete with them.
In my mind we should be trying to push AI along the Linux trajectory. You have a free and open source product, developed by a decentralized team with a strong code of ethics, running on commodity hardware. There can still be trillion dollar industries built on top of it, but the core technology is democratized and available to everybody. I don't see how we get there if we allow a handful of companies to dictate where development of the technology goes.
The regulation being argued for here is against pushing the frontier. Entering the market with, say, a new speech-to-text model would not be subject to such regulation. What's needed is something qualitatively different from entry barriers, and of the frontier model companies, at least Anthropic and DeepMind seem to have enough self-awareness to speak about it. They find themselves in a race with a possibly catastrophic outcome for humanity and would like to stop, but that needs international cooperation on a level no single company can provide.
it's a cartel looking to end competition though
the actual race is to keep having revenue, since everyone is still willing to pay more for the best model.
we as consumers of LLM models lose out by the arms race ending by the creation of a cartel
what happens if they get this regulatory capture is that all the frontier labs put effort into making inference cheaper, and become extraordinarily profitable, at the expense of us consumers, who really want better models, at a subsidized price
Wouldn’t this align with their financial interests? In theory the thing that’s keeping them from being profitable (or one of the big things) is the periodic capex expenditures of building new frontier models.
I don't think there's anything inherently bad about Anthropic making a profit. Red Hat makes a profit off of Linux. I'm interested in the democratization of the underlying technology.
I read this differently: they are actually seeing that it's hard to keep advancing frontier models, and now are moving the goal posts so that when they start getting evaluated more harshly, they can point to something like this.
> organize a world-slowdown of frontier LLM building
i don't want to be a negative nancy but i'm sure this "slowdown" will only be in effect until the infrastructure buildout is done or largely done. If they weren't hardware constrained there'd be no slowdown at all. Whoever gets there first wins everything ("there" being defined as AGI or a similar scale leap in capability).
They're probably looking for a way to slow down the capex required to keep up, so they can be more profitable
So what happens when the world becomes hyper optimized with closed loop AI agents recursively trying to optimize everything deemed sub optimal?
I would assume that shortly after, the solar system will be hyper-optimized as well, then the Milky Way, then the local cluster, and so on. Everything will be close to optimal afterwards, and I sure hope we will have specified the target function for that optimization correctly in the single attempt that we will have had.
Loll
Github outages will probably get worse.
there will be a lot of paper clips
This often-repeated meme doesn't have any bearing on reality.
The orthogonality thesis sounds like a fun gotcha but if you give it some thought you realise how strange it sounds and the opposite thesis - collinearity thesis is actually correct.
1. Intelligence transfers and compounds
2. Goals of agents are not arbitrary
3. Our goals and agent goals are more likely to be aligned at the deeper level
When the primate family produced a super-primate intelligence, it sure aligned with the good of all of them.
If it optimizes itself away because it’s suboptimal, that wouldn’t be the worst outcome. ;)
> today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.
So based on my experience with the verbosity and non-DRYness of LLM code, a solid 2.5x in value delivered. Not bad!
"It is genuinely unclear whether today’s training methods and architectures could unlock that capacity."
Aye.
I am watching websites and Microsoft apps get slower and buggier before my eyes. We are descending into vibe-psychosis and chaos.
To anyone who works at Anthropic: I recently downgraded from Max to Pro out of frustration. For the last few weeks my token (usage) burn was just too fast, and I couldn't explain it, because my actual usage was lower than in the previous few months. I ended up thinking it's probably a bug that you guys shipped. The above article makes me think that it's probably Claude who shipped the bug and your humans missed it in their review.
They probably don’t human-review much anymore.
Is this the moment when the AI gets permission to approve its own PRs:
https://www.italianrenaissance.org/wp-content/uploads/2012/0...
Or is this?
https://www.egypttoursportal.com/images/2024/02/Ouroboros-Sy...
more like the "Obama Awards Obama a Medal" meme:
https://knowyourmeme.com/memes/obama-awards-obama-a-medal
This is incredible.[0]
Please, IPO now. File the paperwork.
> To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.
Do you have another example?
Engineers don't ship [period] for no reason. So, either:
- Those aren't engineers, or
- they are literally dying of shame & embarrassment right now, or
- you measured something that indicated that this was a useful thing to do and have elected to share an overtly, catastrophically flawed metric instead.
[0] as in a total lack of credibility
Go look at open job listings at anthropic and the interview process. You aren’t allowed to use AI during coding assessments[0], or knowledge assessments, which suggests they very much do need and value hard skills and this is fluff.
[0] - https://www.anthropic.com/candidate-ai-guidance
I'm not sure what point you're making.
I'm responding to the article they wrote and published.
If I worked there, I would be embarrassed to have it publicised that I have been committing 8 times as much code as I used to, without even attempting to justify it.
The point I am making is the article you are responding to is marketing hype and that they are lying. Their engineers, I am fairly sure, are doing engineering. At least to the point Anthropic's interview process is trying to filter for people with engineering skills, not "how do you best leverage AI to make more AI" skills like this seems to imply.
You seem to have taken offense on behalf of the people working there. But I'm not attacking them, nor seriously questioning their abilities/qualifications/performance. The slight reference I made to such was part of an exclusive rhetorical device: the situation presented cannot be, unless something unbelievable is occurring.
It's the organisation, its culture, the greater culture surrounding it, and the marketing that I have a problem with.
> they are lying
Yes, it's incredible.
I don't read anywhere how much code they are talking about and what programming language. I think those are useful metrics.
> A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.
I simultaneously think the AI revolution is making real revolutionary gains and am mystified by the lying.
An accurate translation seems to be "we made this shit up, but it feels right"
Until we start bragging about how many lines of code LLMs are saving us, we're walking in the wrong direction. Your programs, designs and architectures are supposed to get better, not accumulate even more boilerplate just because you can produce it faster...
"You go to IPO with the AI you have, not the AI you might wish you have." -- Donald Rumsfeld
So, right now it's a verbose code generator.
But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.
> But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.
We hold these truths to be self-evident.
I guess the claim is simply that AI written code is verbose and there’s lots of it being created but I agree, these systems seem to be able to create lots of low quality software, so until FreeCAD has feature parity with Solidworks I’m bearish on the singularity.
As usual, I find the AI-related discussion here to be hopelessly hysterical and conspiratorial. I get the impression that a large chunk of people have only read the title and assumed Anthropic is referring to recursive self-improvement in the runaway singularity sense.
One of the examples they provide, of giving Claude the task of training a small AI model, then asking it to improve certain benchmarks, is essentially Karpathy's AutoResearch. This is already known to work. While calling it "self-improvement" is perhaps a stretch, it is describing a capability current gen AI has, that anyone can test and I have been using to great effect.
I disagree with their conclusion, I think this kind of self-improvement will hit an asymptote, where every subsequent model can only make smaller and smaller improvements.
I'd use number of commits as a metric versus lines of code. A commit is generally a unit of work - regardless of the lines of code added/removed. It'd be interesting to see the metrics in terms of commits. I'm sure it's still an order of magnitude jump. Personally I'm flying with my own projects with AI, lots of commits, but I really try to minimize lines of code added. If I can remove and simplify existing code so the balance of lines added on commit are minimal - that's the path to a better quality app overall.
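If you want to try the commit-based view on your own repository, `git log --numstat` gives both the commit count and per-file added/removed lines, so net lines per commit falls out directly. Here is a minimal sketch, assuming a local git checkout. The helper names and the three-month default window are illustrative choices, not anything Anthropic reports using.

```python
import subprocess


def summarize_numstat(log_text: str) -> dict:
    """Count commits and lines added/removed in the output of
    `git log --numstat --pretty=format:'commit %H'`."""
    commits = added = removed = 0
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commits += 1
        elif line.count("\t") >= 2:
            a, d, _path = line.split("\t", 2)
            if a != "-":  # binary files report "-" instead of line counts
                added += int(a)
                removed += int(d)
    net = added - removed
    return {"commits": commits, "added": added, "removed": removed,
            "net": net, "net_per_commit": net / commits if commits else 0.0}


def repo_summary(repo: str = ".", since: str = "3 months ago") -> dict:
    """Run git in `repo` and summarize commits since `since`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--numstat",
         "--pretty=format:commit %H", f"--since={since}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return summarize_numstat(out)


sample = (
    "commit abc123\n"
    "10\t2\tsrc/a.rs\n"
    "-\t-\tassets/logo.png\n"
    "\n"
    "commit def456\n"
    "1\t30\tsrc/b.rs\n"
)
print(summarize_numstat(sample))
# {'commits': 2, 'added': 11, 'removed': 32, 'net': -21, 'net_per_commit': -10.5}
```

A commit that deletes more than it adds shows up as negative net lines, which is exactly the kind of simplification a raw LoC count penalizes.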
the tooling has quite a ways to go to catch up to the llm engines that drive the real value. I have encountered various codex bugs (I know not anthropic) which tell me that.. these billion dollar companies, if they are eating their own dog food, can still release buggy crap software.
The Mythos public release will be a big indicator of whether the Anthropic and SF story of transformational AI soon holds any water, IMO.
Lol they're using lines of code as a KPI?
Come on guys...
That is making me less impressed not more impressed!
IPO IPO IPO!!!
Broadly agree with this position. I think some people suspect Anthropic is doing this for regulatory capture, but I think they are being honest about what they are seeing and how regulation should catch up.
I, for one, believe that we should pause all work on AI for the foreseeable future. This is almost impossible to orchestrate, but we should still try nevertheless. Maybe we are not able to pause, but we are able to slow down. That might give us more room, so that we may be able to pause in the future. But going ahead is too dangerous.
And it's not just Anthropic that is saying this. Even Geoffrey Hinton has said the same thing. If there is a non-zero chance that AI can kill all of humanity, and both Geoffrey and Anthropic hold the same position, then it makes sense for us to be a hundred percent sure before we move ahead. Dario/Anthropic have already made their money from AI; maybe they are just being honest about what they think lies ahead.
no, it really doesn't.
the end of humanity has a strong case for banning all burning of fossil fuels immediately
the end of humanity as a sales tactic to increase your stock price does not
these are companies working on their IPO to make sure they can get the best price, not people being honest about what they think lies ahead.
if they were being honest about what lies ahead, they'd unilaterally stop training, and put all of their money into FPV drone bombs to destroy datacenters being used for training or inference
if you actually believe the thing is gonna kill everyone, you're not gonna worry about how you stop it, and certainly not keep building and operating the thing
that they aren't buying anti-tank mines to drop on data centers says they aren't in the slightest serious about it
So what you’re telling me is that EY was the clearest thinking one out of all of them?
"Even Geoffry Hinton has said the same thing"
The same bozo who claimed radiologists would be out of a job by now.
The data does not support what you or others say. Jesus christ. Can't believe people are this dumb. Have LLMs infested people's minds to the extent they can't critically analyse what's happening in front of their eyes?
Warming up for that IPO
Is there something in the post that you find implausible or don't believe to be true?
> Taken far enough, and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor. This is called recursive self-improvement.
Sounds iterative to me.
'“Good code” means two things: it works, and it is written in a manner that allows another engineer to understand it and build upon it.'
I disagree with this. Good code is easy to change, which is much harder to accomplish than code that can be added to.
"If technical trends in advancing capabilities continue, and AI systems are able to develop the capabilities inherent to transformative human ingenuity, then it is plausible that AI systems could design and refine themselves."
I find the first premise weak and implausible, and the second one is obviously false. To me it comes across as an insult to the reader.
I love that animation, really cool
It will be so powerful that it can't be trusted with any earthly person.
Isn't this like a perpetual energy machine? Or wouldn't entropy start kicking in and the quality of the system begin to degrade over time? (philosophically I don't believe AGI is an achievable thing)
>Or wouldn't entropy start kicking in and the quality of the system begin to degrade over time? (philosophically I don't believe AGI is an achievable thing)
It already has. Models being trained on AI generated data lead to degradation and model collapse. The concept of the "technological singularity" whereby AI experiences infinite and exponential self-improvement and recursively bootstraps itself to godhood is a religion-adjacent sci-fi concept but in real life TANSTAAFL.
Anthropic is the most self-hyped company I've seen, to the point that I wonder what would happen to its employees if they held a different opinion. Do they just... keep it to themselves? For instance, some Anthropic employees might hold the completely rational opinion that none of this is going to lead to AGI, but I never hear that from them.
The metric being tracked, code commits, is hilariously one sided. Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts, for instance:
Instead of thinking through edge cases with your brain and a whiteboard, you can simply have the LLMs generate most of the possibilities, including tests for them, because that is cheaper. There are probably 50x more commits, of which 40 will be revert pairs, but we are only twice as fast. And in reality nothing changed, because the outcome remains the same. I can't see how the LLM space is necessarily different.
> Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts
I've been struggling to capture this sentiment for myself in a way that hits. If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code. It just makes no sense. I can't seem to get off this hill. Company-wide AI mandates and 100 fleet Agent orchestration Rube Goldberg machines... it's getting wild out there.
Meanwhile, my Claude Pro ($200/year) does force me to smooth out my usage and plan more (a Sonnet/Opus advisor split). But beyond that, I can't imagine what I'd do with 20x (200x?) the compute for code slinging. I think I'd lose my mind.
Because code used to be correlated with progress, it became almost a stand-in measurement for it. But realistically, code is meaningless if it doesn't accomplish something, and that should remain the true bar of progress.
For instance, if I churned out 20x more code, threw away 19x of it through rewrites, reverts, and discards, and finished the same project to the same standard 70% faster, would I do it? Yes. The part that matters is not the 20x code; it is the 70% speedup.
Code is both the final product and a tool to achieve it. We used to have a much harder time exploiting the "tool" part, but now we are here. This also means any measurement that treats code as the final product is going to stop being effective or realistic.
You're right, my gripe is specifically with code slinging that hits production end users. My background is in product, so to your point, it's very unnerving to see a straight line from developers to customer-facing product outcomes being enthusiastically optimized.
This is contentious because I'm not exactly advocating for arbitrary gatekeepers. The nuance is that building usable stuff is hard, and it's not a matter of shipping more code. I take your point to mean: well, it depends on what that code is doing. If the 20x more code lives in a meta-harness of simulations and such that arrives at the leading candidate for what hits production, then you've got my attention.
Forget about the danger of a dev to customer pipeline with no product people in between, some of us are living with the reality of product to customer pipeline with no developers in between, and that's much more disturbing. Our CEO is now the top contributor to our codebase, and he's completely non-technical.
>If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code.
I wonder how much of current engineering practices can be traced to what's pushed to company leaders on LinkedIn.
Every company is shitting bricks pushing for faster development and speed, gotta go fast to nowhere in particular, and I'm convinced it's tied to the constant bombardment of the idea that they're going to be left out or made obsolete if they don't get on the ship NOW.
I can't get away from a similar conclusion. Even an AI pioneer has said that LLMs are a dead end.
AI tech bro:
Month 1 - 6 months to AGI
Month 2 - We will Replace all jobs
Month 3 - Okay maybe only the SWEs, programming is solved
Month 4 - Announce model that is too dangerous to release
Month 5 - Releases dangerous model
Month 6 - This is it! We will replace AIs with more AIs (*secretly files for IPO)
AI is here to stay, like it or not, but it is not the solution to everything. If it were, what would Anthropic's moat be? A better model? I don't see any ecosystem being built by them, as MCP is almost obsolete except for some very niche use cases. And they're doing stuff that a non-profit version of OpenAI would do. Can we trust a for-profit company to stand up to its investors during a conflict of interest? Because running a company for maximum profit and being ethical are at two different ends of the spectrum.
Anthropic is providing agentic intelligence as a service. OpenAI and Google DeepMind are also in this business.
The problem is, if you’re any sort of knowledge worker, you’re essentially providing the same thing: you’re an intelligence with agency.
MCP is irrelevant. The moat is the quality of intelligence the service providers sell, including you. Tokens aren’t fungible between providers until you measure that they are for your use case, that’s kinda sorta the goal of job interviews.
Thus the moat will be that they’re providing the best models for the things people need other intelligent people for, but we should expect there will be limits on how much share they can economically take assuming competitors are optimizing for slightly different targets (but there’s still significant overlap in capability). This will disappear, but it’s always a question of when. The path matters as much as the destination.
Note that implications for you and me are exactly what the article says they are: nobody knows, but it’ll be a dramatic shift.
I'm waiting for the AI giants to realize that they are burning cash to run their consumer-facing chatbots and that they should kill those products to focus on their enterprise tools.
Free ChatGPT doesn't need to exist anymore. Its job was to build hype/interest, and it did.
But take it away and you solve many social problems and annoyances caused by AI with no loss to the upside of AI. No more cheating students in school. No more shitty LinkedIn posts. No more dangerous "therapy sessions" that give bad advice.
What is an ai enterprise tool?
An ai tool that is priced out of the hands of the average person.
Fwiw, I think the genie is out of the bottle. We are waiting on hardware to catch up, which it will.
The world has been recursively self-improving for millennia. Similar to Scientology, this is a cult pushing sci-fi nonsense. They are just coupled to an LLM lab to give their stories an air of seriousness. Imagine if Scientology started making laptops.
TBH the more Anthropic keeps yapping the more desperate they seem now. OAI has been pretty quiet in comparison lately.
I have a claw that is instructed to make at least 500 PRs per day. It uses Claude, Gemini, and OpenAI and runs basically every few minutes. I use online forums (Moltbook, Reddit, etc.) as input for the claw. It's quite funny how it tries to improve itself. But to say it really creates a new Skynet? Nah. Not at all. It's more a clutter of useless features or incomprehensible code restructuring.
This more or less agrees with my assessment of recent changes in Claude Code, where a lot of new features either:
- are half-baked or half-done, or
- have significant overlap with existing features and aren't clearly an improvement.
More code is not better. More features are not better. It would be lovely to see more intentional design than just more.
I know they’re dog fooding this. I have to believe they have some people with taste. So it makes me wonder if anyone has the time to think or if they’re just shoveling prompts as fast as possible.
It's like the AI created a method add(a, b) return a+a+a+a-b-b-b-b, but for much bigger and more complex features. Totally useless, do-nothing methods. Still, it's interesting to see the occasional exception that is actually better.
The closer to the IPO the more marketing drivel we'll get from both Anth and OpenAI.
Does this train on LLM output, or is this more like iterative self prompt improvement?
Their statement is that they regard lines of code shipped as indicative of self-improvement. So, while a well-written coding agent might be a few thousand LOC, Anthropic's is bloated like a decomposing whale at over 500K LOC! What more proof do you need?
Have you tried reading the article? It answers your question.
Don't ask people to explain the article to you if you're too lazy to open it yourself.
I think that's the whole point of LLMs
They're making a mistake with this continued self-hyping. At some point even the dumbest prospective investors won't buy it.
Another article about how Anthropic wants to ban everyone except themselves and destroy open source and Chinese AIs.
Where is this discussed in the article? I don't see any mentions of China or open source models
Not really mentioned explicitly but:
> A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.
And later:
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.
Coordinating a pause at the frontier is not the same as destroying or even harming open source/China.
It feels like open source can flourish while the frontier is deliberately regulated?
They explicitly mention in the article that stopping just the frontier isn't enough, because then others will simply catch up; they want to be the leaders of a global organization/cartel that bans everyone except themselves. This is particularly important given that Anthropic attacks China and open source at every chance they get. https://www.anthropic.com/news/detecting-and-preventing-dist...
Yeah. This is why Anthropic is way worse than openai. They don't contribute shit to open source and even lobby against it.
After several months with their top engineers and state-of-the-art AI on the job, Anthropic managed to "reduce flickering by 85%" in Claude Code, their TUI client, which is built in fucking React and renders by redrawing the entire chat conversation each time (hence the flicker). I think they've since eliminated it completely by slapping some double-buffering around it (since "our client is actually a real-time game engine" after all). Meanwhile, for decades Emacs and Vim have had an optimizer built into their display cores that solves for the minimum set of terminal escape commands it takes to transform the screen from a given old state to a desired new state.
You will forgive me when, between muted snickers, I express considerable doubt that Anthropic will be able to bring its AI to a point of "self-improving" any time soon.
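For readers unfamiliar with the technique, here is a heavily simplified sketch, in Python, of the idea behind diff-based redisplay: compare the old and new screen contents and emit escape sequences only for what changed, instead of repainting everything. This is an illustration under stated assumptions, not Emacs's or Vim's actual algorithm (their display engines also weigh costs such as line insertion/deletion and scrolling). The function name `redisplay` is hypothetical, and the sketch assumes one string per screen row and an ANSI/VT100-compatible terminal.

```python
# Hypothetical, simplified diff-based redisplay; NOT the real Emacs/Vim algorithm.
# Assumes: one string per screen row, ANSI/VT100 escape sequences.

def redisplay(old: list[str], new: list[str]) -> str:
    """Return escape sequences that turn screen `old` into screen `new`,
    rewriting only the rows (and only the tail of each row) that changed."""
    out = []
    for row in range(max(len(old), len(new))):
        before = old[row] if row < len(old) else ""
        after = new[row] if row < len(new) else ""
        if before == after:
            continue  # unchanged row: emit nothing at all
        # Skip the common prefix so only the changed tail is rewritten.
        col = 0
        while col < min(len(before), len(after)) and before[col] == after[col]:
            col += 1
        # ESC[row;colH moves the cursor (1-based); ESC[K erases to end of line,
        # clearing any leftover characters if the new row is shorter.
        out.append(f"\x1b[{row + 1};{col + 1}H{after[col:]}\x1b[K")
    return "".join(out)
```

Writing the returned string to the terminal once per frame (e.g. `sys.stdout.write(...)` followed by `sys.stdout.flush()`) touches only the changed cells, which tends to reduce flicker compared with clearing and redrawing the whole screen.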
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation.
If they wanted to they could have convened an international forum with commercial and political stakeholders years ago. Less talk, more do.
Sorry, but if AI can build itself, then it can run a 3000-person company with just a few people. Or even bigger ones. What are the consequences?
When AI is a more effective capital allocator than natural intelligence, it will drive capital into the accounts of whoever controls the AI, giving them increasing decision-making power over the economy and culture. Maybe those controllers will be human at first.
They will not be.
As has been mentioned in the sibling comment it already is.
Consequences are: financial crisis.
I cannot wait for these models to tear down traditional social hierarchies. We haven't even begun to see the effects, fingers crossed.
Hierarchies exist for a reason. Take away the reason and the house of cards eventually collapses, but the house of cards is still a house. When it's gone, we're back to the law of the jungle.
Be careful what you wish for IOW.
I think certain types of people with power, i.e. access to capital, will lose relevance. The world will become more meritocratic, with AI as leverage for the individual.
Your analysis of the whole rise of AI is that people with access to capital will lose relevance???
So the most capital intensive industry we've ever created will put less power in the hands of those with capital?
I'm sorry, I have no idea how you came to that conclusion...
It’s exactly the opposite I’m afraid. Capital already has more access to AI, both quantitatively (tokens for dollars) and qualitatively (biggest players got Mythos first). Expect this trend to continue.
Never heard of a stratified economy? Spoiler alert: none of us will be in the good part.
Tear down or reinforce?
capital/ability to leverage labor is going to lose power
I'm not so sure. It seems those with capital will accumulate it even faster.
Without some kind of income redistribution we are sailing into dark waters.
Let the ruling classes tremble at a Communistic revolution. The proletarians have nothing to lose but their chains. They have a world to win.
Workingmen of all countries unite!
Translation: hahahahahahahahahhahahaha but in your defense, I would give anything to be wrong.
Anthropic has finally come around to what others have already realized far sooner. Little time left now. Notice how shallow the arguments and consistently wrong the AGI naysayers have been year after year.
https://intelligence.org/agi-ruin/
> If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing
Even Anthropic wants to Pause AI now. There must really be not much time left for "edging". Please write to your lawmakers, no matter whether you are in the US, Europe, China, or elsewhere. Only an international agreement between governments can enforce an AI-Pause and eliminate the necessity to dangerously push the frontier.
https://pauseai.info/
Whichever side I may stand on, pausing just seems unnatural? Life is movement.
And happiness is restraint.
That would be like trying to get every country to agree to give up nukes.
Or agree on finding ways to promote peaceful use of nuclear energy. This has been done, there are thousands of people working on it around the globe and 180+ member states of the IAEA. It's not easy, there have been close calls.
And cooperating internationally to buy ourselves time to find ways to develop this "last invention" in a way that does good for humanity seems to be on a similar level.
Or stop making more, and testing more, which we got the biggest countries to do, at least for a time.
AGI is the "AI nuke" in this metaphor.