I built one of the connected tools included in this launch (the Biomni HPC [1]), and I have spent an inordinate amount of my life working on this type of problem. (I also worked at Anthropic, but not on this product.)
As other comments have pointed out, this is mostly "data science" – but it's not just making plots and writing papers [2]. It also has integrations with many databases and computational tools, including a researcher's institutional cluster.
That alone is valuable. I founded a startup (Toolchest, YC W22) after struggling with this problem at a bio startup; integrating these tools and databases is hard and time consuming. If the only impact of this product is that great APIs are built for LLMs, it will be a massive positive impact. Many databases used in computational genomics are still only accessible through FTP!
LLMs are particularly good at navigating these tools and databases. It's often very specialized, but simple, work that benefits from in-context skills. Seeing an early glimpse of my former customers – bioinformaticians – using LLMs to solve this problem is what led me to join Anthropic in 2024.
Also, this pattern isn't fundamentally constrained to data science: you can also integrate with a wet lab or a CRO for some kinds of science. This is what I'm spending my time on now.
This type of science doesn't solve everything, but it's useful in some niches. For example, progress on many rare diseases are bottlenecked by researcher attention rather than a fundamental breakthrough.
When I saw "Science" I didn't think they meant Data Science, which is what the UIs full of pandas code and plots imply. Even if the focus is on the sciences, I suspect that's the less valuable part of the announcement particularly with the implication of Jupyter Notebook 2.0.
Image-understanding for data viz is a use case that has been ignored, and modern LLMs are getting better at proper EDA. But, uh, I may need to update my resume.
A lot of the soft and hard sciences use hacky matplotlib code to produce results and visualisation, without being necessarily data science
From the bits I've seen, I'd take claude-generated code any time over that written by maths, physics, biology, linguistics people. Even though I've seen Claude make some super-big mistakes while doing data analysis I'd guess it's already more reliable than most academics trying to code.
My take based on the video is that they're thinking more about bioinformatics, which might technically fall under the "data science" umbrella depending how you define your terms, but which is not described that way in common usage.
It's the content that determines the sort of science, not the toolchain.
So it's like Claude Cowork for Science, i.e. for less tech-savvy users? I would imagine scientists with some coding background might just prefer to use Claude Code normally and integrate it with their stack of choice, but perhaps the comfort and ease of use of Claude Science still wins out.
Mostly targeted at life sciences - e.g. integration for FDA, PubMed, genomics databases but no ACM / IEEE as far as I can tell.
Edit: arXiv search seems to be supported - but not Google Scholar etc. So, this tool is of little use for most researchers outside life sciences.
Edit 2: Quick walkthrough: the AppImage starts a browser window with an onboarding wizard and a chat interface. It suggests a few things one might do at the start of a research project - e.g. do a quick literature review. When I chose that option, wrote Python scripts that used MCP calls to do arXiv searches. Stayed seemingly stuck there for a few minutes not returning anything. Then:
> The free-text search returned too much noise
Claude decided to choose a certain paper as a starting point for further research. Shortly afterwards:
> That DOI resolved to the wrong paper. Let me find the correct anchor papers by title/author search directly.
Then it meandered a few more minutes doing research and creating a citation graph (that it did not show to me).
> I have a complete picture. Let me verify the key DOIs resolve and then write the review.
Then:
> The lint flags em-dash overuse. Let me reduce them, then save.
Then: a nice but verbose literature overview of my chosen topic
<blink>BUT it includes at least one hallucinated reference!</blink>
P.S.: What does this mean?
[reviewer] verifier_mode=default-on downgraded to off: pro subscription tier, autoReviewer withheld (frame=f2a81cb2)
impressive to me, but sadly i feel a little misleading since this is only the data-science part of life sciences.
every few weeks though i test claude and chatgpt on their scientific reasoning and it has definitely improved over time. in my experience without specific instruction on what is known/unknown they typically are lagging behind the leading edge of the field (dev bio/pluripotency in my case). probably because scientific research articles are not open-source so they can't crawl them.
claude has definitely outperformed chatgpt in this regard however, it's scientific reasoning is impressive.
The fact that we are coming up on a month of Fable being unavailable with essentially zero actual signal from Anthropic around when it may be back is crazy to me. Yet still we have these random new products coming out?
> Anthropic @AnthropicAI Jun 27, 2026 · 12:29 AM UTC
> Since June 12, we’ve been working closely with the US government to restore access to Claude Mythos 5 and Fable 5. Today, the government notified us that Mythos 5, our strongest cybersecurity model, can be redeployed to a set of US organizations that operate and defend critical infrastructure.
> We’re restoring access for these organizations quickly, and we’re continuing to work with the government to expand access to Mythos 5 and make Fable 5 available for general use again.
I mean the company has like 3k employees or more right? Lots of them are just working on more applied AI use cases that don't require frontier AI just the right integrations and structure etc.
Opus 4.8/ GPT 5.6 level models with the right workflows/ data/ access are still good enough to do huge amounts of economically valueable work.
Given the amount of AI slop that’s found its way into legal filings and other documents there’s little reason to think this won’t also just create a mess of scientific papers.
Furthermore, science isn’t suffering from a lack of papers. It’s suffering from a lack of good papers. Making it easier to just pump out paper-mill publications is about the last thing science needs right now.
They are going to make it a thousands times worse.
It wasn't perfect before, but it at least took some time to fake a paper. The problem is now people can produce a very plausible looking completely fake paper in minutes. Peer review is in the process of completely collapsing, in fact I think it's already basically done.
The only way this might fix things is if we require all papers are completely reproducable (that doesn't help in subjects like biology of course. They can still provide all the experimental data in the rawest format possible which doesn't break any laws).
it could also be said that scientific interpretation is suffering from a framework crisis. the scientific convention of experiment, is the test of an hypothesis, as a logical construct.
repetition of materials and methods toward reproducibility, holds far less wieght than multiple variants of process designed to test a common hypothesis resulting in agreement.[null, or failure to null]
Por que no los dos? Scientific review times are up, it’s harder to find reviewers, and many reviews are AI generated anyway. Auto-generated research publications will arguably make the replication crisis worse, because there will be more slop to clog up the review system, and these papers will presumably be just as (if not more) not reproducible than human written science
In some fields like comp sci, when code isn't given but the paper describes the approach, LLMs do help with the reproducibility crisis: you can ask it to reproduce the result through reimplementation by reading the paper.
If it fails you may have to double check it did properly reimplement it, but if it succeeds you do get a reproduction.
Why have they talked about this for a long time? They predicted date of code maxing out, and did so not from fitting a sigmoid or something but they predicted it would max out right during a steep part of the slope?
AI brand identity has made the unfortunate pivot to "how much do you trust us" which is going be a real race to the bottom. I don't want LLMs managing nuclear reactors or replacing junior lab technicians. I don't trust any of these LLMs to do the bare minimum, regardless of how good it is for your brand.
It's gross watching these stunts unfold. Next ChatGPT will fly a passenger jet, which Claude will one-up with an agentic surgery, which OpenAI will respond to by putting a humanoid robot on the moon. If this is what 21st century market competition looks like, we are all fucked.
I built one of the connected tools included in this launch (the Biomni HPC [1]), and I have spent an inordinate amount of my life working on this type of problem. (I also worked at Anthropic, but not on this product.)
As other comments have pointed out, this is mostly "data science" – but it's not just making plots and writing papers [2]. It also has integrations with many databases and computational tools, including a researcher's institutional cluster.
That alone is valuable. I founded a startup (Toolchest, YC W22) after struggling with this problem at a bio startup; integrating these tools and databases is hard and time consuming. If the only impact of this product is that great APIs are built for LLMs, it will be a massive positive impact. Many databases used in computational genomics are still only accessible through FTP!
LLMs are particularly good at navigating these tools and databases. It's often very specialized, but simple, work that benefits from in-context skills. Seeing an early glimpse of my former customers – bioinformaticians – using LLMs to solve this problem is what led me to join Anthropic in 2024.
Also, this pattern isn't fundamentally constrained to data science: you can also integrate with a wet lab or a CRO for some kinds of science. This is what I'm spending my time on now.
This type of science doesn't solve everything, but it's useful in some niches. For example, progress on many rare diseases are bottlenecked by researcher attention rather than a fundamental breakthrough.
[1] https://x.com/phylo_bio/article/2029233694775624096
[2] In comparison, OpenAI's science product – Prism – was effectively a LaTex editor they acquired from Crixet.
When I saw "Science" I didn't think they meant Data Science, which is what the UIs full of pandas code and plots imply. Even if the focus is on the sciences, I suspect that's the less valuable part of the announcement particularly with the implication of Jupyter Notebook 2.0.
Image-understanding for data viz is a use case that has been ignored, and modern LLMs are getting better at proper EDA. But, uh, I may need to update my resume.
A lot of the soft and hard sciences use hacky matplotlib code to produce results and visualisation, without being necessarily data science
From the bits I've seen, I'd take claude-generated code any time over that written by maths, physics, biology, linguistics people. Even though I've seen Claude make some super-big mistakes while doing data analysis I'd guess it's already more reliable than most academics trying to code.
This 100000x over. Nothing is worse than trying to productionize code coming from academics like this.
My take based on the video is that they're thinking more about bioinformatics, which might technically fall under the "data science" umbrella depending how you define your terms, but which is not described that way in common usage.
It's the content that determines the sort of science, not the toolchain.
This seems to have unblocked Claude Desktop for Linux ( https://code.claude.com/docs/en/desktop-linux )
So it's like Claude Cowork for Science, i.e. for less tech-savvy users? I would imagine scientists with some coding background might just prefer to use Claude Code normally and integrate it with their stack of choice, but perhaps the comfort and ease of use of Claude Science still wins out.
tl;dr: Use this if you don't like doing science or doing things well. It hallucinates references.
Seems to be based on https://github.com/swaruplab/operon as evidenced by the authorization dialog and https://x.com/testingcatalog/status/2037684573161783373 .
Mostly targeted at life sciences - e.g. integration for FDA, PubMed, genomics databases but no ACM / IEEE as far as I can tell.
Edit: arXiv search seems to be supported - but not Google Scholar etc. So, this tool is of little use for most researchers outside life sciences.
Edit 2: Quick walkthrough: the AppImage starts a browser window with an onboarding wizard and a chat interface. It suggests a few things one might do at the start of a research project - e.g. do a quick literature review. When I chose that option, wrote Python scripts that used MCP calls to do arXiv searches. Stayed seemingly stuck there for a few minutes not returning anything. Then:
> The free-text search returned too much noise
Claude decided to choose a certain paper as a starting point for further research. Shortly afterwards:
> That DOI resolved to the wrong paper. Let me find the correct anchor papers by title/author search directly.
Then it meandered a few more minutes doing research and creating a citation graph (that it did not show to me).
> I have a complete picture. Let me verify the key DOIs resolve and then write the review.
Then:
> The lint flags em-dash overuse. Let me reduce them, then save.
Then: a nice but verbose literature overview of my chosen topic
<blink>BUT it includes at least one hallucinated reference!</blink>
P.S.: What does this mean?
An explicit text desloppification pass (i.e. LLM-use obfuscation) seems like outright scientific fraud.
Biosciences mostly don't use arXiv, they have their own https://www.biorxiv.org/ but it's usage is not as common as arXiv is in e.g. physics.
> every step from data wrangling to publication
Do they have no shame?
It has Sonnet 5 as a usable model. Interesting.
Looks like they've just announced it - https://www.anthropic.com/news/claude-sonnet-5
Just released!
Claude Sonnet 5
https://news.ycombinator.com/item?id=48736605
impressive to me, but sadly i feel a little misleading since this is only the data-science part of life sciences.
every few weeks though i test claude and chatgpt on their scientific reasoning and it has definitely improved over time. in my experience without specific instruction on what is known/unknown they typically are lagging behind the leading edge of the field (dev bio/pluripotency in my case). probably because scientific research articles are not open-source so they can't crawl them.
claude has definitely outperformed chatgpt in this regard however, it's scientific reasoning is impressive.
Big Pharama = Big Budgets.
So targeting them with a tailored product is understandable.
The fact that we are coming up on a month of Fable being unavailable with essentially zero actual signal from Anthropic around when it may be back is crazy to me. Yet still we have these random new products coming out?
https://xcancel.com/AnthropicAI/status/2070665903440871779
> Anthropic @AnthropicAI Jun 27, 2026 · 12:29 AM UTC
> Since June 12, we’ve been working closely with the US government to restore access to Claude Mythos 5 and Fable 5. Today, the government notified us that Mythos 5, our strongest cybersecurity model, can be redeployed to a set of US organizations that operate and defend critical infrastructure.
> We’re restoring access for these organizations quickly, and we’re continuing to work with the government to expand access to Mythos 5 and make Fable 5 available for general use again.
I mean the company has like 3k employees or more right? Lots of them are just working on more applied AI use cases that don't require frontier AI just the right integrations and structure etc.
Opus 4.8/ GPT 5.6 level models with the right workflows/ data/ access are still good enough to do huge amounts of economically valueable work.
Given the amount of AI slop that’s found its way into legal filings and other documents there’s little reason to think this won’t also just create a mess of scientific papers.
Furthermore, science isn’t suffering from a lack of papers. It’s suffering from a lack of good papers. Making it easier to just pump out paper-mill publications is about the last thing science needs right now.
Scientific research is suffering from a reproducibility crisis. Not a publication crisis. LLM's aren't going to solve reproducibility issues.
They are going to make it a thousands times worse.
It wasn't perfect before, but it at least took some time to fake a paper. The problem is now people can produce a very plausible looking completely fake paper in minutes. Peer review is in the process of completely collapsing, in fact I think it's already basically done.
The only way this might fix things is if we require all papers are completely reproducable (that doesn't help in subjects like biology of course. They can still provide all the experimental data in the rawest format possible which doesn't break any laws).
The two feed into each other. "Publish or perish" ups the incentive to pump out shaky papers to pad resumes. LLMs make it easier to churn them out.
it's suffering from having 1 million researchers, when there aren't 1 million important easy problems to solve, yet you must publish something
it could also be said that scientific interpretation is suffering from a framework crisis. the scientific convention of experiment, is the test of an hypothesis, as a logical construct.
repetition of materials and methods toward reproducibility, holds far less wieght than multiple variants of process designed to test a common hypothesis resulting in agreement.[null, or failure to null]
They're gonna worsen it
Isn't this just blanket cynicism?
In the long run conceivable we could use AI to hold papers to a much higher standard, audit all the data and code that is associated etc.
Por que no los dos? Scientific review times are up, it’s harder to find reviewers, and many reviews are AI generated anyway. Auto-generated research publications will arguably make the replication crisis worse, because there will be more slop to clog up the review system, and these papers will presumably be just as (if not more) not reproducible than human written science
In some fields like comp sci, when code isn't given but the paper describes the approach, LLMs do help with the reproducibility crisis: you can ask it to reproduce the result through reimplementation by reading the paper.
If it fails you may have to double check it did properly reimplement it, but if it succeeds you do get a reproduction.
Thought I'd give it a whirl - crashed immediately.
I was tickled they had a "Download for linux" button prominently shown, but nothing yet.
Weird that it runs as a local webserver rather than as an app
So I guess they released this instead of Sonnet 5?
Another overrated packaged workspace to drain more usage... No thank you.
Blog post: https://www.anthropic.com/news/claude-science-ai-workbench
maxed out on coding improvements so now they're trying to expand to other markets
Why have they talked about this for a long time? They predicted date of code maxing out, and did so not from fitting a sigmoid or something but they predicted it would max out right during a steep part of the slope?
Disappointing that science came after cowork. Shows how their priorities are for profitability first and help humanity second.
Now this... this is a hot take. How exactly do you expect these companies to "help humanity" if they're bleeding money?
this a great application for the sycophantic, non-deterministic lying machine!
How about no?
AI brand identity has made the unfortunate pivot to "how much do you trust us" which is going be a real race to the bottom. I don't want LLMs managing nuclear reactors or replacing junior lab technicians. I don't trust any of these LLMs to do the bare minimum, regardless of how good it is for your brand.
It's gross watching these stunts unfold. Next ChatGPT will fly a passenger jet, which Claude will one-up with an agentic surgery, which OpenAI will respond to by putting a humanoid robot on the moon. If this is what 21st century market competition looks like, we are all fucked.
Meanwhile in the real world, these Math Olympiad AIs can't even take your fast food order correctly.