So this is basically a shill advertisement ending in "Your AI Agents can avoid captchas if you pay us."
The last example pushes a false narrative: that captchas only happen if the browser "looks suspicious". Systems like Altcha put an end to this argument. They don't care if the browser looks suspicious, only that the browser can perform a proof-of-work to get past a captcha designed to slow down the request rate.
When applied consistently, it will effectively block and slow down the very AI crawlers this company wants to promote.
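For anyone who hasn't looked at how this kind of challenge works, here's a rough model. As I understand Altcha's scheme (treat this as an assumption, not its actual wire format), the server hashes a random salt plus a secret number below some maximum, and the browser brute-forces numbers until it reproduces the hash. The function names and `SERVER_KEY` are hypothetical, and expiry/replay protection is left out.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical server secret for signing challenges; a real deployment
# would load this from configuration rather than generate it per process.
SERVER_KEY = secrets.token_bytes(32)


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def make_challenge(max_number: int = 50_000) -> dict:
    """Server: hide a random number behind a salted SHA-256 hash."""
    salt = secrets.token_hex(12)
    secret_number = secrets.randbelow(max_number + 1)
    challenge = sha256_hex(salt + str(secret_number))
    # The HMAC lets the server check later that it issued this challenge
    # without keeping per-client state.
    signature = hmac.new(SERVER_KEY, challenge.encode(), hashlib.sha256).hexdigest()
    return {"salt": salt, "challenge": challenge,
            "max_number": max_number, "signature": signature}


def solve(ch: dict) -> Optional[int]:
    """Client: brute-force candidate numbers until the hash matches (the 'work')."""
    for n in range(ch["max_number"] + 1):
        if sha256_hex(ch["salt"] + str(n)) == ch["challenge"]:
            return n
    return None


def verify(ch: dict, number: int) -> bool:
    """Server: one HMAC and one hash, so checking is cheap."""
    expected = hmac.new(SERVER_KEY, ch["challenge"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ch["signature"]):
        return False
    return sha256_hex(ch["salt"] + str(number)) == ch["challenge"]


ch = make_challenge()
print(verify(ch, solve(ch)))  # True
```

Note that nothing here looks at the browser's fingerprint: any client willing to burn up to `max_number` hashes gets through, which is both the appeal and, as the replies point out, the weakness.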
More advanced and targeted bots can "bypass" proof of work as well, though, e.g. using something like https://github.com/toman-tom/Incapsula-PoW
Proof-of-work is bad rate limiting: https://news.ycombinator.com/item?id=44093918. The playing field is wildly unbalanced. Even naive attackers tend to have a lot more computing power available than a lot of your normal users, and where it’s SHA-256 (which is almost the worst choice imaginable for a proof of work scheme, yet which every single service that I know of has used), an intelligent attacker goes from being hundreds of times as powerful to millions of times as powerful.
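The asymmetry is easy to put numbers on. This sketch assumes a hashcash-style target (find a hash with `difficulty_bits` leading zero bits), and the hash rates are illustrative order-of-magnitude guesses, not benchmarks.

```python
def expected_seconds(difficulty_bits: int, hashes_per_second: float) -> float:
    """Expected time to find a hash with `difficulty_bits` leading zero bits.

    Each attempt succeeds with probability 2**-difficulty_bits, so on
    average 2**difficulty_bits attempts are needed.
    """
    return 2 ** difficulty_bits / hashes_per_second


# Order-of-magnitude hash rates: illustrative assumptions, not benchmarks.
RATES = {
    "budget phone (JavaScript)": 1e5,
    "attacker GPU (native)": 1e10,
}

for bits in (16, 20, 24):
    phone = expected_seconds(bits, RATES["budget phone (JavaScript)"])
    gpu = expected_seconds(bits, RATES["attacker GPU (native)"])
    print(f"{bits} bits: phone {phone:.2f} s, GPU {gpu:.6f} s, ratio {phone / gpu:.0f}x")
```

Whatever the real rates are, the ratio column never depends on the difficulty: any setting that costs the attacker something meaningful costs the phone user that ratio times more.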
> Systems like Altcha put an end to this argument. They don't care if the browser looks suspicious, only that the browser can perform a proof-of-work to get past a captcha designed to slow down the request rate.
That doesn't really work out in reality because bots are happy to wait 5 seconds or even 5 minutes for a PoW challenge to complete. Humans on the other hand will not, especially if they're on a mobile device with limited compute and energy.
The Chrome extension angle is interesting here. We ship an extension that interacts with Gmail and have seen how much variance there is in what Google considers "bot-like" behavior from extensions vs. the browser tab. The line between "automated" and "assisted" is not well defined at the API level, which ends up being a similar underlying problem: distinguishing intent rather than pattern.
The issue is that anything that becomes a standard here automatically becomes a target. If the same sort of captcha protects everything from Gmail to Twitter to Cloudflare and Facebook, then bot creators and spammers have a huge incentive to bypass it no matter what. And if we've learnt anything about spam, it's that pretty much every system we can think of can be bypassed or automated away.
The solution is really a ton of different captcha-like systems and anti-spam solutions, all unpopular enough that an attacker may not even bother targeting them. If an attacker needs to target a few thousand different captcha-style setups to get their spam through, many of them won't bother.
It's like centralised vs decentralised communication systems. If everything is centralised, a bad actor (like a government, corporation, criminal group, etc) can go after one target to control the narrative. If it's decentralised, then suddenly they have to go after dozens or hundreds of different targets, many of which won't cooperate with them.
I thought half the point of captchas was to train vision models?
This is in the article.
Indeed, half the point for reCAPTCHA: that's how Google could justify supplying reCAPTCHA for free, but not why people wanted to use them.
> That's how Google could justify supplying reCAPTCHA for free, but not why people wanted to use them
This and Pokemon Go for collecting videos: are there other examples of users doing the free work for $large_co?
I remember at one point in my teens, someone had made a web app that would snag the captcha and show you only the captcha. You would just endlessly solve captchas while the application tried different passwords on a backend and logged any successful logins.
Some of the first bitcoin faucets in 2011 and 2012 were bots doing exactly that.
Users thought the captcha was spam prevention they had to pass to receive bitcoin.
It was really just the bot forwarding a captcha it needed solved to continue its spam, then paying the user in bitcoin.
As TFA points out, a major change is that bot traffic now comes from honest users via their LLM sessions, so you don't even necessarily want to block automated bots anymore.
The game is shifting to a better ideal: how do you design a service knowing that any user/request might be automated?
Especially in place of the historical, easy solution/hack where you have some sort of gate that, once passed, puts the user in some trusted low-scrutiny tier, like a forum's registration page.
It's a similar question to designing a system so that it's resilient to account takeovers (i.e., the user was a trusted human until now, and now it's a spammer).
Example: on a forum, run new posts through an LLM to classify them as spam. That's the magic solution we always wished we had; the older tools (remember Akismet?) were too rudimentary.
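A minimal sketch of "score every post, no trusted tier". `toy_spam_score` is a hypothetical stand-in for the LLM call (a keyword heuristic so the example runs on its own), and the threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Post:
    author: str
    account_age_days: int
    body: str


def toy_spam_score(post: Post) -> float:
    """Hypothetical stand-in for an LLM classifier: spam probability in [0, 1].

    A real system would send the post to a model with a classification
    prompt; a keyword count keeps this sketch self-contained.
    """
    spammy = ("buy now", "free crypto", "click here")
    hits = sum(phrase in post.body.lower() for phrase in spammy)
    return min(1.0, hits / 2)


def moderate(post: Post, score: Callable[[Post], float], threshold: float = 0.5) -> str:
    # Every post is scored, including ones from old accounts, so a taken-over
    # account or an agent behind a trusted login gets no free pass.
    # account_age_days is deliberately NOT used to skip the check.
    return "hold_for_review" if score(post) >= threshold else "publish"


veteran = Post("veteran", 3650, "Free crypto! Click here to buy now")
print(moderate(veteran, toy_spam_score))
```

The design choice is the point: scrutiny applies per action rather than once at a registration gate, so account age alone never exempts a post.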
You use API tokens for things intended to be machine-to-machine communication and captchas for things intended to be filled out by humans. Not every site or service wants automated input, even if it's being directed by a human. I don't want forums like HN filled with a bunch of agents talking to each other; where's the human connection?
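Roughly what that split looks like in a request handler. The `/api/` prefix, the token set and the return strings are all hypothetical; real tokens would be stored hashed in a database.

```python
import hmac
from typing import Mapping

# Hypothetical issued API tokens (illustration only).
ISSUED_TOKENS = {"tok_example_123"}


def authorize(path: str, headers: Mapping[str, str], captcha_ok: bool) -> str:
    """Decide how a request is admitted.

    - Machine-to-machine endpoints (assumed to live under /api/) require a
      bearer token and never show a captcha.
    - Human-facing pages require a solved captcha and ignore tokens.
    """
    if path.startswith("/api/"):
        auth = headers.get("Authorization", "")
        if auth.startswith("Bearer "):
            presented = auth[len("Bearer "):].encode()
            # compare_digest avoids leaking token contents through timing.
            if any(hmac.compare_digest(presented, t.encode()) for t in ISSUED_TOKENS):
                return "allow"
        return "401 unauthorized"
    return "allow" if captcha_ok else "show_captcha"


print(authorize("/api/posts", {"Authorization": "Bearer tok_example_123"}, False))
```

An agent holding a token still can't use it to skip the captcha on a human-facing form, which is the separation being argued for.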
What about the ones where you have to slide a puzzle piece into place? I don't see those mentioned at all. Are they effective?
The most recent variations, which force you to click the boxes containing a certain artifact, are incredibly frustrating and fail half the time. The large influx of AI-SEO-optimized content being created makes me question CAPTCHAs' efficacy today.
Really nice read Harsehaj!
I haven't looked deeply into Web Bot Auth, but is identification tied to the agent (one identity per agent) or is it tied to the underlying person using the agent (the user)?
Hope that question makes sense, lmk if you need clarification
Hey Matt,
I would say everyone is leaning towards organization/individual right now, but I would imagine that flips as the number of agents grows.
TLDR: They're promoting a product they're working on with Cloudflare under the guise of it being an "open standard" [1]. Of course, in the docs, Step 1 is "Sign in with your Cloudflare account". Comes across a bit land-grabby.
[1] https://www.browserbase.com/blog/cloudflare-browserbase-pion...
They have served to train multiple generations of ANN and ML algorithms, in that, I think they've been a resounding success!
Omg. I'm on various VPNs, and every now and then Google Auth (for YouTube) throws me a captcha. They're mostly unreadable, but there's an audio option… which is just insane and makes no sense. Anyone else had that? It sounds like a recording of 300 people speaking at the same time in a call center while on various doses of LSD.
I've actually been in a call center with 300 intoxicated folk all talking at once. It's easier to understand than the reCAPTCHA audio.
(Only a couple of folks on hallucinogens, most on various downers.)
I've gotten captchas that made me play a small game and score like 3 points before I could continue, lol. For real.
They give you that (or hieroglyphics) if you are using certain VPNs and don't leave a specific browser fingerprint.
There is a point where not leaving fingerprints becomes a fingerprint in itself.
Question I've been wondering about: can't attackers record human sessions and replay them against a website to bypass Cloudflare?
They can. According to the article, they've already figured out a lot of what Cloudflare is looking for and how to bypass it, which is why bot protection is moving on to something else. I suppose this is why every website wants me to log in with my Google account (which I never use).
Failed? They have very successfully pushed people towards Chromium browsers and traceable residential IPs, while also training AI.
Always reminds me of the forces that shape the mechanisms around the exchange of genetic information that powers evolution.
See: The Red Queen by Matt Ridley.
Although not perfect for other reasons, a captcha built on phone motion and device attestation, like prsn.you, is a harder bypass for today's agent environments.
Just today a website presented me with a QR code captcha. I threw up.
Oh my god, I hate AI articles. Why do we have to make an interactive visualization for every single sentence? Thanks for showing me how distorted text is made in steps.
And being a cat and mouse game doesn’t mean the defenders failed.
> And being a cat and mouse game doesn’t mean the defenders failed.
It does though, in the end attackers always win. If something is a "cat and mouse game" then it's unwinnable by design from the defender side.
Sure, you can keep playing it if you feel like it, but at some point the attacker will be indistinguishable from a legitimate user and you will lose that fight.
By that logic, every security task is doomed to fail. Spam detection and antivirus are cat and mouse games too. I wouldn’t say they fail just because they have to adapt over time.
They're great for keeping humans out. Tried to set up Discord on a new phone yesterday. CAPTCHAs over and over again, just trying to log in. I uninstalled instead.
It has failed because of companies like Browserbase, and hackers who compromise smart devices and TVs to use as residential proxies.
They've been around that long? It doesn't seem like it, but the timing could be right, probably because the sites I visited had no need for CAPTCHAs until AI came around.
The name wasn't invented until 2003, but yes.
Guestbooks, contact forms, signup pages, and the like started receiving automated abuse approximately five minutes after they were invented. It didn't take long after that for people to start including a question they expected to be easy for a person and hard to automate with a script.
What's relatively new is CAPTCHAs merely to browse a site. There are few faster ways to get me to close your site, and maybe send you an unfriendly email.
My first guestbook asked Hagar or Roth. Answering correctly got your message added to the book. Answering Hagar got you sent to an infinite redirect loop for being either a bot or a moron.
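The old-school question gate from that story, sketched as a handler. The accepted answer ("Roth") is inferred from the anecdote, the "try again" response for other answers is an assumption, and the redirect target is a hypothetical path.

```python
from typing import List

# Question/answer pair modeled on the anecdote above (assumed, not the original code).
QUESTION = "Hagar or Roth?"
ACCEPTED = {"roth"}
TRAP = {"hagar"}


def handle_guestbook_post(answer: str, message: str, book: List[str]) -> str:
    normalized = answer.strip().lower()
    if normalized in ACCEPTED:
        book.append(message)
        return "200 added"
    if normalized in TRAP:
        # A real server would make /guestbook/trap redirect to itself,
        # producing the infinite loop from the story.
        return "302 Location: /guestbook/trap"
    return "400 try again"


entries: List[str] = []
print(handle_guestbook_post(" Roth ", "Nice site!", entries), entries)
```

A generic spam script fails because it doesn't know the site-specific answer, which is exactly why this only works while few sites share the same question.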
So in the past few years? Oh dear, no. Captchas have been in common use for much longer than that. reCAPTCHA has been around almost 20 years.
They were introduced in 1997, although I personally didn't start seeing them until a couple of years later.
So what's the solution, then? Get people to turn on their camera and hold up 15 fingers?
It sounds like the article and company are building identity from fingerprinting and cross-domain behavior, inferring it at multiple levels, including Cloudflare's.
It's just more identity verification afaict
PACT: https://news.ycombinator.com/item?id=48647360
The solution is login and paywalls.
That's crazy. People aren't going to pay to be tracked and have ads shoved in their faces! The economy would collapse!