The subtle art of bot fighting

Posted on June 22, 2026

Caroline C. Blaker

Attempts to steal information from secure websites are on the rise

Changing tools, despondent economies, hit-or-miss efforts - there are signs of slowdown and caution everywhere - everywhere except where isolated individuals with a computer apparently seek to mine passwords, money, information, or [checks notes...] visceral aggravation from independent, free-standing websites. What are they even doing? They appear to be playing a low-odds numbers game: it succeeds if even one of 200,000* tries gleans pretty much any ol' thing. This isn't the only source of bot overwhelm on websites - we're seeing scraping from conscripted Amazon servers, and heavy AI bot traffic arising from surprising locations all over the globe.

(* This figure is arbitrary. I have no idea what the limit might be. It might even be time-based and automated: "Can I get one working username/password in 5 days? How about 10?")

And we wouldn't necessarily spend any time here if it didn't inconvenience us so completely. Craft CMS and ExpressionEngine websites aren't targeted for hacks the way WordPress websites are. Their entrypoints are generally hidden, and they're not as vulnerable to security lapses over time. In short, the websites we work with don't get hacked - but hackers still try, and leave evidence. And what hackers are doing, besides looking for WordPress entrypoints, is starting to surprise us.

We're seeing AI crawls take up so many server resources (pages over time) that the server gives up. We're seeing hobbyists (hackers) look for website backups in the document root, which does the same thing to the server as the AI bots, but with hundreds of 404 errors. We see wannabe scrapers trying to lift entire websites. Without the ability to hack a website, they are looking for information, sometimes secret information, and overwhelming the server in the process. These are attacks in the sense of taking advantage of what they can find, and in the sense that they want your server to deny service, but they are not strictly hacks.

With tools like Cloudflare, we have been able to stop these attacks at the DNS (Domain Name System) level, before they even hit the website. And while we leave these rules in place in case the attackers ever come back, novel attempts to find things or get in always seem to crop up. If your website isn't in WordPress, you're ahead of this game: WordPress, the #1 Content Management System by deployed website count, with its history of security vulnerabilities, represents the biggest/easiest target. (See How Hackers Target A WordPress Website.) But now we're getting heavy AI crawls, and seeing scraping, and attempts at file paths that we've never seen, from previously unsuspected sources.

Good SEO might be the problem

With the changes to Google Search Engine Results, we are noticing that the better the SEO, and the more the website is relied upon for that information/keyword, the more AI acts as an intermediary: it scrapes your website and delivers the information to the user before they ever reach your website. This isn't all bad - we've gotten so used to Search Engines doing the heavy lifting for us as marketers that our compliance with SEO no longer going the distance can feel like abandonment. The value we are finding in this transition is that old-fashioned marketing is coming back around again. (Show up offline, communicate well & appropriately, be yourself, be of value.) And AI still does sometimes lead to traffic when you're the best of the best. Sometimes this traffic is users, sometimes it's looking for WordPress entrypoints, and sometimes (as we experienced this week) it's looking for archive files containing copies of the website or its database, "infrastructure" (whatever it thinks that is), or any file ending with ".bin", ".pckg", ".tar", ".zip", ".sql" and more.

We were able to observe this behavior with the Hop 404 Reporter add-on for ExpressionEngine and see quite clearly that an offshore user was looking to take advantage of any major website information left carelessly in the document root. This client is one of our top performers in SEO, and along with the benefits, they appear to get the most environmental flak for their talent of all the websites we work on.
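If you keep any record of 404s (from an add-on like the one above, or from your server logs), the pattern is easy to spot with a few lines of code. Here is a minimal Python sketch, assuming you can export each 404 as an (IP address, requested path) pair. The function names and the sample IP addresses are made up for illustration, and the list of endings is just the one from this post, not an exhaustive one.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# File endings mentioned in this post; extend to match what your own 404s show.
ARCHIVE_SUFFIXES = {".bin", ".pckg", ".tar", ".zip", ".sql"}


def is_archive_probe(requested_path: str) -> bool:
    """Return True if the requested file name carries one of the archive endings."""
    # Drop any query string, then look only at the final path segment.
    path = urlsplit(requested_path).path.lower()
    suffixes = PurePosixPath(path).suffixes  # "/site.tar.gz" -> [".tar", ".gz"]
    return any(suffix in ARCHIVE_SUFFIXES for suffix in suffixes)


def probes_by_ip(not_found_hits):
    """Count archive-looking 404s per IP from (ip, path) pairs."""
    counts = Counter()
    for ip, path in not_found_hits:
        if is_archive_probe(path):
            counts[ip] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data using documentation-reserved IP ranges.
    sample = [
        ("203.0.113.7", "/backup.zip"),
        ("203.0.113.7", "/db.sql?dl=1"),
        ("198.51.100.2", "/about"),
        ("203.0.113.7", "/site.tar.gz"),
    ]
    for ip, count in probes_by_ip(sample).most_common():
        print(ip, count)
```

One IP address with a long run of these requests is a good candidate for the first kind of block described under Handling Bad Actors.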

Does this happen to me?

You may not know if your website is down or up unless you go there to check - but we recommend using uptime monitoring, so that you get an email or a text message if your website goes down. If you're our client, this is already in place. 
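For the curious, the core of an uptime check is small. The sketch below is an illustration in Python using only the standard library, not a replacement for a monitoring service: a real one checks from several locations, retries, and sends the email or text for you. The function name `check_site` and its `opener` parameter are our own invention for the example.

```python
import urllib.error
import urllib.request


def check_site(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return (is_up, detail) for a single request to url.

    `opener` is a hypothetical hook so the check can be exercised
    without touching the network; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (404, 500, 503...).
        return False, f"HTTP {err.code}"
    except OSError as err:
        # Covers DNS failures, refused connections, and timeouts.
        return False, f"unreachable: {err}"
    return 200 <= status < 400, f"HTTP {status}"


if __name__ == "__main__":
    # Replace with your own address; run this on a schedule (cron, for example)
    # and send yourself a message whenever is_up comes back False.
    is_up, detail = check_site("https://example.com/")
    print("up" if is_up else "DOWN", detail)
```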

Interested in follow-up information?

This content began as a newsletter, but grew to be too big. Join our growing list if additional updates around information like this could be useful to you.

Handling Bad Actors

Unfortunately, waiting until it goes away is not an effective option. Often your website is what goes away before the attack stops. Frustrating? Yes! So much so that industries have developed around this problem. You may have to block or limit different types of entities, and give your server some room to absorb the rest:

- IP addresses - if an IP address or two is the source of the attack, block and move on.

- Strange Crawlers - if you don't recognize the name of the crawling entity reported by Cloudflare that's overwhelming your traffic, you can block it by name.

- Entire Countries - There's always one country we block first - because it comes up first. I'm not saying which one. But this is an option, especially if your business doesn't work there.

- Rate Limit AI bots - Since AI bots are now embedded in search, caution should prevail before blocking these entirely - but they can be rate limited to, say, 100 pages per hour, or whatever your server can handle.

- Upgrade server resources - If you have a popular website, it makes sense to be able to handle more traffic (or "spikes") in case of sudden demand. This capability can be a useful buffer to keep your site online whether the demand is the good kind or not.
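In practice we set these rules up in a tool like Cloudflare rather than writing code, but the logic behind the first four items can be sketched in a few lines. The Python below is an illustration only, and it is not Cloudflare's API: the class name, the crawler names ("StrangeCrawler", "ExampleBot"), and the country code "XX" are all hypothetical placeholders. It blocks by IP address, crawler name, and country, and holds a named bot to 100 pages per rolling hour.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class TrafficPolicy:
    """Hypothetical block/limit rules, mirroring the list above."""
    blocked_ips: set = field(default_factory=set)
    blocked_agents: set = field(default_factory=set)     # lowercase name fragments
    blocked_countries: set = field(default_factory=set)  # two-letter country codes
    limited_agents: set = field(default_factory=set)     # lowercase name fragments
    max_pages: int = 100             # pages allowed per window
    window_seconds: float = 3600.0   # one hour
    _hits: dict = field(default_factory=lambda: defaultdict(deque))

    def decide(self, ip, user_agent, country, now):
        """Return "block", "limit", or "allow" for one request at time `now` (seconds)."""
        agent = user_agent.lower()
        if ip in self.blocked_ips or country in self.blocked_countries:
            return "block"
        if any(name in agent for name in self.blocked_agents):
            return "block"
        for name in self.limited_agents:
            if name in agent:
                hits = self._hits[name]
                # Forget requests that have aged out of the rolling window.
                while hits and now - hits[0] >= self.window_seconds:
                    hits.popleft()
                if len(hits) >= self.max_pages:
                    return "limit"
                hits.append(now)
                return "allow"
        return "allow"


if __name__ == "__main__":
    import time

    policy = TrafficPolicy(
        blocked_ips={"203.0.113.7"},
        blocked_agents={"strangecrawler"},
        blocked_countries={"XX"},
        limited_agents={"examplebot"},
    )
    print(policy.decide("198.51.100.2", "ExampleBot/2.0", "US", time.time()))
```

A request that comes back "limit" would typically be answered with an HTTP 429 (Too Many Requests) status, so a well-behaved bot knows to slow down rather than give up on your site.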