Death by a thousand cuts: the AI scraper indexing one blog post at a time
16 September 2025
Like many online publishers and bloggers, I’ve experienced significant surges of traffic caused by AI bots indexing (or whatever it is they do) thousands of pages at a time on my website.
I’m in two minds as to whether or not to block this activity, but it seems pointless, as many crawlers disregard robots.txt disallow requests. Besides, I can’t stop other entities, human or otherwise, from accessing the content here and doing what they will with it.
Once, way back in 2000, someone in New Zealand copied the entirety of the then disassociated website, republished it under the name disenfranchised or something, and called it their own work. I didn’t discover the reproduction by chance, though. The responsible party emailed to tell me about it.
I wrote back (effectively) saying they should design their own website. disenfranchised, or whatever it was, vanished a few weeks later. I think they hoped I would write ceaselessly about the “rip-off” of my work, but when I said nothing more, they found something else to do.
I know there are ways to make copying the contents of a website difficult, but anyone sufficiently motivated will figure out how to bypass those mechanisms.
At least someone liked what I did enough to want to copy it. I highly doubt, though, that any crawlers gathering data for AI agents care whether what I do here is likeable or not. What does annoy me is the way this scraper’s activity is distorting my web analytics data (not Google’s).
Yes, you can help yourself to the content here, just don’t mess with my web stats.
Of course, I know web analytics are by no means an exact science, but they do highlight trends. Somehow my morning online routine would not be the same if I decided to ditch analytics. Besides, my stats app holds nearly twenty years’ worth of data, so there is also the history aspect.
To complicate matters, the scraper uses a different IP address on every single visit, meaning I can’t simply add an ignore rule for one IP, or a range, to keep its visits out of the analytics data.
Consequently, its visits appear to originate from a different town or city each time, but always in the same country (a populous nation in East Asia). There is also no rhyme or reason to the twenty to thirty or so pages it visits daily. One minute it is a years-old post, the next something far more recent.
As the crawler did not snatch up several thousand posts in one fell swoop, it will doubtless be active for some time to come. In the meantime, I’ll make the most of thinking my website is ever so slightly more popular than usual, since there’s not much else to do.
