By now, it should be pretty clear that this is no coincidence. AI scrapers are getting more and more aggressive, and - since FOSS projects rely on public collaboration, whereas private companies don't have that requirement - this puts an extra burden on Open Source communities.
So let's get into more detail, going back to Drew's blogpost. According to Drew, LLM crawlers don't respect robots.txt and hit expensive endpoints like git blame, every page of every git log, and every commit in your repository. They do so using random User-Agents from tens of thousands of IP addresses, each one making no more than one HTTP request, trying to blend in with user traffic.
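To make concrete what "respecting robots.txt" would mean here, below is a minimal Python sketch of the check a well-behaved crawler is expected to perform before fetching a page. The git.example.org forge URL, the /blame/ path, and the ExampleBot User-Agent are illustrative assumptions, not SourceHut's actual layout or any real crawler.

```python
# Minimal sketch of the robots.txt check a polite crawler performs before
# fetching a page. The forge URL, /blame/ path, and User-Agent below are
# hypothetical, chosen only to illustrate the idea.
from urllib import robotparser

ROBOTS_URL = "https://git.example.org/robots.txt"          # hypothetical forge
EXPENSIVE_URL = "https://git.example.org/user/repo/blame/main/file.c"

parser = robotparser.RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetch and parse the site's robots.txt

# A polite crawler identifies itself with a stable User-Agent and skips
# anything the operator has disallowed. The crawlers Drew describes do
# neither: they rotate random User-Agents and ignore this check entirely.
if parser.can_fetch("ExampleBot/1.0", EXPENSIVE_URL):
    print("allowed: fetch the page")
else:
    print("disallowed: skip this endpoint")
```

The point of the sketch is that the mechanism is trivial and voluntary: it only works if crawlers choose to run it, which is exactly what these scrapers don't do.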
Because of this, it's hard to come up with a good set of mitigations. Drew says that several high-priority tasks have been delayed for weeks or months due to these interruptions, that users are occasionally affected (because it's hard to distinguish bots from humans), and - of course - that this causes occasional outages of SourceHut.
Drew doesn't distinguish here which AI companies are more or less respectful of robots.txt files, or more accurate in their User-Agent reporting; we'll look more into that later.
Finally, Drew points out that this is not some isolated issue. He says,
All of my sysadmin friends are dealing with the same problems, [and] every time I sit down for beers or dinner to socialize with sysadmin friends it's not long before we're complaining about the bots. [...] The desperation in these conversations is palpable.