Web crawlers powered by AI are relentlessly devouring website content in their unending quest for data.

Opinion: Yet a potential solution could endanger the structure of the web itself...

AI web crawlers indiscriminately devouring sites for ceaseless content consumption

Artificial intelligence (AI) is changing how the web is crawled. Here's a look at how AI web crawlers are reshaping the internet and the challenges they pose.

According to Cloudflare, a major content delivery network (CDN) provider, 30% of global web traffic now originates from bots. Among these bots, AI data fetcher bots account for 80% of the traffic, according to cloud services company Fastly.

These AI crawlers are more aggressive than standard ones, often disregarding crawl delays and bandwidth-saving guidelines. They can generate traffic spikes of up to twenty times normal levels within minutes, Fastly warns. Such a surge can degrade performance for every site on a shared server, including sites the crawler is not targeting.
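To make the "twenty times normal levels" figure concrete, here is a minimal sketch of how a site operator might flag such a spike with a sliding time window. The class name `SpikeDetector`, the baseline of 5 requests per minute, and the 60-second window are illustrative assumptions, not values taken from Fastly or Cloudflare.

```python
from collections import deque


class SpikeDetector:
    """Flags when requests in a sliding window exceed a multiple of a baseline.

    Hypothetical helper for illustration; the threshold values are assumptions.
    """

    def __init__(self, baseline_per_window: float,
                 window_seconds: float = 60.0, factor: float = 20.0):
        self.baseline = baseline_per_window  # typical requests per window
        self.window = window_seconds         # length of the sliding window
        self.factor = factor                 # spike threshold multiplier
        self.timestamps = deque()            # arrival times inside the window

    def record(self, now: float) -> bool:
        """Record one request at time `now` (seconds); return True on a spike."""
        self.timestamps.append(now)
        # Drop requests that have fallen out of the sliding window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        return len(self.timestamps) > self.baseline * self.factor


# Simulate a burst of 101 requests arriving 0.1 seconds apart.
detector = SpikeDetector(baseline_per_window=5, window_seconds=60.0, factor=20.0)
flags = [detector.record(i * 0.1) for i in range(101)]
print(flags[99])   # False: 100 requests is exactly 20x the baseline, not above it
print(flags[100])  # True: the 101st request pushes the window past 20x
```

A real deployment would track this per client or per user agent and feed the result into throttling or blocking rules; this sketch only shows the counting step.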

Large websites are feeling the crush of AI bot traffic and must add processor, memory, and network resources to handle the load. For instance, the AI search bots run by giants like Google and Meta can generate as much as 30 terabits of traffic in a single surge, potentially degrading site performance.

AI crawlers can extract full page text and sometimes attempt to follow dynamic links or scripts. This can lead to performance degradation, service disruption, and increased operational costs, Fastly cautions.

To combat these issues, several measures are being taken. Anubis, a free, open-source AI crawler blocker, attempts to slow down visits from AI crawlers. Infrastructure providers like Cloudflare now offer default bot-blocking services aimed at AI crawlers. Efforts are also under way to supplement robots.txt with llms.txt files that provide LLM-friendly content.
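For readers unfamiliar with robots.txt, the following sketch shows what a rule set aimed at an AI crawler might look like and how a well-behaved crawler would interpret it, using Python's standard `urllib.robotparser`. `GPTBot` serves as an example AI crawler user agent; the `example.com` URLs, the `/private/` path, the `SomeBrowser` agent name, and the 10-second crawl delay are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block one AI crawler entirely, and ask all other
# agents to wait 10 seconds between requests and stay out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

print(parser.can_fetch("GPTBot", "https://example.com/articles/1"))       # False
print(parser.can_fetch("SomeBrowser", "https://example.com/articles/1"))  # True
print(parser.can_fetch("SomeBrowser", "https://example.com/private/x"))   # False
print(parser.crawl_delay("SomeBrowser"))                                  # 10
```

These rules are only advisory: the check runs on the crawler's side, so nothing in robots.txt stops a crawler that chooses not to perform it.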

However, not all AI crawlers respect robots.txt files. Cloudflare has accused Perplexity, an AI search company, of ignoring them. Perplexity denies the accusation.

The rise of AI web crawlers also raises concerns about the future of the open web. Important, accurate information may end up siloed behind paywalls or removed altogether, and the web may shift toward a pay-to-access model.

Moreover, the web may become more fragmented due to businesses restricting or monetizing access to their sites. In Germany alone, AI web crawler traffic is estimated to be in the range of several petabytes per month.

As we navigate this new digital terrain, it's crucial to strike a balance between the benefits of AI and the preservation of the open, accessible web we've come to know.
