
As traffic moves through the core proxy, the proxy applies each customer’s unique configuration and settings, from enforcing WAF rules and DDoS protection to routing traffic to the Developer Platform and R2. These configuration and policy rules are enforced by domain-specific modules. One of those modules, Bot Management, triggered the outage.
The Bot Management module uses a machine-learning model to assign a bot score to every request. The model relies on a feature configuration file, a collection of traits that help determine whether a request is automated. This feature file is refreshed every few minutes, allowing Cloudflare to react to new types of bots and new bot attacks.
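To make that pipeline concrete, the Rust sketch below shows the general shape of such a module: a scorer that combines a request’s extracted traits with the weights in the current feature configuration, plus a refresh hook that a background loader could call every few minutes when a new file arrives. The types and function names are invented for illustration and do not represent Cloudflare’s actual implementation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// One named feature and the weight the model assigns to it
/// (names and structure invented for illustration).
#[derive(Clone)]
struct Feature {
    name: String,
    weight: f64,
}

/// The periodically refreshed feature configuration.
#[derive(Clone, Default)]
struct FeatureConfig {
    features: Vec<Feature>,
}

/// Bot scorer holding the current configuration behind a lock so a
/// background task can swap in a refreshed copy every few minutes.
struct BotScorer {
    config: Arc<RwLock<FeatureConfig>>,
}

impl BotScorer {
    /// Combine the request's extracted trait values with the configured
    /// feature weights into a single bot score.
    fn score(&self, traits: &HashMap<String, f64>) -> f64 {
        let cfg = self.config.read().unwrap();
        cfg.features
            .iter()
            .map(|f| f.weight * traits.get(&f.name).copied().unwrap_or(0.0))
            .sum()
    }

    /// Called when a newly generated feature file has been parsed.
    fn refresh(&self, new_cfg: FeatureConfig) {
        *self.config.write().unwrap() = new_cfg;
    }
}
```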
A change in the behaviour of the ClickHouse query that generates this file produced large numbers of duplicate feature rows, inflating the feature configuration file beyond its previously fixed size and causing the Bot Management module to throw an error. As a result, Cloudflare’s core proxy began returning HTTP 5xx errors for any traffic that relied on the module. Services like Workers KV and Access were also impacted, as they depend on the same core layer.
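Continuing the illustrative sketch above, one way this kind of failure can surface is a loader that enforces a hard cap on the number of features and returns an error when duplicate rows push the count past it; if the caller treats that error as fatal rather than keeping the last good configuration, every request that needs a bot score fails. The cap value and error handling here are assumptions for illustration, not Cloudflare’s actual code.

```rust
/// Illustrative hard cap; the real limit used by the module is an
/// assumption here, not a published figure.
const MAX_FEATURES: usize = 200;

#[derive(Debug)]
enum ConfigError {
    TooManyFeatures { got: usize, max: usize },
}

/// Parse refreshed feature rows into a FeatureConfig. Nothing here
/// deduplicates rows, so a query change that emits the same feature
/// many times can push the count past MAX_FEATURES and turn every
/// refresh into an error.
fn load_feature_config(rows: &[(String, f64)]) -> Result<FeatureConfig, ConfigError> {
    if rows.len() > MAX_FEATURES {
        return Err(ConfigError::TooManyFeatures { got: rows.len(), max: MAX_FEATURES });
    }
    Ok(FeatureConfig {
        features: rows
            .iter()
            .map(|(name, weight)| Feature { name: name.clone(), weight: *weight })
            .collect(),
    })
}

// If a caller treats this error as fatal rather than falling back to the
// last good configuration, the request path fails and the proxy answers 5xx:
//     let cfg = load_feature_config(&rows).expect("valid feature file");
```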
Once the underlying issue was identified, Cloudflare halted the generation and propagation of the bad feature file, manually inserted a known-good file into the feature file distribution queue, and then forced a restart of its core proxy.
“This is like the butterfly effect in modern interconnected systems. A butterfly can flap its wings to fly, and its wing-flapping is the basis for the butterfly effect, a concept in chaos theory where a tiny change can have massive, unpredictable consequences. Same thing we are seeing here,” said Pareekh Jain, CEO at EIIRTrend & Pareekh Consulting.
Jain noted that such deep dependencies between security layers and AI modules for traffic routing are now common in large-scale cloud environments. Integrating powerful, dynamic modules directly into request processing pipelines means a flaw anywhere in the path can disrupt everything downstream.