Cloudflare Outage: Causes and Future Prevention Measures
On November 18, 2025, Cloudflare experienced a major outage that affected numerous online services, including ChatGPT, X, and Downdetector. The company described the incident as its worst outage since 2019 and attributed it to a flaw in its Bot Management system.
The issue stemmed from a misbehaving query against a ClickHouse database; that query generates the configuration file for the Bot Management machine learning model. A change in the query's behavior produced a large number of duplicate entries, so the configuration file grew rapidly and exceeded the memory limit of the software that loads it.
As a result, the core proxy system that processes customer traffic, which depends on the bot module, was disrupted. Customers whose rules relied on the generated bot scores began blocking legitimate traffic, while those who did not use this feature remained online.
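The failure mode described above, duplicate rows inflating a generated file past a fixed limit, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the limit of 200, the feature names, and the helper functions are assumptions chosen for illustration and are not taken from Cloudflare's code.

```python
from typing import Iterable, List, Tuple

FEATURE_LIMIT = 200  # illustrative cap, assumed for this sketch


def build_feature_list(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Naively turn (feature_name, feature_type) query rows into a feature list."""
    return [name for name, _type in rows]


def dedupe(features: Iterable[str]) -> List[str]:
    """Drop repeated feature names while keeping first-seen order."""
    return list(dict.fromkeys(features))


def check_limit(features: List[str], limit: int = FEATURE_LIMIT) -> None:
    """Reject a feature list that would not fit the fixed budget."""
    if len(features) > limit:
        raise ValueError(f"{len(features)} features exceeds limit of {limit}")


# 120 distinct features, each returned twice by the hypothetical query.
rows = [(f"feature_{i}", "float") for i in range(120)] * 2
features = build_feature_list(rows)
```

Here `check_limit(features)` raises because the duplicated list holds 240 entries, while `check_limit(dedupe(features))` passes with 120. The point of the sketch is that the underlying data did not change; only the number of rows returned did.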
Cloudflare clarified that the problem was not related to DNS issues, attacks, or new generative AI systems; the error lay in the internal logic for updating the Bot Management configuration.
To prevent similar incidents in the future, the company announced four key steps:
- hardening the ingestion of internally generated configuration files, treating them with the same caution as user input;
- adding more global "kill switches" for features;
- preventing core dumps and other error reports from overwhelming system resources;
- reviewing failure modes for error conditions across all critical proxy modules.
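Two of these steps, treating generated configuration as untrusted input and providing a kill switch, can be sketched together in Python. This is an illustration under stated assumptions, not Cloudflare's implementation: the JSON layout, the limits, and the names `parse_feature_file`, `BotModule`, and `ConfigRejected` are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

MAX_BYTES = 64 * 1024  # illustrative size cap (assumption)
MAX_FEATURES = 200     # illustrative feature cap (assumption)


class ConfigRejected(Exception):
    """Raised when a candidate configuration fails validation."""


def parse_feature_file(raw: bytes) -> List[str]:
    """Validate a generated file as strictly as if it were user input."""
    if len(raw) > MAX_BYTES:
        raise ConfigRejected("file too large")
    try:
        doc = json.loads(raw)
    except ValueError as exc:  # covers malformed JSON and bad encoding
        raise ConfigRejected(f"not valid JSON: {exc}") from exc
    features = doc.get("features") if isinstance(doc, dict) else None
    if not isinstance(features, list) or not all(isinstance(f, str) for f in features):
        raise ConfigRejected("expected a list of feature names")
    if len(set(features)) != len(features):
        raise ConfigRejected("duplicate feature names")
    if len(features) > MAX_FEATURES:
        raise ConfigRejected("too many features")
    return features


@dataclass
class BotModule:
    features: List[str] = field(default_factory=list)
    enabled: bool = True  # global kill switch for the whole module

    def reload(self, raw: bytes) -> bool:
        """Swap in a new configuration only if it validates."""
        try:
            candidate = parse_feature_file(raw)
        except ConfigRejected:
            return False  # keep serving with the last known good configuration
        self.features = candidate
        return True

    def score(self) -> Optional[int]:
        """Return None when switched off so callers can skip bot rules."""
        if not self.enabled:
            return None
        return 50  # placeholder; a real model would compute this from the request
```

With this design, a bad file causes `reload` to return `False` and leaves the previous feature list in place instead of crashing the caller, and setting `enabled = False` lets operators take the module out of the request path without deploying new code.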
According to Cloudflare's estimates, about 20% of internet traffic passes through its network, so an error in a core module can have global repercussions.