On November 18, 2025, the Internet suffered one of its most widespread outages in years. Major platforms went down, millions of websites were disrupted, and the world was reminded how much of the web relies on just a few pieces of infrastructure. At the center of it all was Cloudflare, one of the world’s largest CDN and security firms, which later confirmed that a minor internal configuration change had large unintended consequences across a network that carries roughly 20% of global web traffic.
In a detailed postmortem, Cloudflare explained what triggered the failure, how things spread so rapidly, and what steps the company is taking to prevent a similar incident in the future.
What Exactly Went Wrong?
Cloudflare traced the entire outage back to a problem inside its Bot Management system, a platform that uses machine-learning models to detect and filter malicious traffic. The system depends on a so-called “feature file” containing the parameters and signals required to make bot-scoring decisions. Usually, this file is compact and updated regularly without incident.
But on the morning of the outage, a change in database permissions inside ClickHouse—the columnar database that powers signals for Bot Management—caused duplicate “feature rows” to be generated. These duplicate entries quickly inflated the size of the feature file, pushing it far beyond the limits the proxy software was designed to handle.
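To see how a permissions change can quietly double a file like this, consider a minimal sketch of the failure mode. This is not Cloudflare’s actual query or pipeline, and the database and column names are illustrative: a metadata query that omits a database filter starts returning two copies of every feature row once a second database becomes visible, while a filtered, deduplicated version does not.

```python
def build_feature_list(rows):
    """Naive: trusts the query result, so duplicate rows inflate the feature file."""
    return [name for _db, name in rows]

def build_feature_list_safe(rows, expected_db="default"):
    """Defensive: filter to the expected database and deduplicate by name."""
    return sorted({name for db, name in rows if db == expected_db})

# Rows as (database, column_name) pairs, roughly what a query against a
# ClickHouse system.columns-style table might return.
rows_before = [("default", "signal_a"), ("default", "signal_b")]
rows_after = rows_before + [("r0", "signal_a"), ("r0", "signal_b")]  # copies from a newly visible database

print(len(build_feature_list(rows_before)), len(build_feature_list(rows_after)))            # 2 4
print(len(build_feature_list_safe(rows_before)), len(build_feature_list_safe(rows_after)))  # 2 2
```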
When this oversized configuration file propagated across Cloudflare’s global network, it caused proxy processes to crash. Because Cloudflare’s proxy software sits at the heart of its entire infrastructure, the failures cascaded quickly, affecting everything from website delivery to APIs and security services.
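On the proxy side, the crash pattern can be illustrated with an equally small sketch. This is not Cloudflare’s proxy code (which is not public and is not written in Python), and the capacity limit and file format are hypothetical. The point is that a loader with a fixed limit and no fallback turns an oversized file into a fatal error for the process that reads it:

```python
MAX_FEATURES = 200  # hypothetical preallocated capacity

def load_feature_file(path):
    """Read one feature name per line and refuse files that exceed the limit."""
    with open(path) as f:
        features = [line.strip() for line in f if line.strip()]
    if len(features) > MAX_FEATURES:
        # Treating this as fatal takes the whole process down, which is roughly
        # how one bad config push became a fleet-wide outage.
        raise RuntimeError(f"feature file has {len(features)} entries; limit is {MAX_FEATURES}")
    return features
```

A more forgiving design would catch this condition and keep serving traffic with the last file that loaded successfully, which is essentially the lesson Cloudflare draws later in its postmortem.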
How Big Was the Impact?
While Cloudflare didn’t release exact traffic numbers, the company reiterated that its network handles more than 20% of all Internet requests. As the corrupted file spread, millions of sites and services delivered through Cloudflare began returning HTTP 5xx server errors.
The impact was wide-ranging:
- Major websites suffered sporadic or complete downtime
- Turnstile, Cloudflare’s anti-bot challenge system, stopped loading
- Dashboard logins, which rely on Turnstile, failed for many users
- Workers KV, Cloudflare’s key-value storage service, saw elevated error rates
- Email security systems lost access to an IP-reputation source, reducing spam-filter accuracy
The failure was so sudden and far-reaching that Cloudflare first suspected a massive DDoS attack before tracing the root cause to the malformed configuration file.
How Cloudflare Responded
Once the cause was identified, Cloudflare’s engineering teams initiated a three-step recovery plan (see the sketch after this list):
- Stop propagation of the corrupted feature file
- Roll back to the most recent known-good version
- Restart core proxy systems across all data centers
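A rough sketch of that flow, with hypothetical file paths and placeholder helpers (Cloudflare’s actual tooling is not public), might look like this:

```python
import shutil

def stop_propagation():
    """Placeholder: pause whatever pipeline pushes new feature files to the fleet."""
    print("propagation of new feature files paused")

def restart_proxies():
    """Placeholder: restart or reload proxy processes so they pick up the restored file."""
    print("proxy fleet restarted")

def recover(active_path="features.txt", known_good_path="features.known-good.txt"):
    stop_propagation()                             # 1. stop the bad file from spreading
    shutil.copyfile(known_good_path, active_path)  # 2. roll back to the last known-good version
    restart_proxies()                              # 3. restart core proxy systems
```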
The recovery was gradual. It wasn’t until 14:30 UTC that Cloudflare reported core traffic was “flowing normally” once again, though full restoration of all systems didn’t occur until 17:06 UTC on the same day.
Cloudflare CEO Matthew Prince published a public apology for what he described as the company’s “most serious outage since 2019.” He acknowledged the huge disruption caused to the wider Internet ecosystem.

Why a Small File Caused a Huge Outage
The incident points to a deeper architectural reality: modern Internet infrastructure is tightly interconnected, and globally distributed configuration systems can become single points of failure. A few key lessons emerged from Cloudflare’s analysis:
- Dependence on Centralized Infrastructure: A huge part of the Internet depends on Cloudflare, so when a misconfiguration goes global, so do the ripples.
- The Danger of Silent Database Changes: The bug came from a permissions update that caused duplicate rows to appear in ClickHouse query results. Because this wasn’t caught early, the corrupted data reached systems further down the line.
- Lack of Guardrails on File Size: The proxy software expected a maximum file size but had no fallback mechanism, so it hard-crashed instead of degrading gracefully.
- The Need for Safer Rollouts: Had the feature file been deployed gradually through canary testing or size-validation checks, the problem might have been contained before it spread (see the sketch after this list).
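A minimal sketch of those last two guardrails, assuming a hypothetical feature-count limit and file layout (this is illustrative Python, not Cloudflare’s pipeline): validate the new file before accepting it, and fall back to the last known-good version instead of crashing. In a real rollout the same check would first run on a small canary slice of machines before the file propagates globally.

```python
MAX_FEATURES = 200  # hypothetical upper bound the pipeline is willing to accept

def read_features(path):
    """Read one feature name per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def load_with_fallback(new_path, known_good_path):
    """Accept the new feature file only if it validates; otherwise keep the old one."""
    try:
        features = read_features(new_path)
        if len(features) > MAX_FEATURES:
            raise ValueError(f"{len(features)} features exceeds limit of {MAX_FEATURES}")
        return features  # new file passes validation
    except (OSError, ValueError) as err:
        # Degrade gracefully: reject the new file and keep serving with the old one.
        print(f"rejecting new feature file ({err}); keeping last known-good")
        return read_features(known_good_path)
```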
A Wake-Up Call for the Whole Internet
While users appreciated Cloudflare’s clear explanation of the outage, the incident raises uncomfortable questions for an industry that concentrates so much traffic in a handful of providers, including Cloudflare, AWS, Google Cloud, and Akamai. For its part, Cloudflare’s engineers have moved swiftly to patch the underlying issue, add additional layers of validation, and build further safeguards around future configuration updates. The November 2025 outage will likely be studied for years as a textbook example of how a small internal change can trigger large-scale Internet disruption, and why resilience, redundancy, and fail-safes are more essential than ever in global infrastructure.
