The recent outage at Cloudflare, which disrupted services for major platforms like ChatGPT and X (formerly Twitter), serves as a stark reminder of the fragility of our digital ecosystem. When a giant of internet infrastructure stumbles, the ripples are felt globally.
While the disruption caused frustration for millions of users, it offers a valuable lesson for business leaders and IT decision-makers: in the world of technology, there is no such thing as guaranteed 100% uptime.
Even the Giants Fall
Cloudflare is renowned for its robustness and reliability. The company provides the shield that protects countless websites from attacks and the engine that accelerates internet traffic. Yet on November 18, 2025, a latent bug, triggered by a routine configuration change, caused a significant service failure.
It is important to note that this was not a cyberattack; it was an operational error. This highlights a critical reality: risks do not come only from malicious hackers. System complexity, human error, and software bugs are just as capable of taking your business offline as a dedicated adversary.
If a technology titan with billions of dollars in resources and some of the world’s best engineers can experience downtime, it is safe to assume that your organisation could face similar challenges.
The Fallacy of the Single Solution
Many businesses operate under the assumption that purchasing a service from a top-tier vendor guarantees their uptime. Service Level Agreements (SLAs) often promise 99.9% or even 99.99% availability. However, the small fraction of a percent where things go wrong still adds up: 99.9% availability permits almost nine hours of downtime a year, and 99.99% permits roughly 53 minutes. Each of those hours or minutes can mean lost revenue and reputational damage.
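To make those percentages concrete, the short Python sketch below converts an availability target into the downtime it still permits. It assumes a 365-day year measured continuously; real SLAs often measure per month and may exclude scheduled maintenance, so check your own contract's definitions. The function name is illustrative and not taken from any vendor's tooling.

```python
# Hours in a non-leap year (365 days x 24 hours). This is an assumption:
# many SLAs are measured per calendar month instead.
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return how many hours of downtime an availability target still permits over a period."""
    return period_hours * (1 - availability_pct / 100)


# Print the yearly downtime budget for a few common SLA targets.
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% availability -> {hours:.2f} hours/year ({hours * 60:.0f} minutes)")
```

Passing `period_hours=730` (roughly one month) shows the monthly budget instead, which is how many providers calculate service credits.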
Relying on a single vendor for a mission-critical function creates a single point of failure. If your “Plan A” is “rely on Cloudflare” (or AWS, or Microsoft), and that vendor has an outage, you are effectively helpless until they resolve the issue.
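One way to quantify a single point of failure is to compare one provider with two redundant ones. The Python sketch below assumes, optimistically, that the providers fail independently, so the service is down only when all of them are down at the same moment. The function name is hypothetical.

```python
from math import prod


def combined_availability(availabilities: list[float]) -> float:
    """Availability of redundant providers, ASSUMING their failures are independent.

    The service is unavailable only if every provider is unavailable at once,
    so the combined outage probability is the product of the individual
    outage probabilities (1 - availability for each provider).
    """
    return 1 - prod(1 - a for a in availabilities)


single = combined_availability([0.999])
dual = combined_availability([0.999, 0.999])
print(f"one provider:  {single:.4%}")
print(f"two providers: {dual:.4%}")
```

Under that assumption, two providers at 99.9% each combine to roughly 99.9999%. The independence assumption is the weak point: if both providers share an upstream dependency, such as the same DNS host or the same underlying cloud, a single fault can take them down together and the real figure will be far lower.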
Building Resilience: Plan A, Plan B, and Plan C
For organisations where uptime is critical, such as financial institutions, healthcare providers, or high-volume e-commerce platforms, reliance on a single path is rarely sufficient. True resilience requires a multi-layered approach.
Plan A: Your Primary Defence
This is your standard operating state. You use best-in-class vendors and robust architecture to handle day-to-day operations. For most organisations, this is sufficient almost all of the time.
Plan B: Technical Redundancy
What happens when Plan A fails? Technical redundancy might involve a multi-cloud strategy or a secondary failover system. For example, if your primary Content Delivery Network (CDN) fails, can your traffic automatically reroute to a backup provider? If your primary data centre goes dark, is there a “hot standby” ready to take the load? Implementing these redundancies adds cost and complexity, but it is the only way to technically mitigate a total vendor outage.
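The decision logic behind automatic failover can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a production design: the provider hostnames and the `/health` endpoint are hypothetical, and in practice this kind of rerouting is usually handled by health-checked DNS or a load balancer rather than by application code.

```python
import http.client
import urllib.request
from typing import Callable, Sequence


def http_health_check(host: str, timeout: float = 2.0) -> bool:
    """Return True if https://<host>/health answers with a 2xx status.

    The /health path is a HYPOTHETICAL convention; use whatever endpoint
    your providers actually expose.
    """
    try:
        with urllib.request.urlopen(f"https://{host}/health", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers DNS failures, refused connections, timeouts and
        # non-2xx replies (urllib raises HTTPError, an OSError subclass);
        # HTTPException covers malformed HTTP responses.
        return False


def choose_provider(providers: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first provider, in priority order, that passes its health check."""
    for provider in providers:
        if is_healthy(provider):
            return provider
    # Plan A and Plan B are both down: time for the business continuity plan.
    raise RuntimeError("no healthy provider available; escalate to Plan C")


if __name__ == "__main__":
    # Hypothetical hostnames: Plan A first, Plan B second.
    providers = ["primary-cdn.example.com", "backup-cdn.example.com"]
    # Simulate Plan A being down so the example runs without network access;
    # in a real deployment you would pass http_health_check instead.
    status = {"primary-cdn.example.com": False, "backup-cdn.example.com": True}
    print(choose_provider(providers, lambda host: status.get(host, False)))
```

Trying providers strictly in priority order keeps Plan A as the default whenever it is healthy. A real system would also add hysteresis, for example requiring several consecutive failed checks before switching, so that one slow response does not make traffic flap between providers.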
Plan C: Business Continuity
Sometimes, technology fails completely. A widespread internet outage or a catastrophic failure might render both Plan A and Plan B useless. This is where your Business Continuity Plan (BCP) is vital. Plan C focuses on people and processes.
- How do you communicate with customers if your email server is hosted by the vendor that is down?
- Can your staff revert to manual processing for critical transactions?
- Do you have an off-network method of contacting your crisis management team?
Moving Forward
The goal of cybersecurity and IT resilience is not to achieve the impossible standard of perfection, but to prepare for the inevitable imperfection of technology.
The Cloudflare incident is not a reason to lose faith in cloud providers, but rather a prompt to re-evaluate your dependency on them. By assuming that every component of your infrastructure will eventually fail, you can design systems that are resilient enough to weather the storm.
If you are concerned about your organisation’s reliance on single vendors or would like assistance in developing a robust Business Continuity Plan, contact the team at Vertex. We can help you assess your resilience and ensure you have a strategy in place for when the lights go out.