The people at Netflix:
Several years ago we introduced a tool called Chaos Monkey. This service pseudo-randomly plucks a server from our production deployment on AWS and kills it.
Building on the success of Chaos Monkey, we looked at an extreme case of infrastructure failure. We built Chaos Kong, which doesn’t just kill a server. It kills an entire AWS Region.
As you can see on the chart above, us-east-1 takes over the load when us-west-2 goes down. The aggregate metric at the top stays smooth, indicating that the failover works.
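
To make the two failure modes concrete, here is a toy sketch in Python. It is not Netflix's code and it does not call AWS: the fleet is a plain in-memory dictionary, and every name in it (`make_fleet`, `rebalance`, `kill_random_instance`, `kill_region`, the instance IDs and the load figures) is made up for illustration. It also assumes the survivors always have enough spare capacity, which is the very thing a real exercise is meant to verify.

```python
import random

# Hypothetical in-memory fleet: region -> {instance id: requests per second}.
# The numbers are invented; nothing here talks to AWS.
def make_fleet():
    return {
        "us-east-1": {"i-east-a": 100, "i-east-b": 100},
        "us-west-2": {"i-west-a": 100, "i-west-b": 100},
    }

def total_load(fleet):
    """Aggregate requests per second across every live instance."""
    return sum(sum(instances.values()) for instances in fleet.values())

def rebalance(fleet, orphaned_load):
    """Spread orphaned load evenly over the surviving instances."""
    survivors = [(r, i) for r, instances in fleet.items() for i in instances]
    if not survivors:
        raise RuntimeError("no capacity left to absorb the load")
    share = orphaned_load / len(survivors)
    for region, instance in survivors:
        fleet[region][instance] += share

def kill_random_instance(fleet, rng):
    """Chaos Monkey style: remove one pseudo-randomly chosen instance."""
    candidates = [(r, i) for r, instances in fleet.items() for i in instances]
    region, instance = rng.choice(candidates)
    orphaned = fleet[region].pop(instance)
    rebalance(fleet, orphaned)
    return region, instance

def kill_region(fleet, region):
    """Chaos Kong style: remove every instance in one region at once."""
    orphaned = sum(fleet[region].values())
    fleet[region] = {}
    rebalance(fleet, orphaned)
```

Calling `kill_region(fleet, "us-west-2")` on a fresh fleet empties that region and doubles the load on each us-east-1 instance, while `total_load(fleet)` stays where it was. That is the same shape as the chart: one regional line drops to zero, the other rises, and the aggregate stays flat.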