Chaos Kong

[Chart: kong_in_progress_small]

The people at Netflix:

Several years ago we introduced a tool called Chaos Monkey. This service pseudo-randomly plucks a server from our production deployment on AWS and kills it.

Building on the success of Chaos Monkey, we looked at an extreme case of infrastructure failure. We built Chaos Kong, which doesn’t just kill a server. It kills an entire AWS Region.

As you can see on the chart above, us-east-1 takes over the load when us-west-2 goes down. The aggregate metric at the top stays smooth, indicating that the failover works.
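To make the idea concrete, here is a toy sketch of what the chart shows: one region disappears, the survivors absorb its traffic, and the aggregate stays flat. This is not Netflix's tooling. The function names (`pick_victim`, `kill_region`), the even redistribution of load, and the request numbers are all assumptions made up for illustration.

```python
import random


def pick_victim(load_by_region, rng):
    """Pseudo-randomly choose one region to take down (hypothetical helper)."""
    return rng.choice(sorted(load_by_region))


def kill_region(load_by_region, victim):
    """Return a new load map with the victim's traffic spread evenly over survivors.

    Even redistribution is an assumption; a real failover would route by
    capacity, latency, and so on.
    """
    survivors = [region for region in load_by_region if region != victim]
    if not survivors:
        raise ValueError("no surviving region to fail over to")
    share = load_by_region[victim] / len(survivors)
    return {region: load_by_region[region] + share for region in survivors}


# Made-up requests per second for each region.
load = {"us-east-1": 600.0, "us-west-2": 400.0}

# Take down us-west-2, as in the chart; us-east-1 picks up its traffic.
after = kill_region(load, "us-west-2")

# The aggregate is unchanged, which is what a smooth top-line metric means.
assert sum(after.values()) == sum(load.values())
```

To pick the region pseudo-randomly instead, call `pick_victim(load, random.Random(42))` and pass the result to `kill_region`.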

Chaos Engineering Upgraded →

(via @mattiasgeniar)

The Instagram Server Stack

This is how our system has evolved in the just-over-1-year that we’ve been live, and while there are parts we’re always re-working, this is a glimpse of how a startup with a small engineering team can scale to our 14 million+ users in a little over a year.

What Powers Instagram: Hundreds of Instances, Dozens of Technologies →