Cloud Out Loud Podcast

Episode 17 - Disaster

August 19, 2022 Jon and Logan Gallagher

Disaster 


Episode 17: Show Notes


We here in the Pacific Northwest have been experiencing a slow-moving disaster of enormous proportions: the heat wave that much of Europe has also recently been confronting. The cloud has seen a significant disaster of its own. On 19 July 2022, in a well-documented outage, Google's London data center experienced the simultaneous failure of multiple cooling systems. This affected multiple Google Cloud services, with significant consequences for users worldwide. Multiple natural disasters have affected cloud computing in the past, and such events are only likely to become more frequent. From global warming to inflation, a range of global factors will shape how we use the cloud.

In today’s episode, we discuss how to prepare for these eventualities and put proper defenses in place: how to protect the systems we already have and the ones we're building, and how to use cloud technology optimally. We also explore the concept of chaos engineering and how Netflix has implemented it to create resilient applications. There’s a lot to unpack here, so make sure you tune in for all the relevant details on how to prepare for the future without feeling overwhelmed!


Key Points From This Episode:


  • Introducing today’s topic: Disaster.
  • An overview of the heat waves the northern hemisphere has been experiencing.
  • How this heat wave contributed to an outage at Google's London data center.
  • The failure of several cooling systems and the impact this had on multiple Google Cloud services.
  • How past natural disasters have affected cloud services and what we predict for the future.
  • Why the London outage was unexpected and why the next event will likely also be unanticipated.
  • How mobility can help you prepare for disasters in the US.
  • How to implement load balancing between regions.
  • The role of planning and building to prepare for potential natural disasters.
  • Why emulation is crucial to be fully prepared.
  • A breakdown of how companies can practice their disaster recovery policy.
  • The concept of chaos engineering and how it helps build resilience.
  • How Netflix has implemented chaos engineering to make their applications extra resilient.
  • Why exploring these areas of vulnerability takes tremendous commitment.
  • How to apply these lessons to your own business.
  • An overview of the tools that companies can leverage to ensure resilience.


Links Mentioned in Today’s Episode:


Netflix

Chaos Monkey on GitHub

Janitor Monkey on GitHub

Conformity Monkey on GitHub

Spinnaker

Jon Gallagher on LinkedIn

Logan Gallagher on LinkedIn
