Automated Multi-Region deployments in AWS

The tide has changed on the resiliency of applications that reside in the cloud. For years we were told that "Be Multi-Availability Zone" was the way to build resilient cloud apps. But the outages that have hit every major Cloud Service Provider ("CSP") recently show that it isn't always enough if your aim is extremely high availability.

So, we need to think bigger. But this comes at increased cost and increased complexity. The fact is, there just aren't a whole lot of organizations doing multi-region deployments, let alone ones talking about it. This series hopes to help fill that gap.

We decided to author a series of blog posts on how to build resilient cloud applications that span multiple regions in AWS, specifically AWS GovCloud. Our goal is uptime: the ability to both push data to and retrieve data from a cloud application at all times. This series will focus on the things we are building ourselves, Web Apps and Web APIs that process data.

Most of our applications use several AWS core technologies (listed below). We have made a concerted effort to migrate to pure Platform as a Service ("PaaS") wherever we can, and we want to avoid Infrastructure as a Service ("IaaS") entirely, as it requires additional management of resources. We can't tell you how all of this will work with Lift and Shift, as our engineering is centered on cloud native services.

For us, the goal, and the reason for the cloud, is to let someone else do the hard work. For our cloud based solutions we do not use Kubernetes ("k8s") at all. We find the overhead too cumbersome when we can let AWS do all the management for us. When we cut over to edge computing, k8s becomes a viable solution.

At a high level, we use the following services to build and deliver applications:

  • AWS Lambda and/or AWS ECS Fargate for compute
  • AWS DynamoDB for data storage (Global Tables)
  • AWS S3 for object storage
  • AWS Kinesis + AWS S3 for long term application logging to comply with DoD SRG and FedRAMP logging requirements
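
As a concrete example of the data layer above, a DynamoDB Global Table can be declared once and replicated across regions. This is a minimal sketch, not our production template; the table and attribute names are hypothetical, and the GovCloud region pair shown is an assumption:

```yaml
# Hypothetical sketch: a DynamoDB Global Table replicated across
# two GovCloud regions. Table and attribute names are illustrative.
Resources:
  WeatherDataTable:
    Type: AWS::DynamoDB::GlobalTable
    Properties:
      TableName: weather-data
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: pk
          AttributeType: S
      KeySchema:
        - AttributeName: pk
          KeyType: HASH
      # Global Tables require streams with new-and-old images
      StreamSpecification:
        StreamViewType: NEW_AND_OLD_IMAGES
      Replicas:
        - Region: us-gov-west-1
        - Region: us-gov-east-1
```

Writes to either replica converge via DynamoDB's replication; Part 3 of this series digs into the details.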

Now, there are a lot of applications that may need more services. Things like Athena or QuickSight may be necessary, but we consider those (at least for the solutions we are building) to be ancillary to the core applications. For instance, if you can't get to QuickSight to visualize some data for an hour, it's not that big of a deal (at least for this solution). But if you can't log data from the field in real time, that is a big deal.

Building Blocks

All of the topics we touch on will be laid out in further blog posts (and we'll update the links here as we publish them). But, at the core of everything we will be doing is leveraging AWS CodePipeline to build, test, and deploy solutions.

We use CodePipeline for our IaC deployments as well as our code deployments. Our intent is to remove humans from the equation (which, again, is the whole point of CI/CD, removing or mitigating human error).
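To make that concrete, here is a hedged sketch of what such a pipeline can look like in CloudFormation. The role ARN, bucket, and repository names are placeholders, and a real pipeline would add build and test stages before deployment:

```yaml
# Hypothetical sketch of a pipeline that updates a StackSet.
# Role, bucket, and repository names are placeholders.
Resources:
  DeployPipeline:
    Type: AWS::CodePipeline::Pipeline
    Properties:
      RoleArn: arn:aws:iam::111111111111:role/pipeline-role   # placeholder
      ArtifactStore:
        Type: S3
        Location: example-artifact-bucket
      Stages:
        - Name: Source
          Actions:
            - Name: PullTemplates
              ActionTypeId:
                Category: Source
                Owner: AWS
                Provider: CodeCommit
                Version: "1"
              Configuration:
                RepositoryName: weather-app
                BranchName: main
              OutputArtifacts:
                - Name: SourceOutput
        - Name: Approve
          Actions:
            # The one human touchpoint: approving the release
            - Name: ManualApproval
              ActionTypeId:
                Category: Approval
                Owner: AWS
                Provider: Manual
                Version: "1"
        - Name: Deploy
          Actions:
            - Name: UpdateStackSet
              ActionTypeId:
                Category: Deploy
                Owner: AWS
                Provider: CloudFormationStackSet
                Version: "1"
              Configuration:
                StackSetName: weather-app
                TemplatePath: SourceOutput::template.yaml
              InputArtifacts:
                - Name: SourceOutput
```

The manual approval stage is the "mitigating human error" part: a human signs off on the release, but no human runs the deployment.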

AWS Organizations

To accomplish this we follow AWS Best Practices to build and deploy AWS environments. This requires using AWS Organizations.

We have a series of core templates we will lay down in a new AWS Organization. First, we must establish the following OU Hierarchy:

  • Root OU <-- Single Root Account
  • DSOP OU <-- Delegated CloudFormation Admin (Single Account)
  • Logging OU (Single Account)
  • Security OU <-- Delegated CloudFormation Admin (Single Account)
  • Application OU
    • Production OU
      * App 1 OU
      * App 2 OU
    • Test OU

How we have architected this is rather straightforward. We want to be able to build any number of applications. We also want to be able to deploy segregated tenants for those applications that need them.

To solve this, we create the root Application OU that serves as the parent for our cloud based apps. From there, we bifurcate into a Production OU and a Test OU. Finally, within our Production OU we create OUs for each application we are building.

With this strategy, we can apply StackSets to each relevant OU. Let's say we are building a Web API that returns weather data. Under the Production OU we'd create the Weather App OU and apply a StackSet to that OU. When we create a new AWS account and add it to the Weather App OU, the application automatically deploys and is configured. Now, assume we have a customer that wants a dedicated tenant of the Weather App: we simply create a new account for that customer and add it to the Weather App OU. Again, it automatically deploys the entire application, top to bottom. It really is that simple.
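The deploy-on-account-creation behavior comes from a service-managed StackSet with auto-deployment enabled. A minimal sketch, assuming a hypothetical OU id, template URL, and GovCloud region pair:

```yaml
# Hypothetical sketch: StackSet auto-deploys to every account in an OU.
# The OU id and TemplateURL are placeholders.
Resources:
  WeatherAppStackSet:
    Type: AWS::CloudFormation::StackSet
    Properties:
      StackSetName: weather-app
      # Service-managed permissions let Organizations handle the
      # cross-account roles for us
      PermissionModel: SERVICE_MANAGED
      # New accounts added to the target OU get the stack automatically
      AutoDeployment:
        Enabled: true
        RetainStacksOnAccountRemoval: false
      Capabilities:
        - CAPABILITY_NAMED_IAM
      StackInstancesGroup:
        - DeploymentTargets:
            OrganizationalUnitIds:
              - ou-xxxx-weatherapp   # placeholder OU id
          Regions:
            - us-gov-west-1
            - us-gov-east-1
      TemplateURL: https://example-bucket.s3.amazonaws.com/weather-app.yaml
```

Listing multiple regions under each deployment target is also what makes the cross-region story later in this post work without extra tooling.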

As we iterate and update code using CodePipeline for the Weather App, the deployments update those StackSets for IaC and compute, automatically rolling the changes out to the Weather App OU. Thus, we touch nothing other than approving the deployments.

Note: You will still want to monitor your StackSets to ensure that the deployment was successful.

Best of all, after we've done the work to make the app Cross Region, it deploys to all applicable regions without human intervention.

Moral of the story: once you do the hard work, global and redundant deployments aren't impossible. They are hard to get working the first go, but once you have the blocks in place, as Carl Weathers said in Arrested Development, "Baby you've got a stew going."

Next Up

Next up in this series will be:

  • Part 1: Intro
  • Part 2: Gotchas
  • Part 3: DynamoDB Global Tables
  • Part 4: AWS Lambda (Pending)
  • Part 5: S3 Replication (Pending)
  • Part 6: AWS Fargate (Pending)
  • Part 7: AWS CodePipeline (Pending)