The tide has turned on how we think about resiliency for applications that live in the cloud. For years we were told that "Be Multi-Availability Zone" was the way to build resilient cloud apps. But the outages that have hit every major Cloud Service Provider ("CSP") recently show that it isn't always a sufficient strategy if your aim is extremely high availability.
So, we need to think bigger. But this comes with increased cost and increased complexity. The fact is, there just aren't a whole lot of organizations doing multi-region deployments, let alone ones talking about it. This series hopes to help fill that gap.
We decided to write a series of blog posts on how to build resilient cloud applications that span multiple regions in AWS, specifically AWS GovCloud. Our goal here is uptime for both pushing data to and retrieving data from a cloud application. The series will focus on what we build: Web Apps and Web APIs that process data.
Most of our applications use several core AWS technologies (listed below). We have made a concerted effort to migrate to pure Platform as a Service ("PaaS") where we can. We want to avoid Infrastructure as a Service ("IaaS") entirely, as it requires additional management of resources. We can't tell you how all of this will work with Lift and Shift, as our engineering is centered on cloud-native services.
For us, the goal and the reason for being in the cloud is to let someone else do the hard work. For our cloud-based solutions, we do not use Kubernetes ("k8s") at all. We find the overhead too cumbersome when we can let AWS do all the management for us. Once we move to edge computing, k8s becomes a viable solution.
At a high level, we use the following services to build and deliver applications:
- AWS Lambda and/or AWS ECS Fargate for compute
- AWS DynamoDB for data storage (Global Tables)
- AWS S3 for object storage
- AWS Kinesis + AWS S3 for long-term application logging to comply with DoD SRG and FedRAMP logging requirements
Now, plenty of applications may need more services than these. Things like Athena or QuickSight may be necessary, but (at least for the solutions we are building) we consider those to be ancillary to the core applications. For instance, if you can't get to QuickSight to visualize some data for an hour, it's not that big of a deal (at least for this solution). But if you can't log data from the field in real time, that is a big deal.
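To make the "push data no matter what" goal a bit more concrete, here is a minimal sketch of a write path that tries one region and falls back to the next. It is not our production code. The function name `put_with_failover`, the `AllRegionsFailed` exception, and the idea of passing in one writer callable per region are all assumptions made for illustration; the region names are the two AWS GovCloud (US) regions.

```python
from typing import Callable, Dict, Mapping, Sequence

# Assumed region preference order (the two AWS GovCloud (US) regions).
REGIONS: Sequence[str] = ("us-gov-west-1", "us-gov-east-1")

# A writer takes one item and either stores it or raises an exception.
Writer = Callable[[dict], None]


class AllRegionsFailed(Exception):
    """Raised when no region accepted the write (hypothetical helper)."""

    def __init__(self, errors: Dict[str, Exception]):
        super().__init__(f"write failed in all regions: {list(errors)}")
        self.errors = errors


def put_with_failover(
    item: dict,
    writers: Mapping[str, Writer],
    regions: Sequence[str] = REGIONS,
) -> str:
    """Try each region in order and return the first region that accepted the item."""
    errors: Dict[str, Exception] = {}
    for region in regions:
        try:
            writers[region](item)
            return region
        except Exception as exc:  # sketch only; real code should catch narrower errors
            errors[region] = exc
    raise AllRegionsFailed(errors)


# Usage with stand-in writers: the first region is "down", the second works.
stored = []


def west_writer(item: dict) -> None:
    raise ConnectionError("simulated regional outage")


def east_writer(item: dict) -> None:
    stored.append(item)


used_region = put_with_failover(
    {"device_id": "field-unit-1", "reading": 42},
    {"us-gov-west-1": west_writer, "us-gov-east-1": east_writer},
)
print(used_region)
```

In a real deployment each writer might wrap a boto3 call such as `boto3.resource("dynamodb", region_name=region).Table(name).put_item(Item=item)` against a DynamoDB Global Table, so that a write accepted in either region replicates to the other. That wiring is an assumption here, not something this post has covered yet.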