Moving Your Fleet from AWS to Azure with Terraform

In this series of Terraform posts we have shown how to effectively use Infrastructure as Code to build, deploy, scale, and monitor a fleet of infrastructure across both AWS and Azure. The beauty of Terraform is that we can leverage providers to execute the entire deployment in a consistent way across clouds, regardless of each cloud's particular constructs. The particulars (API interactions, exposing resources, dependency mapping, etc.) are taken care of by Terraform and the providers themselves.

To recap, this is what we have covered so far:

Now that we have our fleet in both AWS and Azure, let’s move between them.

Moving Fleet between AWS and Azure

Below are the respective fleets in both AWS and Azure.

havaawsdev.png
HavaIOAzure.png

Having a presence in both clouds, we need a way to redirect our traffic to the cloud of our choice. This is easily done via DNS. The Terraform output from our AWS and Azure deployments provides the public-facing DNS names for each of the respective environments. These are the same DNS names we used to validate our deployments in each cloud during the previous steps.

AWSDNSName.png
AzureDNSName.png
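For reference, those names surface through ordinary output blocks in each deployment's output.tf. Below is a minimal sketch, assuming resource names of my own choosing (an ELB called web on the AWS side, and a public IP with a domain_name_label on the Azure side):

    # AWS deployment
    output "elb_dns_name" {
      value = aws_elb.web.dns_name          # public DNS name of the AWS load balancer
    }

    # Azure deployment
    output "lb_dns_name" {
      value = azurerm_public_ip.lb.fqdn     # FQDN attached to the Azure load balancer's public IP
    }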

We can then log into our DNS provider/management service (mine happens to be Hover) and create three CNAME records: azure, aws, and www. The domain name I will be using to access our fleet is couchtocloud.com.

DNS.png

The aws and azure records are not strictly necessary, but I like being able to browse to each cloud directly for troubleshooting if needed.

HelloMultiCloud.png

The www record can then be pointed at whichever cloud you want traffic directed to, and modified to point to a different cloud when needed. We can now dictate which cloud receives traffic with a simple DNS update. For a simple form of load balancing between the clouds, you can create two www records, one pointing to aws and the other to azure; requests will then round robin between the two clouds.

HelloMultiCloudWWW.png
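My zone at Hover is managed by hand, but if your zone happened to live with a DNS service that has a Terraform provider, such as Route 53, the cut-over itself could be code. A purely hypothetical sketch (the zone ID and target variables are placeholders):

    resource "aws_route53_record" "www" {
      zone_id = var.zone_id                   # hosted zone for couchtocloud.com (placeholder)
      name    = "www.couchtocloud.com"
      type    = "CNAME"
      ttl     = 300
      records = [var.active_cloud_dns_name]   # the aws or azure record -- change it to switch clouds
    }

Switching clouds then becomes a one-variable change followed by a terraform apply.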

Wrapping Up

And that is a wrap for this series, where we showed how to use Terraform to build out environments in both AWS and Azure and move between the two. Terraform is extremely powerful, and I encourage you to keep learning about how it can be used to safely and predictably create, change, and improve infrastructure.

Building the Fleet in Azure with Terraform

In this series of Terraform posts we have shown how to effectively use Infrastructure as Code to build, deploy, scale, monitor, and destroy a fleet of infrastructure across multiple regions in AWS. The beauty of Terraform is that while we may have used it to build out infrastructure in AWS, we can extend its use to other cloud providers as well. As I see more and more organizations adopting a multi-cloud strategy, let's take a look at what it would take to deploy our fleet into Azure.

Azure Specifics

If you are familiar with AWS, Azure provides many similar services and features. The Azure Terraform provider is used to interact with the many Azure resources supported by Azure Resource Manager (AzureRM). Below is a brief overview of the Azure resources we will utilize to move our fleet to Azure:

Azure Authentication: Terraform supports authenticating to Azure through a Service Principal or the Azure CLI. A Service Principal is an application within Azure Active Directory whose authentication tokens can be used as the client_id, client_secret, and tenant_id fields needed by Terraform. Full details on creating a Service Principal are well documented on the Terraform website.

Resource Group: Azure holds related resources for a given solution in a logical container called a Resource Group. You cannot deploy resources into Azure without assigning them to a Resource Group, which we will create and manage via the Terraform Azure provider.

Virtual Network: Akin to an AWS VPC, Azure's Virtual Network provides an isolated, private environment in the cloud. It is here that we will define our IP address range, subnets, route tables, and network gateways. This build will utilize the Azure network module maintained in the Terraform module registry.

Scalability: To scale our fleet to the appropriate size, Azure provides Virtual Machine Scale Sets (VMSS). VMSS is similar to AWS Auto Scaling, allowing us to create and manage a group of identical, load balanced, and autoscaling VMs. The fleet will be front ended by a load balancer so that we can grow/shrink without disruption, and will utilize the VMSS module in my GitHub terraform_azure repository.

Deploy to Azure

For our initial Azure deployment we will create a new set of Terraform files, including a new main.tf to tie together the details of the Azure provider, the modules, and the specifics of how we want the fleet built. Inside the file we have declared our connection to Azure, the resource group to build, the virtual network details, as well as the web server cluster. The VMSS module referenced also builds a jump server/bastion host in the event that you need to connect to the environment to do some troubleshooting. I have specified my Azure credentials as environment variables so that they are not included in this file.
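A trimmed sketch of that shape is below. The resource and input names are illustrative rather than copied from the repository (the registry network module's inputs vary by version, and the VMSS module's real variables are defined in my terraform_azure repo), and my credentials come in through the ARM_* environment variables rather than the provider block:

    provider "azurerm" {
      # client_id, client_secret, tenant_id and subscription_id are supplied
      # via the ARM_* environment variables so they never land in this file.
      # Newer provider releases also expect an empty features {} block here.
    }

    resource "azurerm_resource_group" "fleet" {
      name     = "devtest"
      location = var.location          # e.g. "West US 2" -- placeholder
    }

    module "network" {
      source              = "Azure/network/azurerm"            # registry module noted above
      resource_group_name = azurerm_resource_group.fleet.name
      address_space       = "10.0.0.0/16"                      # input names vary by module version
      subnet_prefixes     = ["10.0.1.0/24"]
      subnet_names        = ["web"]
    }

    module "vmss" {
      source = "github.com/<my-account>/terraform_azure//vmss"  # placeholder path into the repo
      # real inputs (cluster size, image, jump server options, etc.) are defined by the module itself
    }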

All files to create this fleet in Azure, including the main.tf, output.tf, and VMSS module, are available in the terraform_azure repository on my GitHub account.

We can initialize, plan, and apply our deployment using Terraform and immediately see our Azure resources being built out through the Azure Portal, inside our devtest resource group.

azurefleet.png
azureapplycomplete.png

Once the deployment is complete, we browse to the DNS name assigned to the load balancer front ending our Azure VMSS group. This address is displayed at the end of the Terraform output, as we included an output.tf file to list relevant information.

Browsing to the DNS name, we can validate that our deployment is complete. At this point we can also check the health of the deployment - remember, there is a jump server accessible if needed.

hellofromAzure.png

Once you are happy with the state of the new fleet in Azure, it can be torn down with a terraform destroy. I recommend doing this as we prepare for the next step in the series: Moving the Fleet from AWS to Azure.

This is part of a Terraform series in which we have covered:

Remembering to Clean Up with Terraform

One of my favorite uses of Terraform is to quickly turn up an infrastructure environment with only a few lines of code. Of equal importance is the ability to tear down parts of the environment when they are no longer needed or need to be rebuilt. Terraform helps me leverage elasticity in building, destroying, and rebuilding as necessary.

Reminders

If you are like me, you tend to forget things and need reminders. I have been building out environments in an automated way for some time now, but I am not always the best at remembering to tear them down when I am done. Don't get me wrong - the act of tearing things down is easy with commands like terraform destroy - but remembering to do so is where I have a gap.

To close that gap, I wanted to create a monitoring and trigger mechanism that would remind me when my infrastructure is running idle so I can go clean it up. Since many of my deployments are in AWS, the two tools I will leverage to accomplish this are CloudWatch and SNS. For those not familiar, CloudWatch is a monitoring and management service from Amazon that surfaces operational metrics on the health of a given environment. SNS is a notification service that lets you send messages to a variety of endpoints - including SMS text messages, which are a great way to remind me to do things.

Incorporating Monitoring into My Build

Defining CloudWatch and SNS resources is relatively easy in Terraform, as both can be defined using the Terraform AWS provider. Examples of both can be found on the Terraform website, and I have folded them into a module I created on GitHub.

We will use these resources to monitor when our autoscaling group goes idle, which I define as less than 2% CPU every minute for 5 minutes. When that occurs, a text message is sent to the supplied phone number. To keep it simple, the module accepts the autoscaling group to monitor and the phone number to send messages to as variables. There is nothing preventing us from also defining the thresholds and polling intervals as variables, and in fact that is something we should probably do in the future to make the module more robust.
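Under the hood the module is just standard AWS provider resources. Here is a rough sketch of the pieces it wraps, using resource and variable names of my own choosing:

    resource "aws_sns_topic" "idle_alert" {
      name = "fleet-idle-alert"
    }

    resource "aws_sns_topic_subscription" "sms" {
      topic_arn = aws_sns_topic.idle_alert.arn
      protocol  = "sms"
      endpoint  = var.phone_number            # e.g. "+15555550123"
    }

    resource "aws_cloudwatch_metric_alarm" "idle_cpu" {
      alarm_name          = "asg-idle-cpu-5m"
      namespace           = "AWS/EC2"
      metric_name         = "CPUUtilization"
      statistic           = "Average"
      period              = 60                # check every minute...
      evaluation_periods  = 5                 # ...for 5 consecutive minutes
      comparison_operator = "LessThanThreshold"
      threshold           = 2                 # idle = under 2% CPU
      dimensions = {
        AutoScalingGroupName = var.auto_scaling_group_name
      }
      alarm_actions = [aws_sns_topic.idle_alert.arn]
    }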

Using the Cloud-Watch Module

To make use of this module, we simply edit the main.tf file we have been using in development to include the cloud-watch module, which we will call from GitHub. We pass the name of the auto scaling group created within the webserver_cluster module as an input for monitoring, and prompt for the phone number to send the alert message to.
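The call in main.tf ends up looking something like the sketch below; the GitHub path and input names are placeholders, and the real ones are whatever the module repository defines:

    module "cloud-watch" {
      source            = "github.com/<my-account>/<repo>//cloud-watch"   # placeholder path
      auto_scaling_name = module.webserver_cluster.asg_name               # hypothetical output name
      phone_number      = var.phone_number                                # prompted at plan/apply time
    }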

Now when we deploy our fleet there will be two CloudWatch alarms created against the deployed auto-scaling group: one reporting on idle time over a 5 minute window, and the other over a 5 hour window. The idea is that if I miss the first text message, I will get the second, so I can perform a terraform destroy to tear down the environment when it is not being utilized.

Now that I have included the cloud-watch module in my development main.tf file, let's initialize (terraform init), plan (terraform plan), and deploy (terraform apply).

Notification and Clean Up

I can see that it successfully created my alarms in CloudWatch and tied them to the auto-scaling group created when deploying the fleet.

Output from running a terraform apply, listing the DNS name and autoscaling group of the server fleet.


CloudWatch Alarm - Two were created, one for 5 minute intervals and the other for 5 hour intervals.


Now when the environment goes idle, an alarm will trigger and send me a text message. Should I not take care of it at that time, another text message will be sent in 5 hours should the environment remain idle.

Text Message from AWS SNS notifying me that my auto-scaling group has had idle CPU for the last 5 minutes.


Since Terraform makes cleanup easy (terraform destroy), I will be sure to perform that step so I do not incur costs for unused assets and environments. A terraform destroy will clean up not only the environment it deployed, but also the alarms and SNS notifications it created during the buildout.

This is part of a Terraform series in which we have covered:

Using Terraform to Up Your Automation Game: Multi-Environment/Multi-Region

In the last two posts we have been working with Terraform to automate the buildout of our Virtual Private Cloud (VPC) in AWS and deploy a fleet of scalable web infrastructure. Our next step is to complete the infrastructure build so that it is identical across our development/test, staging, and production environments. Each of these environments resides in a different region: development (us-west-2), staging (us-east-1), and production (us-east-2).

File Layout

In the case of deploying our development environment, our work is really close to being complete. We already have a working set of code that deploys into the us-west-2 region, contained in the main.tf and outputs.tf files. We will add one step: creating a folder structure that allows our code to be easily managed and referenced. At the top level we create a separate folder for each of our environments, and within each of these folders a sub-folder specifying the type of fleet we are deploying. In the case of our development web server deployment, the folder structure is simple and looks like this:

.
├── dev
│   └── webservers
│       ├── main.tf
│       └── outputs.tf

There are a number of benefits to laying out our files in this manner - including isolation, re-use, and managing state - which Yevgeniy Brikman does an excellent job of describing. As Brikman writes: "This file layout makes it easy to browse the code and understand exactly what components are deployed in each environment. It also provides a good amount of isolation between environments and between components within an environment, ensuring that if something goes wrong, the damage is contained as much as possible to just one small part of your entire infrastructure."

Deploy Development

Now that our file layout is the way we want it, let's deploy development. Very simple, and something we have done a few times now. Since we moved our two files into a new folder location, we will need to initialize the deployment (terraform init), plan it (terraform plan), and finally deploy (terraform apply).

terraforminit_dev.png
terraform_plan_webservers.png
terraform_apply.png

Once complete we can browse over to see the development deployment.

webserver_dev.png

Deploy Staging

One of the most powerful benefits of deploying our infrastructure as code in a modular way is reusability. In fact, building out our staging environment is only a matter of a couple of steps. First we create a staging folder in which to store our files, and then we copy over our main.tf and outputs.tf files. We then make a few edits to main.tf, including the following updates: region, IP address space, tags, ami, cluster name, cluster size, and key_names. Looking at the differences between development and staging is as simple as running a compare between the main.tf files in the dev and staging folders. The differences are highlighted below:

dev_staging_compare.png
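As an illustration only (the module's real input names are whatever the webserver_cluster module defines), the staging copy ends up differing in lines like these:

    provider "aws" {
      region = "us-east-1"                   # staging; dev deploys to us-west-2
    }

    module "webserver_cluster" {
      # hypothetical inputs: a region-specific AMI, staging-sized cluster, and staging key pair
      cluster_name = "webservers-stage"
      ami          = "ami-0staging0example"  # placeholder AMI ID
      min_size     = 2
      max_size     = 4
      key_name     = "stage-key"
    }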

Once we are happy with the updates, the sequence to deploy is exactly what we are used to; this time we run our initialize, plan, and deployment from within the staging folder. Once complete, we can browse over to see the staging deployment.

webserver_stage.png

Production Deployment

Our production environment will mimic our staging environment with only a few edits, including deployment to the us-east-2 region; it will start with 8 web servers in the fleet and scale as needed. Once again leveraging infrastructure as code, we simply copy the main.tf and outputs.tf files out of staging and make our edits. The differences between staging and production are highlighted below:

staging_prod_compare.png
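Again with hypothetical input names, the production-specific values come down to a handful of lines:

    provider "aws" {
      region = "us-east-2"                  # production region
    }

    module "webserver_cluster" {
      cluster_name = "webservers-prod"
      min_size     = 8                      # start with 8 web servers and scale as needed
      max_size     = 16                     # placeholder ceiling for autoscaling
    }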

Now we use the power of Terraform to deploy, and voilà...production is LIVE.

webserver_prod.png

Visualizing Our Deployment

Now that our 3 deployments are complete, we can see the folder structure that has been built out, maintaining a separate state for each environment.

directorytree.png
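In plain text, the layout now looks roughly like this, with each environment folder holding its own state after an apply (the staging and production folder names are whatever you chose when copying the files):

    .
    ├── dev
    │   └── webservers
    │       ├── main.tf
    │       ├── outputs.tf
    │       └── terraform.tfstate
    ├── staging
    │   └── webservers
    │       ├── main.tf
    │       ├── outputs.tf
    │       └── terraform.tfstate
    └── prod
        └── webservers
            ├── main.tf
            ├── outputs.tf
            └── terraform.tfstate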

To document and visualize our buildout, I like to use hava.io, which builds architecture diagrams for AWS and Azure environments. As you can see, all three environments are active, and we can drill into any of them to see the details, including pricing estimates: production ($115/month), staging ($91/month), and dev ($72/month).

my3enviornments.png
prod_diagram.png

Mission Complete

Our mission was to create and deploy a set of auto-scaling web servers, front ended by a load balancer, for our development, staging, and production environments across 3 different AWS regions. Through the power of infrastructure as code, we utilized Terraform to define, plan, and automate a consistent set of deployments. Mission complete.

Terraform Series

This is part of a Terraform series in which we cover: