Remembering to Clean Up with Terraform

One of my favorite uses of Terraform is quickly standing up an infrastructure environment with only a few lines of code. Equally important is the ability to tear down parts of that environment when they are no longer needed or need to be rebuilt. Terraform lets me take advantage of the cloud's elasticity by building, destroying, and rebuilding as necessary.

Reminders

If you are like me, you tend to forget things and need reminders. I have been building environments in an automated way for some time now, but I am not always the best at remembering to tear them down when I am done. Don’t get me wrong: tearing things down is easy with commands like terraform destroy, but remembering to do it is where I have a gap.

To close that gap, I wanted to create a monitoring and trigger mechanism that reminds me to go clean up my infrastructure when it is running idle. Since many of my deployments are in AWS, the two tools I will use to accomplish this are CloudWatch and SNS. For those not familiar, CloudWatch is Amazon's monitoring and management service, which provides operational metrics on the health of a given environment. SNS (Simple Notification Service) sends messages to a variety of endpoints, including SMS text messages, which are a great way to remind me to do things.

Incorporating Monitoring into My Build

Defining CloudWatch alarms and SNS notifications is relatively easy in Terraform, as both resources can be defined using the Terraform AWS provider. Examples for both can be found on the Terraform website, and I have folded them into a module I created on GitHub.
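
For readers who want to see what that looks like, here is a minimal sketch of the SNS half, written in Terraform 0.12+ syntax. It is not the exact code from my GitHub module; the names phone_number, idle_alerts, sms_reminder, and asg-idle-alerts are illustrative assumptions. It creates a topic and subscribes a phone number to it over SMS:

```hcl
# Hypothetical input: the phone number that receives SMS reminders.
# With no default, Terraform asks for a value at plan/apply time.
variable "phone_number" {
  description = "Phone number in E.164 format (e.g. +15555550100) to text when the fleet is idle"
  type        = string
}

# SNS topic that the CloudWatch alarms publish to (illustrative name).
resource "aws_sns_topic" "idle_alerts" {
  name = "asg-idle-alerts"
}

# Subscribe the phone number to the topic using the SMS protocol.
resource "aws_sns_topic_subscription" "sms_reminder" {
  topic_arn = aws_sns_topic.idle_alerts.arn
  protocol  = "sms"
  endpoint  = var.phone_number
}
```

Depending on your account and region, AWS may require the destination number to be verified (the SNS SMS sandbox) before messages are delivered, so it is worth sending a test message before relying on the reminder.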

We will use these resources to detect when our auto-scaling group goes idle, which I define as CPU utilization below 2% in every one-minute period for 5 minutes. When that occurs, the alarm sends a text message to the supplied phone number. To keep it simple, the module accepts both the auto-scaling group to monitor and the phone number to send messages to as variables. Nothing prevents us from also defining the thresholds and polling intervals as variables, and in fact that is something we should probably do in the future to make the module more robust.
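
A sketch of the 5-minute idle alarm follows, again in Terraform 0.12+ syntax with illustrative names. It assumes the module defines its SNS topic as aws_sns_topic.idle_alerts, and it exposes the threshold and polling interval as variables with defaults matching the idle definition above. That is the future improvement just described, not necessarily what the module does today:

```hcl
# Hypothetical input: name of the auto-scaling group to watch.
variable "autoscaling_group_name" {
  description = "Name of the auto-scaling group to monitor for idle CPU"
  type        = string
}

# Thresholds as variables; defaults encode "below 2% CPU in each
# one-minute period for five consecutive periods".
variable "idle_cpu_threshold" {
  type    = number
  default = 2
}

variable "idle_period_seconds" {
  type    = number
  default = 60
}

variable "idle_evaluation_periods" {
  type    = number
  default = 5
}

resource "aws_cloudwatch_metric_alarm" "idle_5_minutes" {
  alarm_name          = "${var.autoscaling_group_name}-idle-5m"
  alarm_description   = "Average CPU below the idle threshold; the fleet may be idle"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "LessThanThreshold"
  threshold           = var.idle_cpu_threshold
  period              = var.idle_period_seconds
  evaluation_periods  = var.idle_evaluation_periods

  # Aggregate CPU across every instance in the auto-scaling group.
  dimensions = {
    AutoScalingGroupName = var.autoscaling_group_name
  }

  # Assumes an SNS topic named idle_alerts exists in this module.
  alarm_actions = [aws_sns_topic.idle_alerts.arn]
}
```

One caveat: under basic monitoring, EC2 publishes CPUUtilization only every five minutes, so a 60-second period likely needs detailed monitoring enabled on the instances; otherwise the one-minute data will have gaps.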

Using the Cloud-Watch Module

To make use of this module, we simply need to edit the main.tf file we have been using in development to include the cloud-watch module, which we will pull from GitHub. We will pass the name of the auto-scaling group created within the webserver_cluster module as the input to monitor, and prompt for the phone number to send the alert message to.
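
In main.tf, the module call might look roughly like the sketch below. The source path is a placeholder for the real GitHub location, and the input names (autoscaling_group_name, phone_number) and the asg_name output on webserver_cluster are assumptions. Because alert_phone_number has no default, terraform plan and terraform apply will prompt for it:

```hcl
# No default, so Terraform prompts for the number interactively.
variable "alert_phone_number" {
  description = "Phone number to text when the fleet goes idle"
  type        = string
}

module "cloud_watch" {
  # Placeholder: substitute the actual GitHub path of the cloud-watch module.
  source = "github.com/EXAMPLE-USER/EXAMPLE-REPO//cloud-watch"

  # Assumes webserver_cluster exposes the group's name as an output named asg_name.
  autoscaling_group_name = module.webserver_cluster.asg_name
  phone_number           = var.alert_phone_number
}
```

For this to work, the webserver_cluster module has to export the group's name, for example with an output whose value is the name attribute of its aws_autoscaling_group resource.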

Now when we deploy our fleet, two CloudWatch alarms will be created against the deployed auto-scaling group: one that reports on idle time in a 5-minute window, and another that reports on idle time in a 5-hour window. The idea is that if I miss the first text message, the second one will remind me to run terraform destroy and tear down the environment when it is not being used.
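
The longer-window alarm could be declared the same way with a larger period. In this sketch (illustrative names; it assumes the module's SNS topic is aws_sns_topic.idle_alerts and its input is var.autoscaling_group_name), five one-hour periods make up the 5-hour window:

```hcl
# Longer-window alarm: average CPU below 2% in each of five
# consecutive one-hour periods, i.e. idle for 5 hours.
resource "aws_cloudwatch_metric_alarm" "idle_5_hours" {
  alarm_name          = "${var.autoscaling_group_name}-idle-5h"
  alarm_description   = "Average CPU below 2% for 5 hours"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "LessThanThreshold"
  threshold           = 2
  period              = 3600 # one hour, in seconds
  evaluation_periods  = 5    # five one-hour periods = five hours

  dimensions = {
    AutoScalingGroupName = var.autoscaling_group_name
  }

  # Same SNS topic as the 5-minute alarm (assumed name).
  alarm_actions = [aws_sns_topic.idle_alerts.arn]
}
```

A second alarm helps because CloudWatch runs alarm actions when an alarm changes state, not repeatedly while it stays in ALARM. The 5-minute alarm therefore texts once and then stays quiet until the fleet gets busy and goes idle again.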

Now that I have added the cloud-watch module to my development main.tf file, let’s initialize (terraform init), plan (terraform plan), and deploy (terraform apply).

Notification and Clean Up

I can see that Terraform successfully created my alarms in CloudWatch and tied them to the auto-scaling group it created when deploying the fleet.
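
The DNS name and auto-scaling group name that terraform apply prints come from output blocks in the root main.tf. A minimal sketch, assuming the webserver_cluster module exposes outputs named clb_dns_name and asg_name (both hypothetical names):

```hcl
# Root-module outputs, printed at the end of terraform apply.
output "clb_dns_name" {
  description = "DNS name of the load balancer in front of the fleet"
  value       = module.webserver_cluster.clb_dns_name
}

output "asg_name" {
  description = "Name of the auto-scaling group the idle alarms watch"
  value       = module.webserver_cluster.asg_name
}
```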

Output from running terraform apply, listing the DNS name and auto-scaling group of the server fleet.

CloudWatch alarms: two were created, one for 5-minute intervals and the other for 5-hour intervals.

Now when the environment goes idle, an alarm will trigger and send me a text message. If I do not take care of it then, I will receive another text message after 5 hours if the environment remains idle.

Text Message from AWS SNS notifying me that my auto-scaling group has had idle CPU for the last 5 minutes.

Since Terraform makes cleanup easy (terraform destroy), I make sure to perform that step so I do not incur costs for unused assets and environments. terraform destroy removes not only the environment it deployed but also the CloudWatch alarms and SNS notifications it created during the build-out.

This is part of a Terraform series in which we have covered: