Taming unruly Terraform – Alteryx

Terraform has gained widespread popularity since its first release in 2014, and for good reason. It is a fantastic open-source tool that allows you to manage and automate infrastructure changes as code across all popular cloud providers.

At Alteryx Auto Insights, we use Terraform to manage our cloud environments. Since last year our product has been expanding from Australia and Singapore to the United States and the United Kingdom. This growth raised the challenge of managing environments across regions.

If you are already using Terraform, especially if you are managing many slightly different environments or regions, you may already know the topic of this post. If you don’t, this tool may be the answer to your recent Terraform woes.

A short story

Once upon a time, in a cloud region not too far away, a simple VM hosting a web app is deployed from a single Terraform module.

After the web app gains popularity, two more VMs, a managed database server and an application gateway are added.

Realising the single Terraform module has become too large, the developer splits it into smaller modules, each with distinct input variables and backend configuration. Each module might depend on other modules, and the outputs of one module might be inputs to another.

Soon, the developer wants to deploy a staging environment for the infrastructure, where they can stage changes before reaching production.

Now, the developer has multiple Terraform modules, each with a separate backend configuration and a separate set of input variables for each environment, which can be a nightmare to orchestrate.

We were in a similar situation.

A possible solution

The developer could create a parent Terraform module to orchestrate all the services as submodules.
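
As a rough sketch of that idea, a parent module might wire the services together as shown here. The module paths and the variable and output names (vm_size, vm_id) are assumptions made for illustration, not taken from our actual configuration.

```hcl
# main.tf of a hypothetical parent module; paths and names are illustrative.

variable "vm_size" {
  type        = string
  description = "Size of the web app VM (hypothetical input)."
}

module "vm" {
  source  = "./modules/vm"
  vm_size = var.vm_size
}

module "database_server" {
  source = "./modules/database-server"
}

module "application_gateway" {
  source = "./modules/application-gateway"

  # Assumes the vm module declares an output named vm_id.
  vm_id = module.vm.vm_id
}
```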

Not all services are needed in staging, so the developer adds a count ternary to each optional service in the parent module, switching it on or off per environment.
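
The pattern in question is a count ternary on a module block. A minimal sketch, assuming a hypothetical environment variable and that only production needs the application gateway (count on a module block needs Terraform 0.13 or later):

```hcl
variable "environment" {
  type        = string
  description = "Target environment, for example prd or stg (hypothetical)."
}

module "application_gateway" {
  source = "./modules/application-gateway"

  # Deploy one instance in prd and none anywhere else.
  count = var.environment == "prd" ? 1 : 0

  # The module's own inputs are omitted here for brevity.
}
```

Anything that reads this module's outputs then has to index the instance, for example module.application_gateway[0], which is part of the complexity discussed next.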

This solution could work. The services have one backend configuration and one set of input variables per environment. Orchestration seems pretty straightforward, except there are a few downsides.

The single backend state makes it hard to split the configuration in the future, and loss or corruption of the state will break all services.
Chaining modules often leads to “var.foobar is a string, known only after apply” errors.
Many count ternaries for each environment add complexity.
Changes to one service mean applying all services – it’s all or nothing.
Lastly, applying different input variables per environment also adds complexity.

A better way

Another solution could be to use Terragrunt.

Terragrunt is an open-source tool that acts as a thin wrapper around Terraform. Terragrunt aims to simplify the configuration and orchestration of Terraform modules and to keep the configuration DRY (Don’t Repeat Yourself).

So how does Terragrunt solve the issues above?

A new layer of configuration

Terragrunt introduces a new layer to your configuration. You keep your existing modules as they are and create an environments directory containing a directory for each environment, with a subdirectory for each service module.

Within each environment directory, you choose which modules to deploy by creating a folder for each one. Each of these folders contains a terragrunt.hcl file used to configure the module. These .hcl files are in HCL 2 format, the same format Terraform uses.

What you will end up with will look a little like this. If this looks like a lot of extra configuration, bear with me.

├── environments
│   ├── prd
│   │   ├── vm
│   │   │   └── terragrunt.hcl
│   │   ├── database-server
│   │   │   └── terragrunt.hcl
│   │   ├── application-gateway
│   │   │   └── terragrunt.hcl
│   │   └── terragrunt.hcl
│   └── stg
│       ├── vm
│       │   └── terragrunt.hcl
│       ├── database-server
│       │   └── terragrunt.hcl
│       ├── application-gateway
│       │   └── terragrunt.hcl
│       └── terragrunt.hcl
└── modules
    ├── vm
    │   └── main.tf
    ├── database-server
    │   └── main.tf
    └── application-gateway
        └── main.tf

A shared backend config

The first helpful thing that Terragrunt provides is a way to template a backend configuration for each module in a single location. This is done using Terragrunt's remote_state block, which can be added to the environments/prd/terragrunt.hcl file and will be applied automatically to all modules underneath prd.

You only have to specify the backend config once, and each module will still have its own state file.

Terragrunt allows you to use the built-in Terraform and Terragrunt functions for this templating. The examples below use the following functions:

get_terragrunt_dir() – Get the full path to the module's directory. For example, /path/to/repo/environments/prd/vm.
get_parent_terragrunt_dir() – Get the full path to the directory containing the top-level terragrunt.hcl file. For example, /path/to/repo/environments/prd.
basename(...) – Get the last portion of a directory path. For example, vm.

The examples below show how you can do this with Azure Storage, AWS S3 and Google Cloud Storage. Setting if_exists = "overwrite_terragrunt" tells Terragrunt that it can overwrite a backend.tf it previously generated for the module. Terragrunt will raise an error if a backend.tf exists that it did not generate.

AzureRM

remote_state {
  backend = "azurerm"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    storage_account_name = "tfstate"
    resource_group_name  = "tfstate"
    container_name       = basename(get_parent_terragrunt_dir())
    key                  = "${basename(get_terragrunt_dir())}.tfstate"
  }
}

S3

remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket = "tfstate"
    key    = "${basename(get_parent_terragrunt_dir())}/${basename(get_terragrunt_dir())}.tfstate"
    region = "us-east-1"
  }
}

Google Cloud Storage

remote_state {
  backend = "gcs"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket = "tfstate"
    # The gcs backend stores state at <prefix>/<workspace>.tfstate, so the prefix needs no extension.
    prefix = "${basename(get_parent_terragrunt_dir())}/${basename(get_terragrunt_dir())}"
  }
}

The result is that each module would have a unique state file, but the configuration is only specified once per environment.
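
To make that concrete, for environments/prd/vm the AzureRM configuration above would lead Terragrunt to write a backend.tf roughly like the following into its working copy of the module. This is an illustration of the resolved values rather than captured output; the real file also starts with a comment marking it as generated by Terragrunt.

```hcl
# backend.tf (illustrative; resolved for environments/prd/vm)
terraform {
  backend "azurerm" {
    storage_account_name = "tfstate"
    resource_group_name  = "tfstate"
    container_name       = "prd"        # basename(get_parent_terragrunt_dir())
    key                  = "vm.tfstate" # basename(get_terragrunt_dir()) plus ".tfstate"
  }
}
```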

Defining the module config

In each module’s terragrunt.hcl in the environment, you would add an include block to ensure the remote_state config is applied from the parent terragrunt.hcl.

You would also add a terraform block to tell Terragrunt where to find the module relative to the terragrunt.hcl. The source can also be a git repository path. The double-slash // is intentional and required. You can read why in their documentation: Keep your Terraform code DRY – Working locally.

include {
  path = find_in_parent_folders()
}

terraform {
  source = "../../..//modules/${basename(get_terragrunt_dir())}"
}
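
For a module kept in a separate repository, the same file might instead look like the sketch below. The organisation, repository name and ref are placeholders, not a real repository.

```hcl
include {
  path = find_in_parent_folders()
}

terraform {
  # Hypothetical Git source: everything before // is the repository,
  # everything after it is the module's path inside that repository.
  # ?ref pins the module to a tag, branch or commit.
  source = "git::git@github.com:example-org/example-infrastructure.git//modules/vm?ref=v1.0.0"
}
```

Pinning a ref per environment is one way to try a new module version in stg before moving prd to it.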

Next, we will discuss how to add inputs to the modules and chain modules together.

Chaining modules and inputs

You can chain modules together using the dependency block and the output values of other Terraform modules. Terragrunt will ensure that modules are applied in order of their dependencies.

In the inputs block, you will specify the inputs to the module for the environment, including the outputs of dependencies.

You can also mock the outputs of dependencies, which allows you to run a plan for a module even if its dependencies have not been applied. mock_outputs_allowed_terraform_commands restricts the mocks to the listed commands, so mock values are never used during an apply.

Below is an example of how these blocks might be configured for the application gateway's terragrunt.hcl in stg. The vm module is a dependency and its output vm_id is passed in as an input.

dependency "vm" {
  config_path = "../vm"

  mock_outputs = {
    vm_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/fake-rg/providers/Microsoft.Compute/virtualMachines/fake-vm"
  }

  mock_outputs_allowed_terraform_commands = ["validate", "refresh", "init", "plan"]
}

inputs = {
  vm_id    = dependency.vm.outputs.vm_id
  sku_name = "Standard_Small"
}
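
On the module side, this only works if the vm module exposes vm_id as an output and the application gateway module declares matching variables. A minimal sketch, assuming the VM is an azurerm_linux_virtual_machine resource named vm (the resource type and name are guesses for illustration):

```hcl
# modules/vm/outputs.tf
output "vm_id" {
  description = "Resource ID of the VM, read by dependent modules."

  # Assumes a resource "azurerm_linux_virtual_machine" "vm" exists in this module.
  value = azurerm_linux_virtual_machine.vm.id
}

# modules/application-gateway/variables.tf
variable "vm_id" {
  type        = string
  description = "Resource ID of the VM behind the gateway."
}

variable "sku_name" {
  type        = string
  description = "SKU of the application gateway, for example Standard_Small."
}
```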

Common variables

Within the terraform block of the module and the environment’s terragrunt.hcl files, you can also specify an extra_arguments config, where you can set input .tfvars files. This feature is handy for sharing variables between modules.

You may have a common.tfvars file in environments/stg/vars containing variables that should be input to all modules in stg.

environment = "stg"
location    = "us-east-1"

In the top-level terragrunt.hcl for stg, you will list common.tfvars as a required variable file. This config is evaluated in the context of the child module that includes it, so each path in required_var_files should be resolved from the child module's directory, which is what get_terragrunt_dir() returns.

terraform {
  extra_arguments "common" {
    commands = get_terraform_commands_that_need_vars()

    required_var_files = [
      "${get_terragrunt_dir()}/../vars/common.tfvars"
    ]
  }
}

More extra_arguments can be added to the child modules for variables shared between modules. However, further extra_arguments configs need to be named differently to avoid overriding previous ones.
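
For example, a child module could load an additional variable file alongside common.tfvars. In this sketch the block name database and the file database.tfvars are hypothetical; the name differs from common so that the parent's block is not overridden.

```hcl
# environments/stg/database-server/terragrunt.hcl
include {
  path = find_in_parent_folders()
}

terraform {
  source = "../../..//modules/${basename(get_terragrunt_dir())}"

  # Named "database", not "common", so the parent's block is kept as well.
  extra_arguments "database" {
    commands = get_terraform_commands_that_need_vars()

    required_var_files = [
      "${get_terragrunt_dir()}/../vars/database.tfvars"
    ]
  }
}
```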

Running Terragrunt

Running all modules at once

Instead of deploying each module individually, you can let Terragrunt do all the work.

To apply all modules in stg, run terragrunt run-all plan and then terragrunt run-all apply from the stg directory. Terragrunt will automatically perform init for you. A catch with running apply like this is that Terragrunt auto-approves the changes to each module rather than prompting at the end.

Terragrunt will also show you the modules it will apply in order.

INFO[0009] The stack at /path/to/repo/environments/stg will be processed in the following order for command plan:
Group 1
- Module /path/to/repo/environments/stg/vm
- Module /path/to/repo/environments/stg/database-server

Group 2
- Module /path/to/repo/environments/stg/application-gateway

You can deploy specific modules using the --terragrunt-exclude-dir and --terragrunt-include-dir flags. You can find out more in their documentation: Terragrunt CLI options.

Running individual modules

If you would like to run a single module without its dependencies, you can run terragrunt plan and then terragrunt apply from the module’s directory in the target environment. If the dependencies have not been applied, Terragrunt will use the mocks for the plan but will fail to apply the module.

If you want to run an individual module together with its dependencies, run terragrunt run-all plan and then terragrunt run-all apply from the module’s directory. Terragrunt will ask whether it should also run each dependency that lives outside that directory.

Module /path/to/repo/environments/stg/application-gateway depends on module /path/to/repo/environments/stg/vm, 
which is an external dependency outside of the current working directory. 
Should Terragrunt run this external dependency? 
Warning, if you say 'yes', Terragrunt will make changes in /path/to/repo/environments/stg/vm as well! (y/n)

Outcome

Looking back at the issues with the first solution, you will see that the Terragrunt solution addresses each of them, and more.

Each module has a backend state file configured in a single location.
You can chain and plan modules together with dependency blocks and mocks, even when they have not been applied.
Modules that are not needed in an environment can be easily excluded.
Modules can be applied all at once, in groups or individually.
Inputs are easy to specify for each module. They can even be easily shared between modules.

Terragrunt has many other features, such as hooks, code generation with generate blocks and automatic retries.

Is Terragrunt for you?

If you feel that your Terraform setup is still relatively simple and you aren’t deploying the same modules to multiple regions or environments, you probably don’t need Terragrunt. Terragrunt adds the most value once your Terraform gets complex, you are starting to deploy the same modules across different environments, and you need a way to chain many larger modules together.