Advanced Terraform performance optimization

Published Mar 4, 2026 by Ricard Bejarano

Last year I wrote Speeding up Terraform caching with OverlayFS, which I then presented at SREcon↗, HashiConf↗ and SREday↗. In the introduction of my talk, I laid out all the Terraform performance optimization techniques we knew about so far, at each step of its init → plan → apply cycle. Here’s the full landscape:

At each of those conferences, this was the slide most people asked me to share after the talk, and the one that had the most people pulling out their phones to take pictures.

This post is that slide, in written form.

Introduction

The following is a set of advanced Terraform performance optimization techniques, mostly relevant to those pushing it to its limits. As such, I’m going to omit all the Terraform preamble and expect you to know its ins and outs. I’m also going to exclude the basics, like giving Terraform a faster host and faster network interconnections with its providers’ APIs.

Beware, some of these are highly experimental and can even be dangerous, so proceed with caution when applying them to production settings.

I’ve highlighted my generally recommended solutions.

The init phase

The expensive steps of the init phase are downloading modules and providers.

Fetching modules

In this step, Terraform downloads all remote modules referenced in your configuration, making one relatively heavy (1–10MB on average) network request per module. This is fine in small enough configurations, but when you grow to hundreds or thousands of remote modules, inits can become quite slow.

Here’s what you can do about it:

  1. Switch to git submodules. While it might look counterintuitive—you’re trading Terraform module fetches for git submodule fetches, why would that make a major difference?—it can work if your CI/CD platform caches git submodules such that only new changes to them are pulled on every run. At that point, you’re gaining performance at the cost of managing git submodules, which may be a reasonable trade-off for you.

  2. Switch to local modules. This can work: you’re trading several smaller child-module fetches for a fatter root-module download before you init—so the total volume remains about the same—but if your CI/CD system supports repository caching, it will only pull changes between runs. Granted, you’re giving up features such as module versioning and isolated module CI validation in exchange, which could be a reasonable trade-off, but those features are often critical at scale.

  3. Terraform module cache. Similar to Terraform’s built-in provider cache, a module cache would allow Terraform to only fetch changes during init. This would be the perfect solution to this problem; however, Terraform doesn’t yet include a built-in module cache. Here’s the feature request↗. Until that lands, module fetching remains a bottleneck with no clean solution.
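As a rough sketch of option 1, here’s what the submodule setup might look like. The repository URL and module path are placeholders, and whether submodules are actually cached between runs depends entirely on your CI/CD platform:

```shell
# One-time: vendor a Terraform module as a git submodule instead of a
# registry reference (URL and path are placeholders).
git submodule add https://example.com/terraform-modules/vpc.git modules/vpc

# In the module block, point at the local path instead of the registry:
#   module "vpc" {
#     source = "./modules/vpc"   # was a remote registry source
#   }

# In CI, fetch submodules shallowly; a platform-level submodule cache
# then only pulls new commits between runs.
git submodule update --init --recursive --depth 1
```

The `--depth 1` shallow fetch keeps each uncached pull small; the real savings come from the platform’s cache, not from git itself.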

Fetching providers

In this step of the init phase, Terraform downloads all provider binaries referenced by your configuration and state. If modules were “relatively heavy” at 1–10MB each, provider binaries are massive: the AWS provider alone is over 750MB!

With a handful of providers (and versions; each version is its own binary) you’re downloading several gigabytes per terraform init. This is, by far, the most expensive step of the init phase.

Here’s what you can do about it:

  1. Consolidate provider versions. If different modules pin different versions of the same provider, you’re downloading several mostly-equivalent binaries on every init. Aligning on a single version (or at most one per major provider version) reduces the number of binaries this step downloads. Compared to caching, the impact is small, but it’s good operational hygiene anyway.

  2. Break down into multiple, smaller root modules. Monolithic root modules tend to grow into tens or hundreds of providers. Splitting into smaller root modules, done properly, shrinks the set of providers each one needs. It also enables parallel operations on the resources each root module manages, which can help with other, unrelated productivity issues. Splitting, however, carries its own complexities. Tracking dependencies, calculating plan and apply order, enforcing consistency, enabling infrastructure-wide changes… are all problems Terraform solves for you with a monolithic root module, and that you must solve yourself once you split into smaller pieces. Something to keep in mind if you go down this route.

  3. Terraform provider cache. Terraform’s built-in plugin cache↗ downloads each provider version once and reuses it across inits, but it’s not concurrent-safe. Two concurrent terraform init processes writing to the same cache directory can corrupt it, so you either disable caching or serialize all inits; neither is great past a certain scale.

  4. Concurrent-safe Terraform provider cache with OverlayFS. The topic of my blog post and talks. Use OverlayFS↗ to give each concurrent init the illusion of a shared cache, while redirecting writes to a local layer to avoid corruption. After init, sync new providers back to the central cache. You get the speed of a cache with none of the concurrency problems. Until Terraform’s built-in provider cache is made to be concurrent-safe, this is your best bet. Detailed walk-through here.
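For option 1, a quick grep over the repository is often enough to spot divergent pins. Here’s a self-contained sketch that builds a throwaway repo with two modules pinning different versions of the same provider (the versions and layout are made up for the demo), then lists each distinct constraint:

```shell
# Build a throwaway repo with two modules pinning different AWS
# provider versions (paths and versions are illustrative).
repo="$(mktemp -d)"
mkdir -p "$repo/modules/a" "$repo/modules/b"
cat > "$repo/modules/a/versions.tf" <<'EOF'
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "5.31.0" }
  }
}
EOF
cat > "$repo/modules/b/versions.tf" <<'EOF'
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "5.44.0" }
  }
}
EOF

# List every distinct version constraint with a count; two distinct pins
# for the same provider means two binaries downloaded per init.
grep -rh 'version' --include='*.tf' "$repo" | sort | uniq -c
```

Anything that shows up with more than one distinct pin for the same provider is a candidate for consolidation.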
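To make option 4 concrete, here’s a rough sketch of the OverlayFS setup; the shared cache path is a placeholder, the mount requires root, and a real implementation needs locking around the final sync back (see the full walk-through for those details):

```shell
# Sketch of a concurrent-safe provider cache with OverlayFS.
# Run once per concurrent init worker; paths are placeholders.
shared=/var/cache/terraform/providers   # shared, read-only lower layer
upper="$(mktemp -d)"                    # this worker's private writes
work="$(mktemp -d)"                     # overlayfs scratch space
merged="$(mktemp -d)"                   # the view Terraform actually sees

mount -t overlay overlay \
  -o "lowerdir=$shared,upperdir=$upper,workdir=$work" "$merged"

# Point Terraform's plugin cache at the merged view: reads hit the
# shared cache, writes land in this worker's upper layer, so concurrent
# inits can't corrupt each other.
TF_PLUGIN_CACHE_DIR="$merged" terraform init

# Afterwards, sync any newly downloaded providers back to the shared
# cache so future inits benefit.
umount "$merged"
rsync -a "$upper"/ "$shared"/
```

Each worker sees the full cache but can only write to its own layer, which is what makes the scheme safe under concurrency.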

The plan phase

The most expensive step of the plan phase is refreshing state.

Refreshing state

Terraform state is a snapshot of reality, taken the last time you performed a Terraform operation. The pre-plan refresh queries every resource against the provider API to detect drift—one network request per resource. While each individual API call is relatively lightweight (<1MB), with enough resources, or slow enough providers, this can take hours. I’ve waited up to 7 hours for a one-change plan.

Here’s what you can do about it:

  1. Increase parallelism (and provider rate limits). Fast and safe: raise -parallelism (the default is 10) until the provider API rate-limits you. Beyond that, it’s out of your control.

  2. Disable refreshing with -refresh=false. The fastest option: skip the pre-plan refresh entirely. On its own, this is unsafe: if a resource changed outside of Terraform, your plan won’t know, and it may propose dangerous changes. You can partly make up for the lack of pre-plan refreshes with scheduled full refreshes (overnight, for example), but that remains a race and a consistency risk. Tread with caution.

  3. Light planning. The topic of an earlier blog post of mine: a proposal to add a terraform plan -light flag that only refreshes resources whose code has changed since the last run. Those resources become the plan’s -target, so only they (and their dependents) get refreshed and planned. Faster than a full refresh, safer than -refresh=false. Here are the feature requests in Terraform↗ and OpenTofu↗.

  4. Break down into multiple, smaller root modules. Splitting a monolithic root module into smaller ones reduces the scope of every operation to a subset of what it was before. This works, and it may solve other sociotechnical problems that come with a monolithic root module and state, but I can’t overstate how much complexity the split introduces. Be mindful of the implications before going down this path: as we saw in an even earlier blog post of mine, rolling back is practically not an option.
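The first three options above boil down to flags on terraform plan. A sketch of each (the resource address is a placeholder, and -light does not exist yet):

```shell
# Option 1: raise parallelism (the default is 10) until the provider API
# rate-limits you.
terraform plan -parallelism=50

# Option 2: skip the pre-plan refresh entirely. Fast, but blind to any
# drift that happened outside Terraform.
terraform plan -refresh=false

# Option 3 doesn't exist yet; today you can approximate it by hand with
# -target, which restricts refreshing and planning to the targeted
# resources and their dependencies (address is a placeholder):
terraform plan -target=aws_instance.web
```

The -target approximation is manual and easy to get wrong—you have to work out the changed resources yourself—which is exactly what the proposed -light flag would automate.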

The apply phase

Finally, the most expensive step of the apply phase is actually applying the planned changes. Unfortunately (and again, beyond improving your connection to your providers’ APIs), all you can do is make fewer changes.


That’s all. This is not meant to be a complete list; I’m still looking for ways to improve Terraform’s performance, but I’m running out of ideas. If there’s anything you think deserves a spot on this list, let me know and I’ll check it out.

Hope this helped!
