Terraform should have remained stateless

Published May 28, 2022 by Ricard Bejarano

Before you read…

This post has been superseded by my revised take on stateless Terraform, and is kept only for archive purposes.

Please read the newer post instead.

The other day I was looking into why Terraform needs state.
I found this↗ page in Terraform’s official docs.

It promises to explain why “state is required” and why stateless Terraform “would require shifting massive amounts of complexity from one place to another”.

I’m not sold on the document’s reasoning, though. Here’s why.

Mapping to the real world… 1% of the time

The first reason Terraform gives in order to justify the use of state is that it needs a way to map resources in configuration to resources in the provider.

They explain how “Terraform could use tags” but then again “not all providers have tags”. Therefore, state.

I get the feeling that few other options have been explored in mapping resources in a stateless manner. For example:

// map on one attribute:
resource "aws_instance" "foo" {
  name = "foo" @id
}

// composite primary key:
resource "aws_instance" "bar" {
  name = "bar" @id
  tags = {
    environment = "production" @id
  }
}

// only an example, not necessarily my best proposal

If an instance with the same attribute values does not exist, create it.
If one instance exists, do nothing. If multiple exist, fail to plan.
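
The three-way rule above (create, do nothing, fail) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `plan_resource`, `Action` and `AmbiguousMatch` are hypothetical names, `identity` holds just the attributes marked `@id` in the made-up syntax above, nested attributes are flattened into keys like `tags.environment`, and `live_resources` stands in for whatever a provider API would return. No real provider is called.

```python
from dataclasses import dataclass


class AmbiguousMatch(Exception):
    """Raised when more than one live resource matches the identity attributes."""


@dataclass(frozen=True)
class Action:
    kind: str  # "create" or "noop"
    name: str  # address of the configured resource


def plan_resource(name, identity, live_resources):
    """Decide what to do for one configured resource.

    identity:       dict of the attributes marked @id (hypothetical syntax).
    live_resources: list of flat attribute dicts, as a provider might list them.
    """
    # A live resource matches when every identity attribute has the same value.
    matches = [
        r for r in live_resources
        if all(r.get(key) == value for key, value in identity.items())
    ]
    if not matches:
        return Action("create", name)   # nothing matches: create it
    if len(matches) == 1:
        return Action("noop", name)     # exactly one match: do nothing
    # Several matches: the mapping is ambiguous, so planning must fail.
    raise AmbiguousMatch(f"{name}: {len(matches)} live resources match {identity}")


live = [{"name": "foo", "tags.environment": "staging"}]
print(plan_resource("aws_instance.foo", {"name": "foo"}, live).kind)
print(plan_resource(
    "aws_instance.bar",
    {"name": "bar", "tags.environment": "production"},
    live,
).kind)
```

The first call finds a single match and plans nothing; the second finds none and plans a create. Two live resources sharing the same identity values would raise `AmbiguousMatch` instead.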

This, unfortunately, does not apply to all providers and resources. Some don’t have any arbitrary string attribute for Terraform to map with.

But major provider API resources all have at least one attribute that meets all the requirements: mainstream AWS stuff, everything in Kubernetes, local provider resources, etc.

Point being, are we adding state because 1% of the resources couldn’t be mapped otherwise? If I don’t need it, can I skip it?

Metadata… is not a hard requirement

The second reason stated in the document is storing metadata “such as resource dependencies”. Specifically “when you delete a resource from Terraform configuration, Terraform must know how to delete it”.

But… depends_on already exists. This allows declaring dependencies between any two resources, whatever the provider.

I’m not sure why “the complexity for this approach quickly explodes”, as the document claims. Feel free to send me a counterexample.
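
To make the depends_on argument concrete, here is a rough sketch of deriving a deletion order purely from declared dependencies. It assumes the depends_on edges are still available as a plain mapping; `destroy_order` and the resource addresses are hypothetical, and this is not how Terraform itself computes its graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def destroy_order(depends_on):
    """Return resource addresses in an order that is safe for deletion.

    depends_on maps each resource to the resources it depends on.
    """
    # static_order() yields dependencies before their dependents,
    # which is the order in which resources would be created.
    creation_order = list(TopologicalSorter(depends_on).static_order())
    # Deleting walks that order backwards, so dependents go first.
    return list(reversed(creation_order))


# Hypothetical addresses, for illustration only.
graph = {
    "aws_instance.web": ["aws_subnet.main"],
    "aws_subnet.main": ["aws_vpc.main"],
    "aws_vpc.main": [],
}
print(destroy_order(graph))
```

For this chain the instance is deleted first, then the subnet, then the VPC.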

Performance… is optional

This is admittedly optional, according to the document.

Syncing… is not a problem

This is a problem caused by state, not a problem solved by state.
Not sure why the document includes it.

Why is state a problem?

When using Terraform, we have:

1. Terraform configuration (or, how we want the universe to be);
2. Terraform state (or, how Terraform sees the universe); and
3. the provider APIs state (or, how the universe really is).

We already know what we want and what we have.
I can’t help but think number 2 is… pointless!

Besides, there’s only so much Terraform can do to keep its picture of the universe up-to-date. This has tons of implications:

resources created outside of Terraform have to be imported into state;
if deleted, plan fails and humans must remove them from state; and
if modified, behavior varies by change, resource and provider! 🤯

Also, if Terraform configuration is refactored, for example, to wrap a bunch of frequently copy-pasted resources into a module, state must be manually reconciled before proceeding.

State makes Terraform painful to maintain.

Additionally, requiring state creates a chicken-and-egg problem, in that you can’t use Terraform to create the state backend that will hold Terraform’s state. This is anecdotal, but ugly nonetheless.

Is there precedent in stateless tools?

Yes!

Ansible↗, Puppet↗, etc. don’t have intermediate stores of the hosts’ configuration, but then again they are used for different things.

OctoDNS↗, on the other hand, is the perfect example:

absolutely stateless, just a Git repository and some CI code;
you can cron it to ensure “what you have” is “what you want”;
you can plan before apply just like Terraform; etc.

I don’t see any reason why Terraform couldn’t work just like OctoDNS.
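
The stateless loop described above boils down to diffing “what you want” against “what you have” on every run. A minimal sketch, assuming the tool owns every record in the zone (which is what makes deletes safe without state); `plan` and the record layout are made up for illustration and are not OctoDNS’s actual API:

```python
def plan(desired, actual):
    """Diff desired records against live records, both keyed by (name, type)."""
    changes = []
    # Walk the union of keys in sorted order so the plan is deterministic.
    for key in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(key), actual.get(key)
        if have is None:
            changes.append(("create", key, want))   # wanted but missing
        elif want is None:
            changes.append(("delete", key, have))   # present but not wanted
        elif want != have:
            changes.append(("update", key, want))   # present but different
    return changes


desired = {("www", "A"): ["1.2.3.4"], ("mail", "MX"): ["10 mx.example.com."]}
actual = {("www", "A"): ["5.6.7.8"], ("old", "CNAME"): ["legacy.example.com."]}
for change in plan(desired, actual):
    print(change)
```

Running this from cron and applying the resulting changes would keep the live zone converged on the repository, with nothing stored in between runs.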

Should Terraform be stateless?

I’m leaning towards yes.

Am I asking for a stateless Terraform rework? Definitely not.
If there’s anything worse than state, it’s changing APIs.

A more sensible option could be implementing a “none” state backend that pulls state right from providers when planning.

I am totally unaware of what the implications of adding such a backend would be, which is why I’ve carefully picked my words throughout this post, so that my points don’t come across as facts, but rather as questions I get from using Terraform.

What do you think?

