
My Journey Automating Proxmox Infrastructure with Terraform

Around six years ago, when I started my homelab journey with my first Dell T610, I was looking for a server OS of the kind used in enterprise settings, and two options stood out: VMware ESXi and Proxmox. Although there was plenty of documentation around ESXi, I ran into a lot of problems setting it up due to the relatively old hardware of the T610.

Then I started reading through Proxmox's docs and immediately felt at home: it builds on proven technologies I already knew, such as a Debian-based system and LVM or ZFS for flexible storage management. Even better, its support for LXC containers made deploying and testing much easier. I do wish they incorporated Docker as well, though. :)

The platform struck an excellent balance for me - enterprise-grade capabilities wrapped in an accessible interface.

But there was one drawback: imagine recreating the same LXC container for the tenth time in the same day, clicking through identical wizard screens, changing just one or two parameters each time. Of course, there was always the option to create templates, but I still found myself playing a memory game of 'Which settings did I use for the logging container again?' after a few days away from the lab.

A few months later, a colleague pointed me towards Terraform, and I figured a great way to learn it would be to automate my Proxmox workflows.

While researching, I found the telmate/proxmox provider, which offered a way to manage both LXC containers and virtual machines, although support for the latter was a lot more fragile.
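
For context, here is a minimal sketch of wiring up that provider; the API URL and credentials below are placeholders, and the version constraint is just an example, so pin whatever you have tested:

terraform {
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "~> 2.9"   # example constraint, pin the version you've tested
    }
  }
}

variable "pm_token_secret" {
  type      = string
  sensitive = true
}

provider "proxmox" {
  pm_api_url          = "https://pve.example.com:8006/api2/json"  # placeholder URL
  pm_api_token_id     = "terraform@pve!mytoken"                   # placeholder token ID
  pm_api_token_secret = var.pm_token_secret                       # keep secrets out of source
  pm_tls_insecure     = true                                      # homelab with self-signed certs
}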

From long scripts to modules: creating a reusable Proxmox solution

My initial approach to using Terraform for Proxmox management was to start with simple scripts to create the containers. Each script defined the complete configuration for a specific container, from network settings to mount points. While this approach worked, I soon noticed a pattern emerging: large blocks of nearly identical configuration scattered across different files, with only small variations in values like container names or IP addresses.
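
To make that concrete, here is roughly what two of those near-identical definitions looked like, sketched with the telmate provider's proxmox_lxc resource; the hostnames and addresses are made up:

resource "proxmox_lxc" "logging" {
  target_node  = "pve"
  hostname     = "logging"
  ostemplate   = "local:vztmpl/rockylinux-9-default_20221109_amd64.tar.xz"
  unprivileged = true
  cores        = 2
  memory       = 1024
  rootfs {
    storage = "local-zfs"
    size    = "10G"
  }
  network {
    name   = "eth0"
    bridge = "vmbr1"
    ip     = "10.0.5.20/16"
    gw     = "10.0.5.1"
  }
}

resource "proxmox_lxc" "monitoring" {
  # identical to the block above except for three values
  target_node  = "pve"
  hostname     = "monitoring"
  ostemplate   = "local:vztmpl/rockylinux-9-default_20221109_amd64.tar.xz"
  unprivileged = true
  cores        = 2
  memory       = 2048
  rootfs {
    storage = "local-zfs"
    size    = "10G"
  }
  network {
    name   = "eth0"
    bridge = "vmbr1"
    ip     = "10.0.5.21/16"
    gw     = "10.0.5.1"
  }
}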

This code duplication wasn't just making my Terraform configurations unnecessarily verbose - it was also making them harder to maintain. When I needed to change a common setting, like the default DNS server or the base template, I had to update it in multiple places. I knew there had to be a better way.

Terraform modules provided the solution I was looking for. Modules in Terraform work much like functions in traditional programming: they encapsulate logic and allow you to reuse it with different parameters. And with my programming background it just made sense to me. This approach would let me define the core LXC container configuration once and then reuse it across all my projects, passing in just the unique values that each container needed.

With this goal in mind, I created a Proxmox LXC module that abstracts away the common configuration patterns I found myself repeatedly using. The module, available at https://github.com/rendler-denis/tf-proxmox-mod, transforms what would typically be long files full of repeated resource definitions into a clean, parameterized interface for creating containers.

Using the module

Using the module is simple. Just add a module block to the root module with the source pointing at my GitHub repo, and provide values for its options like this:

module "pve_lxc" {
  source = "github.com/rendler-denis/tf-proxmox-mod//lxc?ref=0.1.0"

  for_each = var.lxc

  ct_name      = each.key
  target       = each.value.target
  template     = each.value.template
  privileged   = each.value.privileged
  onboot       = each.value.onboot
  protected    = each.value.protected
  cpu_cores    = each.value.cpu_cores
  memory       = each.value.memory
  swap         = each.value.swap
  vmid         = each.value.vmid
  state        = each.value.state
  ssh_keys     = each.value.ssh_keys
  root_pass    = each.value.root_pass
  tags         = each.value.tags
  hdd_size     = each.value.hdd_size
  storage_name = each.value.storage_name
  net          = each.value.net

  proxmox_ssh = var.proxmox_ssh
}
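
Two details of the source string are worth noting: the double slash (//lxc) points Terraform at the lxc subdirectory of the repository, and the ref query parameter pins the module to a specific Git tag, so upstream changes can't silently modify existing infrastructure.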

While the module supports extensive customization of container settings, it currently handles only a single network interface per container. This limitation reflects my own usage patterns, where the containers operate with a straightforward networking setup.
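
For reference, the net input mirrors the structure used in the k8s.tfvars example later in this post; here is a sketch of a single entry (the module's variables.tf is the authoritative definition, and the addresses are made up):

net = {
  device  = "eth0"               # interface name inside the container
  name    = "vmbr1"              # Proxmox bridge to attach to
  macaddr = "aa:00:d2:50:10:50"
  tag     = 5                    # VLAN tag
  ip      = "10.0.5.10/16"
  gateway = "10.0.5.1"
}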

Flexible configurations with sensible defaults

To reduce variable duplication, I followed the principle of "convention over configuration" when developing this module: provide sensible defaults while maintaining the flexibility to customize when needed. The pattern combines default options with container-specific overrides, like this:

locals {
  default_lxc = {
    storage_name = "local-zfs",
    template     = "local:vztmpl/rockylinux-9-default_20221109_amd64.tar.xz",
    onboot       = true,
    privileged   = false,
    protected    = false,
    root_pass    = "admin123",
    state        = true,
    target       = "pvey",
    ssh_keys     = null,
    tags         = "linux,rockylinux,9"
  }
}

module "pve_lxc" {
  source = "github.com/rendler-denis/tf-proxmox-mod//lxc?ref=1.0.0"

  for_each = { for name, config in var.lxc : name => merge(local.default_lxc, config) }

  ct_name      = each.key
  # ... remaining arguments as in the previous example

}
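
With the defaults in place, an entry in var.lxc only needs to spell out what actually differs. A sketch (the exact required attributes depend on the module's variable definitions):

lxc = {
  logging = {
    cpu_cores = 1,
    hdd_size  = "10G",
    memory    = 512,
    swap      = 0,
    net = {
      device  = "eth0",
      macaddr = "aa:00:d2:50:10:60",
      name    = "vmbr1",
      tag     = 5,
      ip      = "10.x.x.x/16",
      gateway = "10.x.x.x"
    }
  }
}

Everything else, such as the template, target node, and root password, comes from local.default_lxc.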

Managing configuration complexity

Over time, as my infrastructure grew more complex, I encountered yet another challenge: ensuring that Terraform operations would only affect the specific resources I intended to modify. In a large infrastructure setup, even a small change could potentially cascade into unintended modifications or deletions of other resources. This is where Terraform workspaces become extremely useful.

Think of workspaces as separate environments within your Terraform configuration, each maintaining its own independent state. Just as you might have different branches in Git to isolate different features, Terraform workspaces isolate different states of the managed infrastructure. This isolation provides a safety net, ensuring that changes in one workspace won't accidentally impact resources managed in another.

Building on workspaces, I organize my infrastructure into logical groups based on purpose, using a dedicated variable file per workspace. For instance, when working with Kubernetes, all related components, from the main cluster nodes to supporting services, are defined in a dedicated variable file named k8s.tfvars. To deploy the Kubernetes setup, I first create a dedicated workspace:

terraform workspace new k8s

Then, I apply the configuration using the workspace-specific variables:

terraform plan -var-file=k8s.tfvars -out=plan
terraform apply plan
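
On later runs, I make sure I'm in the right workspace before planning, since applying a plan in the wrong workspace is exactly the kind of accident this setup is meant to prevent:

terraform workspace select k8s
terraform plan -var-file=k8s.tfvars -out=plan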

The k8s.tfvars looks something like this:

lxc = {
  k8s-controller = {
    cpu_cores = 2,
    hdd_size  = "25G",
    memory    = 2048,
    net = {
      device  = "eth0",
      macaddr = "aa:00:d2:50:10:50",
      name    = "vmbr1",
      tag     = 5,
      ip      = "10.x.x.x/16",
      gateway = "10.x.x.x"
    },
    swap   = 0,
    target = "pve",
    tags   = "k8s,controller"
  },
  k8s-worker-1 = {
    cpu_cores = 4,
    hdd_size  = "25G",
    memory    = 4096,
    net = {
      device  = "eth0",
      macaddr = "aa:00:d2:50:10:51",
      name    = "vmbr1",
      tag     = 5,
      ip      = "10.x.x.x/16",
      gateway = "10.x.x.x"
    },
    swap   = 0,
    target = "pve",
    tags   = "k8s,worker"
  },
  k8s-worker-2 = {
    cpu_cores = 4,
    hdd_size  = "25G",
    memory    = 4096,
    net = {
      device  = "eth0",
      macaddr = "aa:00:d2:50:10:52",
      name    = "vmbr1",
      tag     = 5,
      ip      = "10.x.x.x/16",
      gateway = "10.x.x.x"
    },
    swap   = 0,
    target = "pve",
    tags   = "k8s,worker"
  }
}

This structured approach offers several benefits:

  • it prevents accidental modifications to unrelated infrastructure
  • it provides clear documentation of which components belong together
  • it makes it easier to tear down and recreate specific parts of the infrastructure for testing
  • it maintains a clean separation of states, making troubleshooting more straightforward

This organization pattern has proven especially valuable when experimenting with different configurations or testing new technologies in my homelab environment. Together with the defaults merging shown earlier, it simplifies maintenance immensely.

Reflecting on the journey thus far

Looking back at my homelab journey, the evolution from clicking through wizards to managing infrastructure through code represents more than just a technical progression for me. What started as a simple need to automate repetitive tasks led me to advance my skills in concepts like infrastructure as code and modular design patterns – skills that prove invaluable not only in my homelab but also in professional environments.

Although I've been using automation tools like Ansible, Chef, and SaltStack since at least 2014, I really enjoy Terraform's expressive syntax and modularity. And with its ever-growing number of providers, it is quickly becoming a tool that I use every day.

The combination of Proxmox's flexibility and Terraform's automation capabilities has transformed my homelab from a chaotic testing ground into a proper environment where I can quickly test ideas and designs without fear of affecting critical systems.

I hope sharing these experiences and the accompanying module helps you too in your journey toward infrastructure automation, whether managing a homelab or working on enterprise production systems.