I recently created a Proxmox cluster for some of my physical machines. To keep it short, it is sometimes more efficient to run your own machines than to go to the cloud, in particular when you can accept a lower service level. So I use a mix of solutions: VMs running at a cloud provider for critical production (Helium services, for example), some bare-metal machines for compute-intensive services (like the Helium ETL) where a cloud provider would invoice $3000 a month, and for the rest I decided to run a Proxmox cluster to get an intermediate, low-cost infrastructure with a minimum of redundancy that I can manage myself.
I started with 2 nodes, with the ability to move workloads from one to the other. When I recently had a corruption on one of my ZFS storage after losing one of the NVMe drives, I was happy with this setup. Unfortunately, when I restarted the damaged node after repairing it, the second node restarted all the VMs, causing an unexpected service failure. Let’s see how to set up a Proxmox cluster to avoid this.