Proxmox cluster setup

I recently created a Proxmox cluster for some physical machines … to make it short, it is sometimes more efficient to run your own machines than to go to the cloud, in particular when you can accept a lower service level. So I use a mix of solutions: VMs running on a cloud provider for critical production (Helium services as an example), some bare-metal machines for high-computation services (like the Helium ETL) where a cloud provider would invoice $3000 a month, and for the rest I decided to run a Proxmox cluster to have an intermediate, low-cost infrastructure with a minimum of redundancy that I can master.

I started with 2 nodes with the ability to move workloads from one to the other, and as I recently got a corruption on one of my ZFS storages, losing one of the NVMe drives, I've been happy with this setup. Unfortunately, when I restarted the damaged node after it had been repaired, the second one restarted all the VMs, causing an unexpected service failure. Let's see how to set up a Proxmox cluster to avoid getting into this.

When creating a cluster, you need at least 3 members, as the quorum is at least 2 (corosync computes the quorum as expected_votes / 2 + 1, so with only 2 nodes both of them must be up). This is the reason why my VMs restarted when one of the nodes went down. That's the usual problem in clusters 😉 So either you need 3 servers, or you can just deploy a QDevice (a kind of arbiter) on a 3rd server to have 3 voting members and a valid quorum of 2.

I will not go over the creation of the cluster itself as it is mainly done through the UI and documented in plenty of existing blog posts like this one.

So this is more about the second step of the configuration: adding a 3rd node as a QDevice. This third node doesn't need to be dedicated; it can be another server anywhere, possibly a Raspberry Pi at home … the only requirement is to have this machine accessible over SSH from the Internet.

On this third machine, you need to install the QDevice services:

root@node:~# apt install corosync-qnetd corosync-qdevice
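
As a quick sanity check, you can verify that the qnetd daemon is up and running right after the install (the service name below is the one shipped by the Debian package):

root@3rdServer:~# systemctl status corosync-qnetd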

Later, the Proxmox server will connect to this third machine over SSH for the setup, so to make it able to connect seamlessly, copy and paste the .ssh/id_rsa.pub key from one of the Proxmox cluster nodes into the .ssh/authorized_keys file of the third server. Get the IP address of your third server.
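
One way to do this, assuming the cluster node already has a root key pair at /root/.ssh/id_rsa.pub, is to let ssh-copy-id push it for you:

root@node:~# ssh-copy-id -i /root/.ssh/id_rsa.pub root@<3rd server ip>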

Now you can go to one of the nodes of your Proxmox cluster. Test the connection to the 3rd server with:

root@node:~# ssh root@<3rd server ip>
root@3rdServer:~# 

Accept the fingerprint when asked, and if that worked you can exit and deploy the QDevice setup on the cluster nodes. So, on all the cluster nodes, install the software stack:

root@node:~# apt install corosync-qdevice

Then from one of the nodes:

root@node:~# pvecm qdevice setup <3rd server ip> -f

The command runs the different setup steps on the cluster, and at the end of it you should have the QDevice deployed as one of the voters. You can check this with the following command:

root@node:~# pvecm status
...
Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW xxxx (local)
0x00000002          1    A,V,NMW xxxx
0x00000000          1            Qdevice
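
For reference, the setup command also adds a device section to the quorum block of /etc/pve/corosync.conf. It should look roughly like the snippet below, with the host value being your 3rd server IP (the exact content may differ slightly depending on your Proxmox/corosync version):

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: <3rd server ip>
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}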

Qnetd uses port 5403/TCP; make sure it's accessible, otherwise you will see the QDevice not voting (a quick check is shown after the flag list below). Have a look at the QDevice flags:

A - means Alive (up); it would be NA if down
V - means Voting; if you see NV you may have a port opening issue
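
To quickly test that the port is reachable, you can probe it from a cluster node and check that qnetd is listening on the 3rd server (assuming netcat and ss are available on the respective machines):

root@node:~# nc -zv <3rd server ip> 5403
root@3rdServer:~# ss -tlnp | grep 5403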

More documentation can be found here.
