Ubuntu 24.04 : Pacemaker : Add or Remove Nodes : Server World

Pacemaker : Add or Remove Nodes

2024/07/23

Add new nodes to an existing cluster.

As an example, add [node03] to the cluster as follows.

                       +--------------------+
                       | [  ISCSI Target  ] |
                       |    dlp.srv.world   |
                       +----------+---------+
                         10.0.0.30|
                                  |
+----------------------+          |          +----------------------+
| [  Cluster Node#1  ] |10.0.0.51 | 10.0.0.52| [  Cluster Node#2  ] |
|   node01.srv.world   +----------+----------+   node02.srv.world   |
+----------------------+          |          +----------------------+
                                  |
                                  |10.0.0.53
                      +-----------------------+
                      | [  Cluster Node#3  ]  |
                      |   node03.srv.world    |
                      +-----------------------+

[1]	Install Pacemaker on the new node; refer to [1] of here.
[2]	Add a new node to an existing cluster.

root@node01:~#

pcs status

Cluster name: ha_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node01.srv.world (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Tue Jul 23 04:38:34 2024 on node01.srv.world
  * Last change:  Tue Jul 23 04:38:19 2024 by root via cibadmin on node01.srv.world
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ node01.srv.world node02.srv.world ]

Full List of Resources:
  * scsi-shooter        (stonith:fence_scsi):    Started node01.srv.world
  * Resource Group: ha_group:
    * lvm_ha    (ocf:heartbeat:LVM-activate):    Started node01.srv.world

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

# authorize new node

root@node01:~#

pcs host auth node03.srv.world

Username: hacluster
Password:
node03.srv.world: Authorized

# add new node

root@node01:~#

pcs cluster node add node03.srv.world

No addresses specified for host 'node03.srv.world', using 'node03.srv.world'
Disabling sbd...
node03.srv.world: sbd disabled
Sending 'corosync authkey', 'pacemaker authkey' to 'node03.srv.world'
node03.srv.world: successful distribution of the file 'corosync authkey'
node03.srv.world: successful distribution of the file 'pacemaker authkey'
Sending updated corosync.conf to nodes...
node02.srv.world: Succeeded
node01.srv.world: Succeeded
node03.srv.world: Succeeded
node02.srv.world: Corosync configuration reloaded
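
The message "No addresses specified" above means pcs fell back to resolving the host name. As a minimal sketch, the same two commands can be given an explicit address. The helper name plan_node_add is hypothetical, the address 10.0.0.53 is taken from the diagram, and the addr= option should be checked against pcs(8) for your version. The script only prints the commands and executes nothing.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for joining a node; nothing is executed.
# plan_node_add is a hypothetical helper; the second argument (address) is optional.
plan_node_add() {
    node="$1"
    addr="${2:-}"
    # Add the addr= option only when an address was given.
    opt=""
    [ -n "$addr" ] && opt=" addr=${addr}"
    echo "pcs host auth ${node}${opt} -u hacluster"
    echo "pcs cluster node add ${node}${opt}"
}

plan_node_add node03.srv.world 10.0.0.53
```

Review the printed commands, then run them by hand on an existing cluster node.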

[3]	Update the fence device settings. If SCSI fencing is used as in this example, first log in to the fence device's shared storage on the newly added node and install the SCSI fence agent ([2], [3]). Then update the fence device configuration as follows.

# update fencing device list

root@node01:~#

pcs stonith update scsi-shooter pcmk_host_list="node01.srv.world node02.srv.world node03.srv.world"

root@node01:~#

pcs stonith config scsi-shooter

Resource: scsi-shooter (class=stonith type=fence_scsi)
  Attributes: scsi-shooter-instance_attributes
    devices=/dev/disk/by-id/wwn-0x6001405fd08aa7cd2fe4f8cad7b28412
    pcmk_host_list="node01.srv.world node02.srv.world node03.srv.world"
  Meta Attributes: scsi-shooter-meta_attributes
    provides=unfencing
  Operations:
    monitor: scsi-shooter-monitor-interval-60s
      interval=60s
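
The pcmk_host_list value is a space-separated list that must be passed in full on every update. The following is a minimal sketch for building that list without duplicating a node that is already present. The helper name host_list_add is hypothetical, and the script only prints the resulting command.

```shell
#!/bin/sh
# host_list_add (hypothetical helper): append a node to a space-separated
# host list unless it is already present. The body runs in a subshell, so
# its variables do not leak into the caller.
host_list_add() (
    list="$1"
    node="$2"
    for h in $list; do
        # Already listed: return the list unchanged.
        [ "$h" = "$node" ] && { echo "$list"; exit 0; }
    done
    echo "${list:+$list }$node"
)

current="node01.srv.world node02.srv.world"
updated="$(host_list_add "$current" node03.srv.world)"
# Print the command instead of running it.
echo "pcs stonith update scsi-shooter pcmk_host_list=\"$updated\""
```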

[4]	If you have already configured resources in the existing cluster, you need to configure each of them on the newly added node so that it can take over the resources in the event of a failover.
For example, if you have configured LVM shared storage as shown here, make the newly added node aware of the LVM shared storage beforehand.

root@node03:~#

iscsiadm -m discovery -t sendtargets -p 10.0.0.30

10.0.0.30:3260,1 iqn.2022-01.world.srv:dlp.target01
10.0.0.30:3260,1 iqn.2022-01.world.srv:dlp.target02

root@node03:~#

iscsiadm -m node --login --target iqn.2022-01.world.srv:dlp.target02

root@node03:~#

iscsiadm -m session -o show

tcp: [1] 10.0.0.30:3260,1 iqn.2022-01.world.srv:dlp.target01 (non-flash)
tcp: [2] 10.0.0.30:3260,1 iqn.2022-01.world.srv:dlp.target02 (non-flash)

root@node03:~#

lvm pvscan --cache --activate ay

  pvscan[4061] PV /dev/vda3 online, VG ubuntu-vg is complete.
  pvscan[4061] PV /dev/sdb1 ignore foreign VG.
  pvscan[4061] VG ubuntu-vg run autoactivation.
  1 logical volume(s) in volume group "ubuntu-vg" now active
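
Before continuing, it can help to confirm that the expected target really has a session on the new node. The following is a minimal sketch that reads [iscsiadm -m session] output from standard input and checks for a given IQN. The helper name has_iscsi_session is hypothetical, and the sample text is copied from the output above.

```shell
#!/bin/sh
# has_iscsi_session (hypothetical helper): succeed if the given IQN appears
# as the target field (4th column) of `iscsiadm -m session` output on stdin.
has_iscsi_session() {
    awk -v iqn="$1" '$4 == iqn { found = 1 } END { exit !found }'
}

# Sample session list copied from the output above.
sample='tcp: [1] 10.0.0.30:3260,1 iqn.2022-01.world.srv:dlp.target01 (non-flash)
tcp: [2] 10.0.0.30:3260,1 iqn.2022-01.world.srv:dlp.target02 (non-flash)'

if printf '%s\n' "$sample" | has_iscsi_session iqn.2022-01.world.srv:dlp.target02; then
    echo "target02: session found"
else
    echo "target02: no session"
fi
```

On a real node, pipe the live output instead of the sample: iscsiadm -m session | has_iscsi_session iqn.2022-01.world.srv:dlp.target02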

[5]	The same applies to any other resources in the cluster. For example, if you are running Apache httpd as shown here, apply section [1] of the linked page on the newly added node.
[6]	After completing all settings for each resource, start the cluster service on the newly added node.

# start cluster services

root@node01:~#

pcs cluster start node03.srv.world

node03.srv.world: Starting Cluster...

root@node01:~#

pcs cluster enable node03.srv.world

node03.srv.world: Cluster Enabled

root@node01:~#

pcs status

Cluster name: ha_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node01.srv.world (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Tue Jul 23 04:49:56 2024 on node01.srv.world
  * Last change:  Tue Jul 23 04:49:47 2024 by hacluster via crmd on node01.srv.world
  * 3 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ node01.srv.world node02.srv.world node03.srv.world ]

Full List of Resources:
  * scsi-shooter        (stonith:fence_scsi):    Started node01.srv.world
  * Resource Group: ha_group:
    * lvm_ha    (ocf:heartbeat:LVM-activate):    Started node01.srv.world

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
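
The check that the new node has joined can be scripted by reading the [Online:] line of the status output. The following is a minimal sketch under the assumption that the status format matches the output shown above. The helper names online_nodes and is_online are hypothetical, and the sample line is copied from that output.

```shell
#!/bin/sh
# online_nodes (hypothetical helper): print one online node per line,
# reading `pcs status` output from stdin. In the "Online:" line the node
# names sit between the "[" and "]" fields.
online_nodes() {
    awk '$1 == "*" && $2 == "Online:" { for (i = 4; i < NF; i++) print $i }'
}

# is_online (hypothetical helper): succeed if the given node is listed as online.
is_online() {
    online_nodes | grep -qxF "$1"
}

# Sample line copied from the status output above.
sample='Node List:
  * Online: [ node01.srv.world node02.srv.world node03.srv.world ]'

if printf '%s\n' "$sample" | is_online node03.srv.world; then
    echo "node03.srv.world is online"
fi
```

On a cluster node, use the live output instead: pcs status | is_online node03.srv.world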

[7]	Fence a node and verify that failover still succeeds with the newly added node in the cluster.

root@node03:~#

pcs stonith fence node01.srv.world

Node: node01.srv.world fenced

root@node03:~#

pcs status

Cluster name: ha_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node02.srv.world (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Tue Jul 23 04:51:01 2024 on node03.srv.world
  * Last change:  Tue Jul 23 04:49:47 2024 by hacluster via crmd on node01.srv.world
  * 3 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ node02.srv.world node03.srv.world ]
  * OFFLINE: [ node01.srv.world ]

Full List of Resources:
  * scsi-shooter        (stonith:fence_scsi):    Started node02.srv.world
  * Resource Group: ha_group:
    * lvm_ha    (ocf:heartbeat:LVM-activate):    Started node02.srv.world

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
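
To see where a resource ended up after fencing, the [Started] column of the status output can be read with a small filter. The following is a minimal sketch under the assumption that the status format matches the output shown above. The helper name resource_node is hypothetical, and the sample text is copied from that output.

```shell
#!/bin/sh
# resource_node (hypothetical helper): print the node a resource is started on,
# reading `pcs status` output from stdin. A resource line looks like:
#   * <name> (<agent>): Started <node>
resource_node() {
    awk -v res="$1" '$1 == "*" && $2 == res && $(NF-1) == "Started" { print $NF }'
}

# Sample lines copied from the status output above.
sample='  * Online: [ node02.srv.world node03.srv.world ]
  * OFFLINE: [ node01.srv.world ]

  * scsi-shooter        (stonith:fence_scsi):    Started node02.srv.world
  * Resource Group: ha_group:
    * lvm_ha    (ocf:heartbeat:LVM-activate):    Started node02.srv.world'

printf '%s\n' "$sample" | resource_node lvm_ha
```

With the sample above this prints node02.srv.world, the node that took over the resource group from the fenced node01.srv.world.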

[8]	To remove a node, run as follows.

root@node01:~#

pcs status

Cluster name: ha_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node02.srv.world (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Tue Jul 23 04:56:45 2024 on node01.srv.world
  * Last change:  Tue Jul 23 04:49:47 2024 by hacluster via crmd on node01.srv.world
  * 3 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ node01.srv.world node02.srv.world node03.srv.world ]

Full List of Resources:
  * scsi-shooter        (stonith:fence_scsi):    Started node02.srv.world
  * Resource Group: ha_group:
    * lvm_ha    (ocf:heartbeat:LVM-activate):    Started node02.srv.world

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

root@node01:~#

pcs cluster node remove node03.srv.world

Destroying cluster on hosts: 'node03.srv.world'...
node03.srv.world: Successfully destroyed cluster
Sending updated corosync.conf to nodes...
node02.srv.world: Succeeded
node01.srv.world: Succeeded
node01.srv.world: Corosync configuration reloaded

# update fencing device list

root@node01:~#

pcs stonith update scsi-shooter pcmk_host_list="node01.srv.world node02.srv.world"

root@node01:~#

pcs status

Cluster name: ha_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node02.srv.world (version 2.1.6-6fdc9deea29) - partition with quorum
  * Last updated: Tue Jul 23 04:59:46 2024 on node01.srv.world
  * Last change:  Tue Jul 23 04:59:38 2024 by hacluster via crmd on node02.srv.world
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ node01.srv.world node02.srv.world ]

Full List of Resources:
  * scsi-shooter        (stonith:fence_scsi):    Started node02.srv.world
  * Resource Group: ha_group:
    * lvm_ha    (ocf:heartbeat:LVM-activate):    Started node02.srv.world

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
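
After a removal, the fence device host list has to be rewritten without the removed node. The following is a minimal sketch for deriving the new list from the old one. The helper name host_list_remove is hypothetical, and the script only prints the resulting command.

```shell
#!/bin/sh
# host_list_remove (hypothetical helper): print a space-separated host list
# with the given node dropped. The body runs in a subshell, so its variables
# do not leak into the caller.
host_list_remove() (
    out=""
    for h in $1; do
        # Keep every entry except the node being removed.
        [ "$h" = "$2" ] || out="${out:+$out }$h"
    done
    echo "$out"
)

current="node01.srv.world node02.srv.world node03.srv.world"
remaining="$(host_list_remove "$current" node03.srv.world)"
# Print the command instead of running it.
echo "pcs stonith update scsi-shooter pcmk_host_list=\"$remaining\""
```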
