If a disk fails, the zpool on the primary server will be in the DEGRADED state.
~# zpool status
  pool: NETSTOR
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0 in 0h0m with 0 errors on Tue Dec 6 15:10:59 2016
config:
        NAME                    STATE     READ WRITE CKSUM
        NETSTOR                 DEGRADED     0     0     0
          mirror-0              DEGRADED     0     0     0
            SW3-NETSTOR-SRV1-1  ONLINE       0     0     0
            SW3-NETSTOR-SRV2-1  FAULTED      3     0     0  too many errors
errors: No known data errors
First, we have to make sure the damaged disk is on the secondary server, not the primary.
In this case, we can find this out from the output above:
SW3-NETSTOR-SRV2-1 FAULTED
SRV2 means that the damaged disk is in Server 2.
If this is the case, we can proceed to the next step.
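The check above can also be scripted. The following is a minimal sketch, not part of SERVERware: the function names faulted_disks and server_of_label are hypothetical, and it assumes the labels follow the SW3-NETSTOR-SRVx-y format with numeric x and y.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with SERVERware.

# Print the names of FAULTED devices found in `zpool status` output on stdin.
# The "state:" header line is skipped so only device rows are reported.
faulted_disks() {
    awk '$2 == "FAULTED" && $1 != "state:" { print $1 }'
}

# Print the server number x from a label of the form SW3-NETSTOR-SRVx-y.
# Prints nothing if the label does not match (assumes numeric x and y).
server_of_label() {
    printf '%s\n' "$1" |
        sed -n 's/^SW3-NETSTOR-SRV\([0-9][0-9]*\)-[0-9][0-9]*$/\1/p'
}

# Usage on the primary server:
#   zpool status NETSTOR | faulted_disks | while read -r label; do
#       echo "$label is on server $(server_of_label "$label")"
#   done
```

The functions only read text, so they can be tried safely against saved zpool status output before being used on a live system.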
If the damaged disk is on the primary server (SRV1), we should first perform a manual takeover so that the secondary server takes over. To switch manually, SSH to the secondary server and execute the following command:
killall -SIGUSR1 sysmonit
Next, we should physically replace the damaged disk in the server.
In the zpool status output we can see that SW3-NETSTOR-SRV2-1 is faulted:
SW3-NETSTOR-SRV2-1 FAULTED 3 0 0 too many errors
In this case we need to replace the disk labeled SW3-NETSTOR-SRV2-1 with a new one and add it to the zpool mirror.
Physically remove the faulty disk from the server and insert the new disk in its place.
After the replacement, the new disk should appear in /dev/disk/by-id/:
~# ls -lah /dev/disk/by-id
total 0
drwxr-xr-x 2 root root 480 Srp 27 08:57 .
drwxr-xr-x 7 root root 140 Srp 27 08:13 ..
lrwxrwxrwx 1 root root   9 Srp 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN -> ../../sde
lrwxrwxrwx 1 root root  10 Srp 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN-part1 -> ../../sde1
lrwxrwxrwx 1 root root  10 Srp 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN-part2 -> ../../sde2
lrwxrwxrwx 1 root root  10 Srp 27 08:13 ata-INTEL_SSDSC2CW060A3_CVCV308402M3060AGN-part9 -> ../../sde9
lrwxrwxrwx 1 root root   9 Srp 27 08:13 ata-ST31000520AS_5VX0BZN0 -> ../../sda
lrwxrwxrwx 1 root root  10 Srp 27 08:13 ata-ST31000520AS_5VX0BZN0-part1 -> ../../sda1
lrwxrwxrwx 1 root root   9 Srp 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX61A465TH1Y -> ../../sdc
lrwxrwxrwx 1 root root  10 Srp 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX61A465TH1Y-part1 -> ../../sdc1
lrwxrwxrwx 1 root root   9 Srp 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX81EC512Y4H -> ../../sdd
lrwxrwxrwx 1 root root  10 Srp 27 08:13 ata-WDC_WD10JFCX-68N6GN0_WD-WX81EC512Y4H-part1 -> ../../sdd1
lrwxrwxrwx 1 root root   9 Srp 27 08:57 ata-WDC_WD10JFCX-68N6GN0_WD-WXK1E6458WKX -> ../../sdb
lrwxrwxrwx 1 root root   9 Srp 27 08:13 wwn-0x10076999618641940481x -> ../../sdd
lrwxrwxrwx 1 root root  10 Srp 27 08:13 wwn-0x10076999618641940481x-part1 -> ../../sdd1
lrwxrwxrwx 1 root root   9 Srp 27 08:13 wwn-0x11689569317835657217x -> ../../sdc
lrwxrwxrwx 1 root root  10 Srp 27 08:13 wwn-0x11689569317835657217x-part1 -> ../../sdc1
lrwxrwxrwx 1 root root   9 Srp 27 08:57 wwn-0x11769037186453098497x -> ../../sdb
lrwxrwxrwx 1 root root   9 Srp 27 08:13 wwn-0x12757853320186451405x -> ../../sde
lrwxrwxrwx 1 root root  10 Srp 27 08:13 wwn-0x12757853320186451405x-part1 -> ../../sde1
lrwxrwxrwx 1 root root  10 Srp 27 08:13 wwn-0x12757853320186451405x-part2 -> ../../sde2
lrwxrwxrwx 1 root root  10 Srp 27 08:13 wwn-0x12757853320186451405x-part9 -> ../../sde9
lrwxrwxrwx 1 root root   9 Srp 27 08:13 wwn-0x7847552951345238016x -> ../../sda
lrwxrwxrwx 1 root root  10 Srp 27 08:13 wwn-0x7847552951345238016x-part1 -> ../../sda1
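A long listing like the one above is easier to scan when partition entries are filtered out. The following is a minimal sketch under that assumption; list_whole_disks is a hypothetical helper name, and it treats any entry whose name contains "-part" followed by a digit as a partition.

```shell
#!/bin/sh
# Hypothetical helper: list whole-disk entries of a by-id directory
# (default /dev/disk/by-id) with the target each symlink points to.
list_whole_disks() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Only symlinks are of interest; skip anything else.
        [ -L "$link" ] || continue
        name=${link##*/}
        # Skip partition entries such as ...-part1.
        case $name in
            *-part[0-9]*) continue ;;
        esac
        printf '%s -> %s\n' "$name" "$(readlink "$link")"
    done
}

# Usage:
#   list_whole_disks
```

A freshly inserted disk has no partitions yet, so it shows up in this list as a whole-disk entry without matching -part entries in the full listing.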
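The new disk is the entry with the most recent timestamp (08:57 in the listing above): ata-WDC_WD10JFCX-68N6GN0_WD-WXK1E6458WKX, which points to sdb. Now that we have the block device name, we can create a partition table and a partition, and prepare the drive for use.
To create the partition table on the new disk, use parted: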
~# parted /dev/ --script -- mktable gpt
Next, create a label for the new drive.
IMPORTANT: the label must be named in the following format: SW3-NETSTOR-SRVx-y,
where "x" is the server number and "y" is the disk number.
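Because a mistyped label is easy to miss, the format can be checked before running parted. This is only a sketch: is_valid_label is a hypothetical name, and it assumes that both the server number and the disk number are plain decimal numbers.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument matches
# SW3-NETSTOR-SRVx-y with numeric x (server) and y (disk).
is_valid_label() {
    printf '%s\n' "$1" | grep -Eq '^SW3-NETSTOR-SRV[0-9]+-[0-9]+$'
}

# Usage:
#   if is_valid_label "SW3-NETSTOR-SRV2-1"; then
#       echo "label OK"
#   else
#       echo "label does not follow SW3-NETSTOR-SRVx-y" >&2
#   fi
```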
Create the partition with a name that matches the faulted partition on the server. We have this name from the zpool status output above:
SW3-NETSTOR-SRV2-1 FAULTED 3 0 0 too many errors
Our command in this case will be:
~# parted /dev/ --script -- mkpart "SW3-NETSTOR-SRV2-1" 1 -1
We have now added a new partition and created a label.
To replace the drive we can use the sw-nvme commands, which are listed below:
Command | Description
---|---
sw-nvme list | Lists all connected devices with /dev/nvme-fabrics
sw-nvme discover | Displays all devices exported on the remote host with the given IP and port
sw-nvme connect | Imports a remote device from the given IP, port and nqn
sw-nvme disconnect | Removes the imported device from the host
sw-nvme disconnect-all | Removes all imported devices from the host
sw-nvme import | Imports remote devices from the given file in proper JSON format
sw-nvme reload-import | Imports remote devices from the given file in proper JSON format after disconnecting all current imports
sw-nvme enable-modules | Enables the kernel modules necessary for NVMe/TCP
sw-nvme enable-namespace | Enables the namespace with the given id
sw-nvme disable-namespace | Disables the namespace with the given id
sw-nvme load | Exports remote devices from the given file in proper JSON format
sw-nvme store | If devices are exported manually, saves the system configuration in proper JSON format
sw-nvme clear | Removes the exported device from the system configuration. If specified with 'all', it removes all configurations
sw-nvme export | Exports the device given by the URL parameter on a port with nqn
sw-nvme export-stop | Removes the device being exported on the port with the given id
sw-nvme reload-configuration | Exports remote devices from the given file in proper JSON format after removing all current exports
sw-nvme replace-disk | Combines 'clear all' and reload-configuration for an easier disk replacement procedure on SERVERware
sw-nvme expand-pool | Updates the export configuration and adds a new namespace to the sw-mirror subsystem for SERVERware
Now we need to replace the old drive with the new drive using the command:
~# sw-nvme replace-disk --old /dev/disk/by-id/old_disk_id --new /dev/disk/by-id/new_disk_id
To find the old disk id, run the sw-nvme show command.
Example:
~# sw-nvme show
{
    "config": "/sys/kernel/config/nvmet",
    "hosts": [
        "3cc5c2aa47825e608570a938971bcd7c"
    ],
    "subsystems": {
        "sw-mirror": {
            "acl": [
                "3cc5c2aa47825e608570a938971bcd7c"
            ],
            "namespaces": [
                {
                    "id": 1,
                    "device": "/dev/disk/by-id/ata-KINGSTON_SA400S37120G_50026B73804B902A",
                    "enabled": true
                }
            ],
            "allow_any_host": false
        }
    },
    "ports": {
        "1": {
            "address": "1.1.1.31",
            "port": 4420,
            "address_family": "ipv4",
            "trtype": "tcp",
            "subsystems": "sw-mirror"
        }
    }
}
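The old disk id is the value of the "device" key in this output. As a rough sketch, it can be pulled out with grep and sed. The helper name exported_devices is hypothetical, and the pattern assumes the JSON looks like the example above (a proper JSON parser would be more robust).

```shell
#!/bin/sh
# Hypothetical helper: read `sw-nvme show` JSON on stdin and print the
# value of every "device" key, one path per line.
exported_devices() {
    grep -o '"device"[[:space:]]*:[[:space:]]*"[^"]*"' |
        sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage on the secondary server:
#   sw-nvme show | exported_devices
```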
Now that we have the old disk id and the new disk id, our disk replacement command will be:
~# sw-nvme replace-disk --old /dev/disk/by-id/ata-KINGSTON_SA400S37120G_50026B73804B902A --new /dev/disk/by-id/ata-WDC_WD10JFCX-68N6GN0_WD-WXK1E6458WKX
This ends our procedure on the secondary server.
Next, on the primary server, we add the newly created virtual disk to the ZFS pool.
First, we need to execute:
~# partprobe
Then check the zpool status:
~# zpool status
  pool: NETSTOR
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0 in 0h0m with 0 errors on Tue Dec 6 15:10:59 2016
config:
        NAME                    STATE     READ WRITE CKSUM
        NETSTOR                 DEGRADED     0     0     0
          mirror-0              DEGRADED     0     0     0
            SW3-NETSTOR-SRV1-1  ONLINE       0     0     0
            SW3-NETSTOR-SRV2-1  FAULTED      3     0     0  too many errors
errors: No known data errors
From the output we can see that the secondary disk is still reported as faulted:
SW3-NETSTOR-SRV2-1 FAULTED
The pool still refers to the old disk by its guid, so we need to tell zpool to replace the device with that guid with the new disk.
To do this, we first need to find the guid recorded in the pool configuration for SW3-NETSTOR-SRV2-1.
We can use the zdb command to find it:
~# zdb
NETSTOR:
    version: 5000
    name: 'NETSTOR'
    state: 0
    txg: 15
    pool_guid: 14112818788567273316
    errata: 0
    hostname: 'HydraA-1'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 14112818788567273316
        children[0]:
            type: 'mirror'
            id: 0
            guid: 17350955661294397060
            metaslab_array: 34
            metaslab_shift: 33
            ashift: 12
            asize: 1000164294656
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 11541101181530606692
                path: '/dev/disk/by-partlabel/SW3-NETSTOR-SRV1-1'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 12365645279327980714
                path: '/dev/disk/by-partlabel/SW3-NETSTOR-SRV2-1'
                whole_disk: 1
                create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
The important lines from the zdb output are:
guid: 12365645279327980714
path: '/dev/disk/by-partlabel/SW3-NETSTOR-SRV2-1'
This guid identifies the faulted device and is what we pass to zpool.
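Picking the guid out of the zdb output by eye is error-prone when the pool has many disks. The lookup can be sketched with awk as below. The helper name guid_for_label is hypothetical, and it assumes zdb prints each guid on the line before the matching path, one key per line, as in real (unflattened) zdb output.

```shell
#!/bin/sh
# Hypothetical helper: read `zdb` output on stdin and print the guid of the
# vdev whose path is /dev/disk/by-partlabel/<label>.
guid_for_label() {
    awk -v label="$1" '
        # Remember the most recent guid seen (pool_guid lines do not match).
        $1 == "guid:" { guid = $2 }
        # \047 is a single quote; zdb prints paths inside single quotes.
        $1 == "path:" && $2 == ("\047/dev/disk/by-partlabel/" label "\047") {
            print guid
        }
    '
}

# Usage on the primary server:
#   zdb | guid_for_label SW3-NETSTOR-SRV2-1
```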
We can replace the device with this guid using the command:
~# zpool replace NETSTOR -f
Example:
~# zpool replace NETSTOR 12365645279327980714 /dev/disk/by-partlabel/SW3-NETSTOR-SRV2-1 -f
Now check zpool status:
~# zpool status
  pool: NETSTOR
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Dec 6 16:12:53 2016
        591M scanned out of 728M at 65,6M/s, 0h0m to go
        590M resilvered, 81,14% done
config:
        NAME                      STATE     READ WRITE CKSUM
        NETSTOR                   DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            SW3-NETSTOR-SRV1-1    ONLINE       0     0     0
            replacing-1           UNAVAIL      0     0     0
              old                 UNAVAIL      0     0     0  corrupted data
              SW3-NETSTOR-SRV2-1  ONLINE       0     0     0  (resilvering)
errors: No known data errors
You need to wait for zpool to finish resilvering.
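Waiting can be automated by polling zpool status until the "resilver in progress" line seen in the output above disappears. This is a minimal sketch: the function names and the 30-second default interval are assumptions, not part of SERVERware.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `zpool status` text on stdin reports a
# running resilver (matches the "resilver in progress" scan line).
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Hypothetical helper: block until the given pool is no longer resilvering.
# $1 = pool name, $2 = polling interval in seconds (default 30, an assumption).
wait_for_resilver() {
    pool=$1
    interval=${2:-30}
    while zpool status "$pool" | resilver_in_progress; do
        sleep "$interval"
    done
}

# Usage on the primary server:
#   wait_for_resilver NETSTOR && zpool status NETSTOR
```

After the loop returns, it is still worth reading the final zpool status output to confirm that the pool is ONLINE and that no errors are reported.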
This ends our replacement procedure.