5.20.2.3 Recovering a Management Domain and Its User Domains (Release 18.1 and X7 and Later)
You can recover a management domain from a snapshot-based backup when severe disaster conditions damage the management domain, or when the server hardware is replaced to such an extent that it amounts to new hardware.
- Prepare an NFS server to host the backup archive mybackup.tar.bz2.
The NFS server must be accessible by IP address. For example, on an NFS server with the IP address nfs_ip, where the directory /export is exported from NFS mounts, put the mybackup.tar.bz2 file in the /export directory.
- Restart the recovery target system using the diagnostics.iso file.
See Booting a Server using the Diagnostic ISO File in Oracle Exadata System Software User's Guide.
- Log in to the diagnostics shell as the root user.
When prompted, enter the diagnostics shell.
For example:
Choose from following by typing letter in '()':
(e)nter interactive diagnostics shell. Must use credentials
    from Oracle support to login (reboot or power cycle to
    exit the shell),
(r)estore system from NFS backup archive,
Type e to enter the diagnostics shell and log in as the root user.
If prompted, log in to the system as the root user. If you are prompted for the root user password and do not have it, then contact Oracle Support Services.
- If required, use /opt/MegaRaid/storcli/storcli64 (or /opt/MegaRAID/MegaCli/MegaCli64 for releases earlier than Oracle Exadata System Software 19c) to configure the disk controller to set up the disks.
- Remove the logical volumes, the volume group, and the physical volume, in case they still exist after the disaster.
# lvm vgremove VGExaDb --force
# lvm pvremove /dev/sda3 --force
- Remove the existing partitions, then verify all partitions were removed.
# parted
GNU Parted 2.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: AVAGO MR9361-16i (scsi)
Disk /dev/sda: 4193GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      32.8kB  537MB   537MB   ext4         primary  boot
 2      537MB   805MB   268MB   fat32        primary  boot
 3      805MB   4193GB  4192GB               primary  lvm

(parted) rm 1
[ 1730.498593] sda: sda2 sda3
(parted) rm 2
[ 1736.203794] sda: sda3
(parted) rm 3
[ 1738.546845] sda:
(parted) print
Model: AVAGO MR9361-16i (scsi)
Disk /dev/sda: 4193GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End  Size  File system  Name  Flags

(parted) q
Information: You may need to update /etc/fstab.
- Create the three partitions on /dev/sda.
- Get the end sector for the disk /dev/sda from a running management domain (dom0) and store it in a variable:
# end_sector_logical=$(parted -s /dev/sda unit s print|perl -ne '/^Disk\s+\S+:\s+(\d+)s/ and print $1')
# end_sector=$( expr $end_sector_logical - 34 )
# echo $end_sector
The values for the start and end sectors in the commands below were taken from an existing management domain. Because these values can change over time, check them against a running dom0 before you run the commands. For example, for an Oracle Exadata X7-2 database server with 8 hard disk drives, you might see the following:
# parted -s /dev/sda unit s print
Model: AVAGO MR9361-16i (scsi)
Disk /dev/sda: 8189440000s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start     End          Size         File system  Name     Flags
 1      64s       1048639s     1048576s     ext4         primary  boot
 2      1048640s  1572927s     524288s      fat32        primary  boot
 3      1572928s  8189439966s  8187867039s               primary  lvm
Note:
The sector (s) values in the following sub-steps are based on a system with 8 hard disk drives. If you have 4 hard disk drives, then you need to view the partition table from the management domain on a running node and adjust the sector values accordingly.
- Create the boot partition, /dev/sda1.
# parted -s /dev/sda mklabel gpt mkpart primary 64s 1048639s set 1 boot on
- Create the EFI system partition, /dev/sda2.
# parted -s /dev/sda mkpart primary fat32 1048640s 1572927s set 2 boot on
- Create the partition that will hold the LVMs, /dev/sda3.
# parted -s /dev/sda mkpart primary 1572928s 8189439966s set 3 lvm on
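If you prefer not to hard-code the end sector, you can reuse the value captured from the running dom0 in the earlier sub-step. The following is an illustrative sketch, not part of the documented procedure; it assumes you manually set end_sector in the recovery shell to the value reported by echo $end_sector on the running dom0.
# Sketch only: set end_sector by hand to the value obtained from the
# running dom0 (the 8-drive X7-2 example above reports 8189439966).
end_sector=8189439966
parted -s /dev/sda mkpart primary 1572928s ${end_sector}s set 3 lvm on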
- Use the /sbin/lvm command to re-create the logical volumes and mkfs to create the file systems.
- Create the physical volume and the volume group.
# lvm pvcreate /dev/sda3
# lvm vgcreate VGExaDb /dev/sda3
- Create the logical volume for the file system that will contain the / (root) directory, and label it.
# lvm lvcreate -n LVDbSys3 -L30G VGExaDb
# mkfs -t ext4 /dev/VGExaDb/LVDbSys3
# e2label /dev/VGExaDb/LVDbSys3 DBSYSOVS
- Create the logical volume for the swap directory, and label it.
# lvm lvcreate -n LVDbSwap1 -L24G VGExaDb
# mkswap -L SWAP /dev/VGExaDb/LVDbSwap1
- Create the logical volume for the backup partition, and build a file system on top of it.
# lvm lvcreate -n LVDbSys2 -L30G VGExaDb
# mkfs -t ext4 /dev/VGExaDb/LVDbSys2
- Create the logical volume for the reserved partition, which is needed for creating snapshots.
# lvm lvcreate -n LVDoNotRemoveOrUse -L1G VGExaDb
Note:
Do not create any file system on this logical volume.
- Create the logical volume for the guest storage repository.
# lvm lvcreate -l 100%FREE -n LVDbExaVMImages VGExaDb
- Create a file system on the /dev/sda1 partition, and label it.
# mkfs.ext4 /dev/sda1
# e2label /dev/sda1 BOOT
# tune2fs -l /dev/sda1
- Create a file system on the /dev/sda2 partition, and label it.
# mkfs.vfat -v -c -F 32 -s 2 /dev/sda2
# dosfslabel /dev/sda2 ESP
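Before creating the mount points, it can help to confirm that the volume group, logical volumes, and file system labels look as expected. This optional check is a sketch, assuming the lvm and blkid utilities are available in the diagnostics environment.
# Optional sanity check (sketch): list the recreated logical volumes
# and confirm the file system labels.
lvm vgs VGExaDb
lvm lvs VGExaDb
blkid /dev/sda1 /dev/sda2 /dev/VGExaDb/LVDbSys3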
- Create mount points for all the partitions, and mount the respective partitions.
For example, if /mnt is used as the top-level directory, the mounted list of partitions might look like:
/dev/VGExaDb/LVDbSys3 on /mnt
/dev/sda1 on /mnt/boot
/dev/sda2 on /mnt/boot/efi
The following example mounts the root (/) file system, and creates three mount points:
# mount /dev/VGExaDb/LVDbSys3 /mnt -t ext4
# mkdir /mnt/boot
# mount /dev/sda1 /mnt/boot -t ext4
# mkdir /mnt/boot/efi
# mount /dev/sda2 /mnt/boot/efi -t vfat
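To confirm that all three file systems are mounted where the restore expects them, you can run a quick check such as the following optional sketch.
# Optional check (sketch): verify the restore target mounts.
df -h /mnt /mnt/boot /mnt/boot/efi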
- Bring up the network on eth0 and, if you are not using DHCP, assign the host's IP address and netmask to it.
If you are using DHCP, then you do not have to manually configure the IP address for the host.
# ip address add ip_address_for_eth0/netmask_for_eth0 dev eth0
# ip link set up eth0
# ip route add default via gateway_ip_address dev eth0
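For example, with hypothetical management network values (the addresses below are placeholders, not values from this procedure), the commands might look like the following sketch.
# Sketch with made-up example values; substitute your own management
# network address, prefix, and gateway.
ip address add 10.128.1.107/24 dev eth0
ip link set up eth0
ip route add default via 10.128.1.1 dev eth0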
- Mount the NFS server holding the backups.
# mkdir -p /root/mnt
# mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
- From the backup which was created in Backing up the Management Domain dom0 Using Snapshot-Based Backup, restore the root (/) directory and the boot file system.
# tar -pjxvf /root/mnt/backup-of-root-and-boot.tar -C /mnt
- Use the efibootmgr command to set the boot device.
- Disable and delete the Oracle Linux boot device. If you see the entry ExadataLinux_1, then remove this entry and recreate it.
For example:
# efibootmgr
BootCurrent: 000F
Timeout: 1 seconds
BootOrder: 000F,0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000D,000E
Boot0000* ExadataLinux_1
Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit Network Connection
Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot000D* Oracle Linux
Boot000E* UEFI OS
Boot000F* USB:SUN
In this example, you would disable and remove Oracle Linux (Boot000D) and ExadataLinux_1 (Boot0000). Use commands similar to the following to disable and delete the boot devices:
Disable 'Oracle Linux':
# efibootmgr -b 000D -A
Delete 'Oracle Linux':
# efibootmgr -b 000D -B
Disable old 'ExadataLinux_1':
# efibootmgr -b 0000 -A
Delete old 'ExadataLinux_1':
# efibootmgr -b 0000 -B
- Recreate the boot entry for ExadataLinux_1 and then view the boot order entries.
# efibootmgr -c -d /dev/sda -p 2 -l '\EFI\XEN\XEN.EFI' -L 'ExadataLinux_1'
# efibootmgr
BootCurrent: 000F
Timeout: 1 seconds
BootOrder: 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit Network Connection
Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot000E* UEFI OS
Boot000F* USB:SUN
Boot0000* ExadataLinux_1
In the output from the efibootmgr command, make note of the boot order number for ExadataLinux_1 and use that value in the following commands:
# efibootmgr -b (entry number) -A
# efibootmgr -b (entry number) -a
For example, in the output shown above, ExadataLinux_1 was listed as Boot0000, so you would use the following commands:
# efibootmgr -b 0000 -A
# efibootmgr -b 0000 -a
- Set the correct boot order.
Set ExadataLinux_1 as the first boot device. The remaining devices should stay in the same boot order, except for USB:SUN, which should be last.
, which should be last.# efibootmgr -o 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
The boot order should now look like the following:
# efibootmgr
BootCurrent: 000F
Timeout: 1 seconds
BootOrder: 0000,0001,0002,0003,0004,0005,0007,0008,0009,000A,000E,000F
Boot0000* ExadataLinux_1
Boot0001* NET0:PXE IP4 Intel(R) I210 Gigabit Network Connection
Boot0002* NET1:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
Boot0003* NET2:PXE IP4 Oracle Dual Port 10GBase-T Ethernet Controller
Boot0004* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
Boot0005* PCIE1:PXE IP4 Oracle Dual Port 25Gb Ethernet Adapter
Boot0007* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot0008* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot0009* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot000A* PCIE3:PXE IP4 Oracle Quad Port 10GBase-T Adapter
Boot000E* UEFI OS
Boot000F* USB:SUN
- Check the boot order using the ubiosconfig command.
# ubiosconfig export all -x /tmp/ubiosconfig.xml
Make sure the ExadataLinux_1 entry is the first child element of boot_order.
<boot_order>
  <boot_device>
    <description>ExadataLinux_1</description>
    <instance>1</instance>
  </boot_device>
  <boot_device>
    <description>NET0:PXE IP4 Intel(R) I210 Gigabit Network Connection</description>
    <instance>1</instance>
  </boot_device>
  ...
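As a quick way to inspect the exported XML without opening it in an editor, you can print the first boot device description. This is an optional sketch using standard grep; it is not part of the documented procedure.
# Optional check (sketch): the first <description> element in the
# exported file should be ExadataLinux_1.
grep -m1 '<description>' /tmp/ubiosconfig.xml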
- Check the restored /etc/fstab file and comment out any reference to /EXAVMIMAGES.
# cd /mnt/etc
Comment out any line that references /EXAVMIMAGES.
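If you prefer to edit the file non-interactively, a one-line sed command can prefix the matching entries with a comment character. This is a sketch only; review the result before rebooting.
# Sketch: comment out every uncommented fstab line that mentions
# /EXAVMIMAGES, then display the result for review.
sed -i '\|/EXAVMIMAGES| s|^[^#]|#&|' /mnt/etc/fstab
grep EXAVMIMAGES /mnt/etc/fstab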
- Detach the diagnostics.iso file.
Using the ILOM Web interface, navigate to the Storage Devices dialog and click Disconnect. The Storage Devices dialog is the interface that you used earlier to attach the diagnostics.iso image. See Booting a Server using the Diagnostic ISO File in Oracle Exadata System Software User's Guide.
- Unmount the restored partitions so /dev/sda1 can be remounted on /boot.
# umount /mnt/boot/efi
# umount /mnt/boot
# umount /mnt
# umount /root/mnt
- Restart the system.
# shutdown -r now
This completes the restoration procedure for the management domain (dom0).
- Convert to Eighth Rack, if required.
If the recovery is on an Oracle Exadata Eighth Rack, then perform the procedure described in Configuring Oracle Exadata Database Machine Eighth Rack Oracle Linux Database Server After Recovery.
- When the server comes back up, build an OCFS2 file system on the LVDbExaVMImages logical volume.
# mkfs -t ocfs2 -L ocfs2 -T vmstore --fs-features=local /dev/VGExaDb/LVDbExaVMImages --force
- Mount the OCFS2 partition on /EXAVMIMAGES.
# mount -t ocfs2 /dev/VGExaDb/LVDbExaVMImages /EXAVMIMAGES
- In /etc/fstab, uncomment the references to /EXAVMIMAGES and /dev/mapper/VGExaDb-LVDbExaVMImages, which you commented out earlier.
- Mount the backup NFS server that holds the storage repository (/EXAVMIMAGES) backup to restore the /EXAVMIMAGES file system.
# mkdir -p /root/mnt
# mount -t nfs -o ro,intr,soft,proto=tcp,nolock nfs_ip:/location_of_backup /root/mnt
- Restore the /EXAVMIMAGES file system.
To restore all user domains, use this command:
# tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES
To restore a single user domain from the backup, use the following command instead:
# tar -Spxvf /root/mnt/backup-of-exavmimages.tar -C /EXAVMIMAGES EXAVMIMAGES/<user-domain-name-to-be-restored>
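If you are unsure of the exact directory name to pass for a single-domain restore, you can list the top-level entries in the archive first. This sketch assumes the archive layout implied by the restore commands above, with members prefixed by EXAVMIMAGES/.
# Sketch: list the per-domain directories stored in the backup archive.
# Listing a large archive can take a while.
tar -tf /root/mnt/backup-of-exavmimages.tar | cut -d/ -f1-3 | sort -u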
- Bring up each user domain.
# xm create /EXAVMIMAGES/GuestImages/user_domain_hostname/vm.cfg
At this point all the user domains should come up along with Oracle Grid Infrastructure and the Oracle Database instances.
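If you restored several user domains, a short loop avoids repeating the command for each one. A minimal sketch, assuming each restored domain has a vm.cfg under /EXAVMIMAGES/GuestImages as shown above:
# Sketch: start every restored user domain in turn.
for cfg in /EXAVMIMAGES/GuestImages/*/vm.cfg; do
    xm create "$cfg"
done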