5.24.4 Configuring InfiniBand Partitioning across Oracle VM RAC Clusters
The steps for configuring InfiniBand Partitioning across Oracle RAC clusters running in Oracle VM are described here.
Before you start this task, download and extract the file create_pkeys.tar. This file is available from Implementing InfiniBand Partitioning across OVM RAC clusters on Exadata (My Oracle Support Doc ID 2075398.1). Download the file to one of the management domain (dom0) nodes. This is the node that you will use for running all the scripts in this procedure, and it is referred to as driver_dom0 in this procedure.
When you extract the file, you should get three files:
create_pkeys_on_switch.sh
run_create_pkeys.sh
create_pkey_files.sh
- Allocate IP addresses to be used by the pkey interfaces.
Plan and allocate sets of IP addresses and netmasks for each Oracle VM RAC cluster. These will be used by the cluster pkey interfaces and the storage pkey interfaces when InfiniBand partitioning is implemented in the cluster.
Refer to the topic About InfiniBand Partitioning Network Configuration for an example.
- On the InfiniBand switches, create a dedicated partition (cluster pkey) for each Oracle RAC cluster, to be used by the clusterware, and create one partition (storage pkey) to be used by all the Oracle VM RAC clusters and the storage cells for communication between the Oracle RAC cluster nodes and the storage cells.
You assign a pkey to each partition as a simplified means of identifying the partition to the Subnet Manager. Pkeys are 15-bit integers. The values 0x0001 and 0x7fff are the default partitions. Use values between 0x0002 and 0x7ffe for your pkeys.
- Enable password-less ssh equivalence for the root user from the driver_dom0 management domain (dom0) node to all the switches on the InfiniBand fabric.
Use a command similar to the following, where ib_switch_list refers to a file that contains the list of all the InfiniBand switches on the fabric, with each switch name on a separate line.
# dcli -g ib_switch_list -l root -k
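As an illustration, ib_switch_list is a plain list of switch hostnames, one per line. The switch names in this sketch are hypothetical, and the dcli invocation is shown commented out because it contacts the switches:

```shell
# Create the switch list file; the hostnames here are illustrative only.
cat > ib_switch_list <<'EOF'
myexa1sw-ib2
myexa1sw-ib3
EOF

# Push the root user's ssh key to every switch in the list.
# Shown commented out; run it on driver_dom0.
# dcli -g ib_switch_list -l root -k
```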
- Run the script create_pkeys_on_switch.sh from driver_dom0 to create and configure the partition keys on the InfiniBand switches.
Note:
Each run of the script create_pkeys_on_switch.sh creates exactly one partition. You must run the script once for each partition to be created. For example, an environment that contains two Oracle VM RAC clusters will have a total of three partitions: one storage partition and two cluster partitions (one per Oracle RAC cluster). In this example, you will need to run create_pkeys_on_switch.sh three times.
You must run the script on only one node (driver_dom0). The script creates the partitions in all the switches provided as input.
- After you finish running the script, verify the partitions were created on all the switches.
# /usr/local/sbin/smpartition list active no-page
The following example output shows the default partitions (0x0001 and 0x7fff), and an additional partition, 0x0004. The partition with pkey 0x0004 is configured for IPoIB and has two member ports that are assigned full membership of the partition.
# Sun DCS IB partition config file
#! version_number : 12
Default=0x7fff, ipoib : ALL_CAS=full, ALL_SWITCHES=full, SELF=full;
SUN_DCS=0x0001, ipoib : ALL_SWITCHES=full;
 = 0x0004,ipoib:
0x0021280001cf3787=full,
0x0021280001cf205b=full;
At this stage ensure that you have created all the required partitions.
- On the Oracle VM RAC nodes and on the storage cells, generate all the relevant network configuration files for the new IP over InfiniBand (IPoIB) interfaces.
Each partition requires a new IPoIB network interface.
This step makes the following changes on the Oracle RAC cluster nodes:
- Modifies these files:
/etc/sysconfig/network-scripts/ifcfg-ib0
/etc/sysconfig/network-scripts/ifcfg-ib1
- Removes these files:
/etc/sysconfig/network-scripts/rule-ib0
/etc/sysconfig/network-scripts/rule-ib1
/etc/sysconfig/network-scripts/route-ib0
/etc/sysconfig/network-scripts/route-ib1
- Creates the following new files in /etc/sysconfig/network-scripts:
ifcfg-clib0, ifcfg-clib1
rule-clib0, rule-clib1
route-clib0, route-clib1
ifcfg-stib0, ifcfg-stib1
rule-stib0, rule-stib1
route-stib0, route-stib1
Note:
If this step fails, before you rerun this step:
- Restore all the files from /etc/sysconfig/network-scripts/backup-for-pkeys to /etc/sysconfig/network-scripts.
- Remove the newly created files listed in this step.
- Make sure passwordless ssh is set up from the driver_dom0 node to all the Oracle RAC cluster nodes and the storage cells that need to be configured for partition keys.
- Make sure run_create_pkeys.sh and create_pkey_files.sh are executable and that they are in the same directory on driver_dom0.
- Run run_create_pkeys.sh.
For cluster nodes, you need to run the script a total of four times for every cluster node, with a node_type value of compute.
The syntax for this script is:
run_create_pkeys.sh node_name interface_name pkey_id node_type pkey_ipaddr pkey_netmask pkey_interfaceType
- node_name specifies the cluster node.
- interface_name is either ib0 or ib1.
- pkey_id specifies the pkey without the 0x prefix. The value used here is the cluster partition key derived from the cluster pkey_id value entered in step 2.
- node_type is either compute or cell.
- pkey_ipaddr specifies the IP address.
- pkey_netmask specifies the netmask in CIDR format, for example, /21.
- pkey_interfaceType is cluster or storage for compute node types, or storage for cell node types.
Note:
The pkey_ipaddr and pkey_netmask of the cluster pkey interface must be on a different subnet from the pkey_ipaddr and pkey_netmask of the storage pkey interface.
You can use the following command to derive the partition key values to be used for the run_create_pkeys.sh script from the pkey_id value entered in step 2.
FinalHexValue=$(echo "obase=16;ibase=2;$(expr 1000000000000000 + $(echo "obase=2;ibase=16;$(echo $HexValue|tr [:lower:] [:upper:])"|bc))"|bc|tr [:upper:] [:lower:])
FinalHexValue is the value that will be entered in the command here, and HexValue is the value entered in step 2 for pkey_id.
The following table provides an example of the inputs for the four runs for a cluster node:
Table 5-4 Four Runs for Cluster Nodes
Run | Interface Name | pkey_id | node_type | pkey_ipaddress | pkey_netmask | pkey_interfaceType
1   | ib0            | a000    | compute   | 192.168.12.153 | /21          | cluster
2   | ib1            | a000    | compute   | 192.168.12.154 | /21          | cluster
3   | ib0            | aa00    | compute   | 192.168.114.15 | /20          | storage
4   | ib1            | aa00    | compute   | 192.168.114.16 | /20          | storage
You use these values in each run of the script, denoted by the Run column, as shown in this example, where vm-guest-1 is the name of the cluster node.
# ./run_create_pkeys.sh vm-guest-1 ib0 a000 compute 192.168.12.153 /21 cluster
At this stage all the required networking files listed at the beginning of this step have been created for the new pkey-enabled network interfaces on the Oracle VM RAC cluster nodes.
Oracle Grid Infrastructure has also been modified to make use of the new network interfaces upon restart. The output of the command $GRID_HOME/bin/oifcfg getif should list clib0 and clib1 in the list of interfaces to be used for the cluster interconnect.
- Modify the Oracle ASM and Oracle RAC CLUSTER_INTERCONNECTS parameter.
- Log in to each of the Oracle ASM instances in the Oracle RAC cluster using SQL*Plus as SYS, and run the following command:
ALTER SYSTEM SET cluster_interconnects='<cluster_pkey_IP_address_of_ib0>:<cluster_pkey_IP_address_of_ib1>' scope=spfile sid='<name_of_current_ASM_instance>';
For example:
ALTER SYSTEM SET cluster_interconnects='192.168.12.153:192.168.12.154' scope=spfile sid='+ASM1';
- Log in to each of the database instances in the Oracle RAC cluster using SQL*Plus, and run the same command for the Oracle RAC instance:
For example:
ALTER SYSTEM SET cluster_interconnects='192.168.12.153:192.168.12.154' scope=spfile sid='RACDB1';
- Shut down and disable CRS auto-start on all the Oracle RAC cluster nodes.
# Grid_home/bin/crsctl stop crs
# Grid_home/bin/crsctl disable crs
At this stage Oracle Grid Infrastructure, the Oracle ASM instances, and the Oracle Database instances have been modified to make use of the newly created network interfaces.
- Modify cellip.ora and cellinit.ora on all the cluster nodes (user domains).
Perform these steps on any one database server node of the cluster (user domain for an Oracle VM RAC cluster).
- Make a backup of the cellip.ora and cellinit.ora files.
# cd /etc/oracle/cell/network-config
# cp cellip.ora cellip.ora-bak
# cp cellinit.ora cellinit.ora-bak
- Modify the cellip.ora-bak file to replace the existing IP address with the two storage pkey IP addresses of every storage cell that will be set up in step 7. The two IP addresses are separated by a semi-colon (;).
- Make sure ssh equivalence is set up for the root user to all the cluster nodes from this cluster node.
- Replace the cellip.ora file on all the cluster nodes.
Use the following commands to back up and then replace the cellip.ora file on all the cluster nodes. In this example, cluster_nodes refers to a file containing the names of all the Oracle RAC cluster nodes of the Oracle VM RAC cluster, with each node on a separate line.
# /usr/local/bin/dcli -g cluster_nodes -l root "/bin/cp /etc/oracle/cell/network-config/cellip.ora /etc/oracle/cell/network-config/cellip-orig.ora"
# /usr/local/bin/dcli -g cluster_nodes -l root -f cellip.ora-bak -d /etc/oracle/cell/network-config/cellip.ora
- Manually edit the /etc/oracle/cell/network-config/cellinit.ora-bak file to replace the existing IP addresses and netmask with the two storage pkey IP addresses and netmask of the cluster node, which were used in step 3.
- Make sure ssh equivalence is set up for the root user to all the cluster nodes from this cluster node.
- Replace the cellinit.ora file on all the cluster nodes.
The IP addresses and netmask were used in the third and fourth runs of step 3.
Use the following commands to back up and then replace the cellinit.ora file on all the cluster nodes. In this example, cluster_nodes refers to a file containing the names of all the Oracle RAC cluster nodes of the Oracle VM RAC cluster, with each node on a separate line.
# /usr/local/bin/dcli -g cluster_nodes -l root "/bin/cp /etc/oracle/cell/network-config/cellinit.ora /etc/oracle/cell/network-config/cellinit-orig.ora"
# /usr/local/bin/dcli -g cluster_nodes -l root -f cellinit.ora-bak -d /etc/oracle/cell/network-config/cellinit.ora
- Make a backup of the
- In the management domains (dom0s), modify the user domain configuration file for each user domain to use the partition key applicable to that user domain.
Modify all the relevant vm.cfg files in the management domain. This step is applicable only for Oracle VM environments. Log in to all the management domains and manually edit /EXAVMIMAGES/GuestImages/user_domain_name/vm.cfg to include the partition keys created in step 2.
For example, modify the line:
ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':['0xffff',]},{'pf':'40:00.0','port':'2','pkey':['0xffff',]},]
to:
ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':['0xa000','0xaa00',]},{'pf':'40:00.0','port':'2','pkey':['0xa000','0xaa00',]},]
In this example, 0xa000 is the cluster partition key derived from the cluster pkey_id value entered in step 2, and 0xaa00 is the storage partition key derived from the storage pkey_id value.
You can use the following command to derive the partition key values to use in vm.cfg from the pkey_id values entered in step 2.
FinalHexValue=$(echo "obase=16;ibase=2;$(expr 1000000000000000 + $(echo "obase=2;ibase=16;$(echo $HexValue|tr [:lower:] [:upper:])"|bc))"|bc|tr [:upper:] [:lower:])
FinalHexValue is the value that you enter in vm.cfg, and HexValue is the value entered in step 2 for pkey_id.
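The vm.cfg edit can also be scripted. The following sketch applies the change with sed to a sample copy of the file; the partition keys 0xa000/0xaa00 and the assumption that each pkey list contains only the default '0xffff' entry are illustrative. Always back up vm.cfg and verify the result before restarting the domain:

```shell
# Work on a sample copy of vm.cfg (the real file lives under
# /EXAVMIMAGES/GuestImages/user_domain_name/vm.cfg in the management domain).
cfg=vm.cfg.sample
cat > "$cfg" <<'EOF'
ib_pkeys = [{'pf':'40:00.0','port':'1','pkey':['0xffff',]},{'pf':'40:00.0','port':'2','pkey':['0xffff',]},]
EOF

# Replace the default pkey with the cluster and storage partition keys from step 2,
# keeping a .bak copy of the original file.
sed -i.bak "s/'0xffff',/'0xa000','0xaa00',/g" "$cfg"
cat "$cfg"
```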
- Modify the storage cells to use the newly created IPoIB interfaces.
- Make sure run_create_pkeys.sh and create_pkey_files.sh are available and that they are in the same directory on the same driver_dom0 node used in the previous steps.
- Make sure passwordless ssh is set up from the driver_dom0 node to all the storage cells that need to be configured for partition keys.
- Run run_create_pkeys.sh.
For storage servers, you need to run the script twice for every storage server, with a node_type value of cell.
The syntax for this script is:
run_create_pkeys.sh node_name interface_name pkey_id node_type pkey_ipaddr pkey_netmask pkey_interfaceType
- node_name specifies the storage server.
- interface_name is either ib0 or ib1.
- pkey_id specifies the pkey without the 0x prefix. The value used here is the storage partition key derived from the storage pkey_id value entered in step 2.
- node_type is either compute or cell.
- pkey_ipaddr specifies the IP address.
- pkey_netmask specifies the netmask in CIDR format, for example, /21.
- pkey_interfaceType is cluster or storage for compute node types, or storage for cell node types.
You can use the following command to derive the partition key values to be used for the run_create_pkeys.sh script from the pkey_id value entered in step 2.
FinalHexValue=$(echo "obase=16;ibase=2;$(expr 1000000000000000 + $(echo "obase=2;ibase=16;$(echo $HexValue|tr [:lower:] [:upper:])"|bc))"|bc|tr [:upper:] [:lower:])
FinalHexValue is the value that will be entered in the command here, and HexValue is the value entered in step 2 for pkey_id.
The following table provides an example of the inputs for the two runs for a storage server:
Table 5-5 Two Runs for Storage Servers
Run | Interface Name | pkey_id | node_type | pkey_ipaddress | pkey_netmask | pkey_interfaceType
1   | ib0            | aa00    | cell      | 192.168.114.1  | /20          | storage
2   | ib1            | aa00    | cell      | 192.168.114.2  | /20          | storage
You use these values in each run of the script, denoted by the Run column, as shown in this example, where cell01 is the name of the storage server.
# ./run_create_pkeys.sh cell01 ib0 aa00 cell 192.168.114.1 /20 storage
Note:
You can ignore the following messages from the script. The restart of the storage cells at the end of this task will take care of these issues.
Network configuration altered. Please issue the following commands as root to restart the network and open IB stack:
service openibd restart
service network restart
A restart of all services is required to put new network configuration into effect. MS-CELLSRV communication may be hampered until restart.
At this stage the storage servers (cells) have been modified to use the new network interfaces upon restart.
- Modify the /opt/oracle.cellos/cell.conf file on each storage server and restart the storage servers.
- Make a backup of the /opt/oracle.cellos/cell.conf file.
# cd /opt/oracle.cellos
# cp cell.conf cell.conf-prepkey
- Change the Pkey configuration lines in /opt/oracle.cellos/cell.conf.
Change this line:
<Pkeyconfigured>no</Pkeyconfigured>
to:
<Pkeyconfigured>yes</Pkeyconfigured>
Change this line for the two private interfaces, ib0 and ib1:
<IP_enabled>yes</IP_enabled>
to:
<IP_enabled>no</IP_enabled>
- Make sure Oracle Grid Infrastructure is stopped on all Oracle VM RAC nodes.
- Restart all the storage cell servers.
# shutdown -r now
- Verify that the new pkey-enabled network interfaces are in use.
# cellcli -e list cell detail | egrep 'interconnect|ipaddress'
The output should show the new pkey-enabled interfaces (stib0 and stib1) along with the new set of IP addresses.
- Restart the Oracle RAC clusters.
- Log in to the corresponding management domain of each of the user domain nodes.
- Run the following commands:
# xm shutdown user_domain_name
# xm create /EXAVMIMAGES/GuestImages/user_domain_name/vm.cfg
- Start and verify that the Oracle Grid Infrastructure stack is fully started on all the cluster nodes.
- Start and enable auto-start of the Oracle Grid Infrastructure stack on all the Oracle RAC cluster nodes.
# $GRID_HOME/bin/crsctl start crs
# $GRID_HOME/bin/crsctl enable crs
- After Oracle Grid Infrastructure has started on all the nodes, verify that the cluster_interconnects parameter is set to use the newly configured pkey interfaces.
Log in to a database instance and run the following query:
SQL> SELECT inst_id, value FROM gv$parameter WHERE name = 'cluster_interconnects';
- Remove the old cluster interconnect interfaces from the Oracle Cluster Registry (OCR).
# Grid_home/bin/oifcfg delif -global ib0/<old subnet>
# Grid_home/bin/oifcfg delif -global ib1/<old subnet>