3.6.2 Replacing an M.2 Disk Due to Failure or Other Problems

Failure of an M.2 disk reduces redundancy of the system area, and can impact patching, imaging, and system rescue. Therefore, replace a problem disk with a new disk as soon as possible. When an M.2 disk fails, the storage server automatically and transparently switches to using the software stored on the inactive system disk, making it the active system disk.

If an M.2 disk fails, an Exadata alert is generated. The alert includes specific instructions for replacing the disk. If you have configured the system for alert notifications, then the alert is sent by e-mail to the designated address.

An M.2 disk is hot-pluggable and can be replaced when the power is on. After an M.2 disk is replaced, Oracle Exadata System Software automatically adds the new device to the system partition and starts the rebuilding process.

  1. If you are replacing a failed M.2 disk, confirm the identity and location of the failed disk.
    CellCLI> LIST PHYSICALDISK WHERE diskType=M2Disk AND status!=normal DETAIL
             name:                      M2_SYS_0
             deviceName:                /dev/sda
             diskType:                  M2Disk
             makeModel:                 "INTEL SSDSCKJB150G7"
             physicalFirmware:          N2010112
             physicalInsertTime:        2017-07-14T08:42:24-07:00
             physicalSerial:            PHDW7082000M150A
             physicalSize:              139.73558807373047G
             slotNumber:                "M.2 Slot: 0"
             status:                    failed
    

    Take note of the slot number to ensure that you properly identify the failed disk when you open the server chassis, and skip directly to physically replacing the failed disk (Step 3).
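
The identification in Step 1 can also be scripted. The following is a minimal sketch, not an Oracle-supplied tool: a hypothetical shell function, m2_disk_summary, reads LIST PHYSICALDISK ... DETAIL output on standard input and prints one name|slot|status line per disk. It assumes the attribute layout shown in the sample output above, in particular that status appears after name and slotNumber within each record.

```shell
# Hypothetical helper (not part of Oracle Exadata System Software).
# Reads CellCLI "LIST PHYSICALDISK ... DETAIL" output on stdin and prints
# one "name|slotNumber|status" line per disk record.
m2_disk_summary() {
  awk '
    {
      # Split at the first colon only: values such as timestamps and
      # slot numbers contain colons themselves.
      pos = index($0, ":")
      if (pos == 0) next
      key = substr($0, 1, pos - 1)
      val = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      gsub(/"/, "", val)
      if (key == "name") name = val
      else if (key == "slotNumber") slot = val
      # Assumption: status is listed after name and slotNumber.
      else if (key == "status") printf "%s|%s|%s\n", name, slot, val
    }
  '
}

# Possible usage on a storage server:
#   cellcli -e 'LIST PHYSICALDISK WHERE diskType=M2Disk AND status!=normal DETAIL' | m2_disk_summary
```

For the sample output in Step 1, the function would print M2_SYS_0|M.2 Slot: 0|failed, which gives the slot to look for when the chassis is open.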

  2. If you are proactively replacing an M.2 disk, you must prepare to replace the disk:
    1. Identify the M.2 disk you want to replace and confirm that the other disk is normal.
      CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=M2Disk DETAIL
         name:                  M2_SYS_0 
         deviceName:            /dev/sdm   
         diskType:              M2Disk   
         makeModel:             "INTEL SSDSCKJB150G7"   
         physicalFirmware:      N2010112    
         physicalInsertTime:    2017-08-24T18:55:13-07:00   
         physicalSerial:        PHDW708201G0150A   
         physicalSize:          139.73558807373047G   
         slotNumber:            "M.2 Slot: 0"   
         status:                normal   
      
         name:                  M2_SYS_1   
         deviceName:            /dev/sdn   
         diskType:              M2Disk   
         makeModel:             "INTEL SSDSCKJB150G7"    
         physicalFirmware:      N2010112   
         physicalInsertTime:    2017-08-24T18:55:13-07:00   
         physicalSerial:        PHDW708200SZ150A   
         physicalSize:          139.73558807373047G   
         slotNumber:            "M.2 Slot: 1"   
         status:                normal

      Use the output to find the name of the disk you want to replace and confirm that the status of the other disk is normal. For example, if you want to replace the disk in slot 0, ensure that status=normal for the disk in slot 1.

      Note:

      Do not proactively replace an M.2 disk unless the status of the other M.2 disk is normal.
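
The rule in the note above can be checked mechanically before a disk is dropped. The following is a sketch under stated assumptions, not an Oracle-supplied procedure: a hypothetical function, m2_partner_is_normal, takes the name of the disk to be replaced, reads LIST PHYSICALDISK ... DETAIL output on standard input, and succeeds only if at least one other M.2 disk is listed and every other disk reports status normal. It assumes that status follows name within each record, as in the sample output.

```shell
# Hypothetical pre-check (not part of Oracle Exadata System Software).
# $1: name of the disk to be replaced, for example M2_SYS_0.
# stdin: CellCLI "LIST PHYSICALDISK WHERE DISKTYPE=M2Disk DETAIL" output.
# Exit status 0 only if another disk exists and all other disks are normal.
m2_partner_is_normal() {
  awk -v target="$1" '
    {
      pos = index($0, ":")
      if (pos == 0) next
      key = substr($0, 1, pos - 1)
      val = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "name") name = val
      # Assumption: status is listed after name in each record.
      else if (key == "status" && name != target) {
        others++
        if (val != "normal") bad++
      }
    }
    END { exit !(others > 0 && bad == 0) }
  '
}

# Possible usage on a storage server:
#   cellcli -e 'LIST PHYSICALDISK WHERE DISKTYPE=M2Disk DETAIL' | m2_partner_is_normal M2_SYS_0 \
#     && echo "Other M.2 disk is normal" \
#     || echo "Do not drop M2_SYS_0 yet"
```

Requiring at least one other disk to be listed keeps the check from passing on empty or truncated input.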

    2. Prepare the M.2 disk you want to replace.

      Use the ALTER PHYSICALDISK command with the DROP FOR REPLACEMENT option.

      For example:

      CellCLI> ALTER PHYSICALDISK M2_SYS_0 DROP FOR REPLACEMENT
      Physical disk M2_SYS_0 was dropped for replacement.
      

      Note:

      After the command, the service LED will be lit on the corresponding M.2 disk.

  3. Locate the storage server and physically replace the M.2 disk.

    The M.2 disks are hot-pluggable, so you do not need to power down the cell before replacing the disk.

    To help with physically identifying the storage server, the white locator LED should be illuminated on the front of the server.

    When you open the server chassis, take care to identify the correct M.2 disk using the information gathered earlier. As an aid, the service LED on the M.2 disk should also be lit.

    For detailed instructions about replacing the M.2 disk, see Servicing Internal M.2 Flash SSDs in the applicable Exadata hardware service guide (listed in Related Documentation).

  4. Confirm the M.2 disk has been replaced.
    CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=M2Disk DETAIL
       name:                  M2_SYS_0 
       deviceName:            /dev/sdm   
       diskType:              M2Disk   
       makeModel:             "INTEL SSDSCKJB150G7"   
       physicalFirmware:      N2010112    
       physicalInsertTime:    2017-10-11T12:07:24-07:00   
       physicalSerial:        PHDW708200SW261B   
       physicalSize:          139.73558807373047G   
       slotNumber:            "M.2 Slot: 0"   
       status:                normal   
    
       name:                  M2_SYS_1   
       deviceName:            /dev/sdn   
       diskType:              M2Disk   
       makeModel:             "INTEL SSDSCKJB150G7"    
       physicalFirmware:      N2010112   
       physicalInsertTime:    2017-08-24T18:55:13-07:00   
       physicalSerial:        PHDW708200SZ150A   
       physicalSize:          139.73558807373047G   
       slotNumber:            "M.2 Slot: 1"   
       status:                normal
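
In the sample output for this step, the replaced disk (M2_SYS_0) reports status normal together with a new physicalSerial and physicalInsertTime. One way to compare those values against what was recorded before the replacement is sketched below. The function name m2_attr is hypothetical, and the sketch assumes that name is the first attribute of each record, as in the sample output.

```shell
# Hypothetical helper (not part of Oracle Exadata System Software).
# $1: disk name, $2: attribute name.
# stdin: CellCLI "LIST PHYSICALDISK WHERE DISKTYPE=M2Disk DETAIL" output.
# Prints the value of the requested attribute for the named disk.
m2_attr() {
  awk -v disk="$1" -v attr="$2" '
    {
      # Split at the first colon only, so timestamps stay intact.
      pos = index($0, ":")
      if (pos == 0) next
      key = substr($0, 1, pos - 1)
      val = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      # Assumption: name is the first attribute of each record.
      if (key == "name") name = val
      if (name == disk && key == attr) print val
    }
  '
}

# Possible usage on a storage server:
#   cellcli -e 'LIST PHYSICALDISK WHERE DISKTYPE=M2Disk DETAIL' | m2_attr M2_SYS_0 physicalSerial
```

If the serial number printed after the replacement differs from the one noted earlier and the status is normal, the new disk has been recognized.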
  5. Confirm the system disk arrays have an active sync status, or are being rebuilt.
    # mdadm --detail /dev/md[2-3][4-5]
    /dev/md24:
          Container : /dev/md/imsm0, member 0
         Raid Level : raid1
         Array Size : 104857600 (100.00 GiB 107.37 GB)
      Used Dev Size : 104857600 (100.00 GiB 107.37 GB)
       Raid Devices : 2
      Total Devices : 2
    
                   State  : active
     Active Devices  : 2
    Working Devices  : 2
     Failed Devices  : 0
       Spare Devices : 0  
    
                 UUID : 152f728a:6d294098:5177b2e5:8e0d766c
       Number    Major    Minor    RaidDevice    State
            1        8       16             0    active sync  /dev/sdb
            0        8        0             1    active sync  /dev/sda
    /dev/md25:
          Container : /dev/md/imsm0, member 1
         Raid Level : raid1
         Array Size : 41660416 (39.73 GiB 42.66 GB)
      Used Dev Size : 41660544 (39.73 GiB 42.66 GB)
       Raid Devices : 2
      Total Devices : 2
    
                   State  : clean
     Active Devices  : 2
    Working Devices  : 2
     Failed Devices  : 0
       Spare Devices : 0  
    
                 UUID : 466173ba:507008c7:6d65ed89:3c40cf23
       Number    Major    Minor    RaidDevice    State
            1        8       16             0    active sync  /dev/sdb
            0        8        0             1    active sync  /dev/sda
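
The check in Step 5 can be summarized with a short script. The following is a minimal sketch, not an Oracle-supplied tool: a hypothetical function, md_members_ok, reads mdadm --detail output on standard input, prints the device and state of each member row, and succeeds only if every member is either active sync or in a state containing the word rebuilding. It assumes the member table layout shown above (Number, Major, Minor, RaidDevice, State, then the device path). The exact state text that mdadm reports during a rebuild is also an assumption here, so verify it against actual output on your system.

```shell
# Hypothetical helper (not part of Oracle Exadata System Software).
# stdin: output of mdadm --detail for one or more arrays.
# Prints "<device> <state>" per member row; exit status 0 only if at least
# one member was found and all members are "active sync" or rebuilding.
md_members_ok() {
  awk '
    # Member rows start with a number (or "-" for a missing member) and
    # have at least five fields; header and summary lines do not match.
    ($1 ~ /^[0-9]+$/ || $1 == "-") && NF >= 5 {
      dev = "-"
      last = NF
      if ($NF ~ /^\/dev\//) { dev = $NF; last = NF - 1 }
      state = ""
      for (i = 5; i <= last; i++) state = state (state == "" ? "" : " ") $i
      members++
      if (state != "active sync" && state !~ /rebuilding/) bad++
      printf "%s %s\n", dev, state
    }
    END { exit !(members > 0 && bad == 0) }
  '
}

# Possible usage, with the same device pattern as the command above:
#   mdadm --detail /dev/md[2-3][4-5] | md_members_ok || echo "Check the system disk arrays"
```

A row without a device path, such as a removed member, is reported with "-" in place of the device and causes the function to return a nonzero status.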