4.2 Understanding Automated Storage Server Maintenance Tasks and Policies

The Management Server (MS) performs automated maintenance tasks on the metric repository and various file systems.

Some automated maintenance occurs routinely every hour, while other tasks occur in response to storage space pressure on a specific file system.

Every hour, MS automatically performs the following tasks:

  • MS automatically deletes metric observations that meet the following criteria:

    • For metrics with a default retention policy (retentionPolicy=default), MS automatically deletes metric observations that are older than the retention period defined by the metricHistoryDays cell attribute. By default, the metricHistoryDays retention period is 7 days.

    • For metrics with an annual retention policy (retentionPolicy=annual), MS automatically deletes metric observations that are older than one year.

  • MS automatically deletes diagnostic files that are older than the retention period defined by the diagHistoryDays cell attribute. These include files in the LOG_HOME directory and temporary files larger than 5 MB located elsewhere on the system. Files in the Automatic Diagnostic Repository (ADR) and diagnostic pack (diagpack) files in the LOG_HOME directory are excluded from this process. By default, the diagHistoryDays retention period is 7 days.

  • MS automatically deletes eligible segments of the Oracle Exadata System Software alert.log and debug.log files. Each of these files is automatically segmented (saved to a new name) when it reaches 10 MB in size. To be eligible for deletion during this routine cleanup process, a file segment must be older than the retention period defined by the diagHistoryDays cell attribute and also not one of the 5 latest segments (most recent 50 MB) of the file.

  • MS automatically deletes eligible alerts from the cell alert history. An alert is eligible if it is a stateful alert that has been resolved or if it is a stateless alert. The age at which an eligible alert is deleted depends on the total number of alerts in the history:

    • If there are fewer than 500 alerts, then eligible alerts older than 100 days are deleted.

    • If there are between 500 and 999 alerts, then eligible alerts older than 7 days are deleted.

    • If there are 1,000 or more alerts, then all eligible alerts are deleted.
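The hourly metric and alert-history rules above reduce to a few comparisons. The following Python sketch expresses them as predicates. It is illustrative only and is not MS code: the function names are hypothetical, the metricHistoryDays default of 7 days comes from the text, and "one year" is assumed to mean 365 days.

```python
from datetime import datetime, timedelta

# Default taken from the text; on a real cell this value comes from the
# metricHistoryDays cell attribute.
METRIC_HISTORY_DAYS = 7


def metric_expired(observed_at: datetime, retention_policy: str,
                   now: datetime,
                   metric_history_days: int = METRIC_HISTORY_DAYS) -> bool:
    """True if a metric observation is older than its retention period."""
    if retention_policy == "annual":
        limit = timedelta(days=365)  # assumption: one year = 365 days
    else:  # retentionPolicy=default
        limit = timedelta(days=metric_history_days)
    return now - observed_at > limit


def alert_purgeable(stateful: bool, resolved: bool, created_at: datetime,
                    total_alerts: int, now: datetime) -> bool:
    """True if an alert would be deleted from the cell alert history."""
    # Eligible alerts: resolved stateful alerts and all stateless alerts.
    if stateful and not resolved:
        return False
    if total_alerts >= 1000:
        return True  # 1,000 or more alerts: every eligible alert is deleted
    limit = timedelta(days=100) if total_alerts < 500 else timedelta(days=7)
    return now - created_at > limit
```

For example, with 600 alerts in the history, a resolved stateful alert raised 8 days ago is deleted, while an unresolved stateful alert is kept no matter how old it is.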

Furthermore, MS routinely manages the ms-odl.trc and ms-odl.log files. Each of these files is automatically segmented (saved to a new name) when it reaches 5 MB in size. When a file segment is written, MS retains the latest 10 segments of the file and deletes any older segments.
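The two segment-retention rules differ: alert.log and debug.log segments must be both older than diagHistoryDays and outside the 5 newest segments, whereas ms-odl.trc and ms-odl.log keep a fixed count of 10 segments regardless of age. A minimal Python sketch of both rules, assuming each segment is represented as a hypothetical (name, rotation time) pair; the function names are also hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical representation: (segment file name, time it was rotated out).
Segment = Tuple[str, datetime]


def alert_debug_segments_to_delete(segments: List[Segment], now: datetime,
                                   diag_history_days: int = 7,
                                   keep_latest: int = 5) -> List[str]:
    """alert.log / debug.log rule: a segment is deleted only if it is older
    than diagHistoryDays AND is not one of the 5 most recent segments."""
    newest_first = sorted(segments, key=lambda s: s[1], reverse=True)
    cutoff = timedelta(days=diag_history_days)
    return [name for name, ts in newest_first[keep_latest:]
            if now - ts > cutoff]


def ms_odl_segments_to_delete(segments: List[Segment],
                              keep_latest: int = 10) -> List[str]:
    """ms-odl.trc / ms-odl.log rule: keep the 10 newest segments,
    whatever their age, and delete all older ones."""
    newest_first = sorted(segments, key=lambda s: s[1], reverse=True)
    return [name for name, _ in newest_first[keep_latest:]]
```

With 12 segments aged 1 through 12 days, the first rule deletes only the segments aged 8 to 12 days (older than 7 days and beyond the newest 5), while the second deletes the two oldest.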

In addition to the previously described routine tasks, MS automatically responds to alleviate storage space pressure. Specifically, when file system utilization reaches a predefined action threshold, MS automatically begins an iterative process to delete eligible metric, log, and trace files. The process continues until the file system utilization drops below the corresponding clearance threshold or until there are no more eligible files. The following table contains the action and clearance thresholds for each managed file system.

File System     Action Threshold    Clearance Threshold
/ (root)        80%                 75%
/tmp            80%                 75%
/home           80%                 75%
/var            80%                 75%
/var/log        80%                 75%
/opt/oracle     90%                 85%
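As a sketch, the threshold table can be encoded as a lookup keyed by mount point. The dictionary and helper names are hypothetical; the comparisons assume that "reaches" means greater than or equal to the action threshold and that "drops below" means strictly less than the clearance threshold.

```python
# Thresholds from the table above, as (action %, clearance %).
THRESHOLDS = {
    "/": (80, 75),
    "/tmp": (80, 75),
    "/home": (80, 75),
    "/var": (80, 75),
    "/var/log": (80, 75),
    "/opt/oracle": (90, 85),
}


def needs_action(mount_point: str, utilization_pct: float) -> bool:
    """True when utilization has reached the action threshold."""
    action, _ = THRESHOLDS[mount_point]
    return utilization_pct >= action


def is_cleared(mount_point: str, utilization_pct: float) -> bool:
    """True once utilization is below the clearance threshold."""
    _, clearance = THRESHOLDS[mount_point]
    return utilization_pct < clearance
```

The gap between the two thresholds acts as hysteresis: once cleanup of /var/log starts at 80%, it continues at 78% until utilization falls below 75%, but a file system that climbs from 70% to 78% does not trigger cleanup at all.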

In summary, the process to ease space pressure works as follows:

  • MS first deletes all eligible metrics older than metricHistoryDays and all eligible log and trace files older than diagHistoryDays from the affected file system.

  • If the file system utilization drops below the clearance threshold, the process stops.

  • Otherwise, MS iteratively deletes the oldest eligible metrics, logs, and traces. With each iteration, MS halves the effective retention period, down to a minimum of 10 minutes. The process continues until utilization drops below the clearance threshold or until all eligible files more than 10 minutes old have been removed.

  • An alert is automatically raised if the iterative file purging process cannot bring the file system utilization below the clearance threshold. The alert automatically clears when the file system utilization drops below the clearance threshold.
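The iterative purge can be simulated as follows. This is a minimal sketch under simplifying assumptions: every eligible file is modeled as an (age in minutes, size in bytes) pair, a single starting retention period is applied to metrics, logs, and traces alike (metricHistoryDays and diagHistoryDays both default to 7 days, or 10,080 minutes), and utilization is simply used bytes divided by capacity. The function name is hypothetical, and clearing of the alert is not modeled.

```python
from typing import List, Tuple

MIN_RETENTION_MIN = 10.0  # floor stated in the text: 10 minutes


def purge_for_space(files: List[Tuple[float, int]], used: int, capacity: int,
                    clearance_pct: float, retention_min: float):
    """Simulate the space-pressure purge on one file system.

    files: (age_in_minutes, size_in_bytes) for each eligible file.
    Returns (remaining_files, used_bytes, alert_raised, retentions_tried).
    """
    tried = []
    while True:
        tried.append(retention_min)
        # Delete every eligible file older than the current retention period.
        freed = sum(size for age, size in files if age > retention_min)
        files = [(age, size) for age, size in files if age <= retention_min]
        used -= freed
        if used / capacity * 100 < clearance_pct:
            return files, used, False, tried  # below clearance: stop
        if retention_min <= MIN_RETENTION_MIN:
            # Nothing older than 10 minutes remains: raise an alert.
            return files, used, True, tried
        # Halve the effective retention period, but never below 10 minutes.
        retention_min = max(retention_min / 2, MIN_RETENTION_MIN)
```

With the 7-day starting point, the retention periods tried are 10,080, 5,040, 2,520, and so on, halving each pass until the sequence is clamped at 10 minutes on the eleventh pass.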

In the context of easing space pressure for a specific file system, eligible files include:

  • All metric and diagnostic data files on the file system that are routinely managed by MS, including filled segments of the cell alert.log, debug.log, ms-odl.trc, and ms-odl.log files.

  • Automatic Diagnostic Repository (ADR) log and trace files, if the file system contains the ADR.

  • Diagnostic pack (diagpack) files in the LOG_HOME directory, if the file system contains the LOG_HOME directory.

  • Crash files, except that the most recent crash file is retained if it is less than 30 days old.

  • Files over 5 MB in size and older than one day in the cellmonitor and celladmin home directories, /tmp, /var/log/exadatatmp, and /var/spool.

Note:

In all cases, MS retains all files and directories with SAVE embedded in their names.
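The eligibility rules for crash files and large temporary files, together with the SAVE exception, can be sketched as predicates. All of the following are assumptions, not documented behavior: the cellmonitor and celladmin home directories are /home/cellmonitor and /home/celladmin, "5 MB" means 5 MiB, the SAVE match is case-sensitive, and a file inside a directory whose name contains SAVE is also retained. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath
from typing import List, Tuple

FIVE_MB = 5 * 1024 * 1024  # assumption: "5 MB" read as 5 MiB

# Directories named in the text; the two home-directory paths are hypothetical.
TEMP_DIRS = ["/home/cellmonitor", "/home/celladmin", "/tmp",
             "/var/log/exadatatmp", "/var/spool"]


def protected_by_save(path: str) -> bool:
    # Assumption: any path component containing "SAVE" protects the file.
    return any("SAVE" in part for part in PurePosixPath(path).parts)


def temp_file_eligible(path: str, size: int, mtime: datetime,
                       now: datetime) -> bool:
    """Files over 5 MB and older than one day in the listed directories."""
    p = PurePosixPath(path)
    in_scope = any(PurePosixPath(d) in p.parents for d in TEMP_DIRS)
    return (in_scope and size > FIVE_MB and now - mtime > timedelta(days=1)
            and not protected_by_save(path))


def crash_files_to_delete(crash_files: List[Tuple[str, datetime]],
                          now: datetime) -> List[str]:
    """All crash files are eligible, except that the newest one is kept
    if it is less than 30 days old."""
    newest_first = sorted(crash_files, key=lambda c: c[1], reverse=True)
    if newest_first and now - newest_first[0][1] < timedelta(days=30):
        newest_first = newest_first[1:]
    return [name for name, _ in newest_first if not protected_by_save(name)]
```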
