3.9 Creating a Regular Expression-Based Redaction Policy

A regular expression-based redaction policy enables you to redact data based on a search-and-replace model.

3.9.1 About Creating Regular Expression-Based Redaction Policies

Regular expression-based redaction enables you to search for patterns of data to redact.

For example, you can use regular expressions to redact email addresses, which can have varying character lengths. It is designed for use with character data only. You can use formats for the search and replace operation, or you can create custom pattern formats.

You cannot use regular expressions to redact a subset of the values in a column. The REGEXP_PATTERN (regular expression pattern) must match all of the values in order for the REGEXP_REPLACE_STRING setting to take effect, and the REGEXP_REPLACE_STRING must change the value.

For rows where the REGEXP_PATTERN fails to match, Data Redaction performs DBMS_REDACT.FULL redaction. This mitigates the risk of showing the actual data for those rows which it failed to match because of a mistake in the REGEXP_PATTERN causing the regular expression to fail to match all of the values in the column.

In addition, if no change to the value occurs as a result of the REGEXP_REPLACE_STRING setting during regular expression replacement operation, Data Redaction performs DBMS_REDACT.FULL redaction.

3.9.2 Syntax for Creating a Regular Expression-Based Redaction Policy

The regexp_* parameters of the DBMS_REDACT.ADD_POLICY procedure can create a regular expression-based redaction policy.

The DBMS_REDACT.ADD_POLICY fields for creating a regular expression-based data redaction policy are as follows:

DBMS_REDACT.ADD_POLICY (
   object_schema           IN VARCHAR2 := NULL, 
   object_name             IN VARCHAR2,
   column_name             IN VARCHAR2 := NULL,
   policy_name             IN VARCHAR2,
   function_type           IN BINARY_INTEGER := NULL,
   expression              IN VARCHAR2,
   enable                  IN BOOLEAN := TRUE,
   regexp_pattern          IN VARCHAR2 := NULL,
   regexp_replace_string   IN VARCHAR2 := NULL,
   regexp_position         IN BINARY_INTEGER := 1,
   regexp_occurrence       IN BINARY_INTEGER := 0,
   regexp_match_parameter  IN VARCHAR2 := NULL);

In this specification:

  • object_schema, object_name, column_name, policy_name, expression, enable: See General Syntax of the DBMS_REDACT.ADD_POLICY Procedure.

  • function_type: Specifies the type of redaction. For regular expression based redaction, use either DBMS_REDACT.REGEXP or DBMS_REDACT.REGEXP_WIDTH.

    If you use the DBMS_REDACT.REGEXP redaction type, then no truncation occurs. This applies even if the redacted value is wider than the column width, and the Oracle Call Interface width attribute (OCI_ATTR_CHAR_SIZE) of the column is not preserved. (It becomes 4000, just as it does when the REGEXP_REPLACE SQL operator is used on a column.)

    Using the DBMS_REDACT.REGEXP_WIDTH redaction type truncates any redacted value that exceeds the width of the column, and ensures that the OCI width attribute of the column (OCI_ATTR_CHAR_SIZE) remains unchanged.

    Note the following:

    • Use the DBMS_REDACT.REGEXP_WIDTH function type if your applications depend on the value of the OCI_ATTR_CHAR_SIZE attribute. For example, applications that are built using the Oracle OLE DB Provider interface are sensitive to the value of the OCI_ATTR_CHAR_SIZE attribute. If you use DBMS_REDACT.REGEXP as the redaction type, then the OCI_ATTR_CHAR_SIZE always becomes 4000. This setting makes it unsuitable as the redaction type of policies on tables that are used by Oracle OLE DB based applications. See Oracle Call Interface Developer's Guide for more information about Oracle Call Interface parameter attributes.

    • When you set the function_type parameter to DBMS_REDACT.REGEXP or DBMS_REDACT.REGEXP_WIDTH, omit the function_parameters parameter from the DBMS_REDACT.ADD_POLICY procedure.

    • Specify the regular expression parameters in much the same way that you specify the pattern, replace, position, occurrence, and match_parameter arguments to the REGEXP_REPLACE SQL function. See Oracle Database SQL Language Reference for information about the REGEXP_REPLACE SQL function.

  • regexp_pattern: Describes the search pattern for data that must be matched. If it finds a match, then Oracle Database replaces the data as specified by the regexp_replace_string setting. See the following sections for more information:

  • regexp_replace_string: Specifies how you want to replace the data to be redacted. See the following sections for more information:

  • regexp_position: Specifies the starting position for the string search. The value that you enter must be a positive integer indicating the character of the column_name data where Oracle Database should begin the search. The default is 1 or the DBMS_REDACT.RE_BEGINNING format, meaning that Oracle Database begins the search at the first character of the column_name data.

  • regexp_occurrence: Specifies how to perform the search and replace operation. The value that you enter must be a non-negative integer indicating the occurrence of the replace operation:

    • If you specify 0 or the DBMS_REDACT.RE_ALL format, then Oracle Database replaces all the occurrences of the match.

    • If you specify the DBMS_REDACT.RE_FIRST format, then Oracle Database replaces the first occurrence of the match.

    • If you specify a positive integer n, then Oracle Database replaces the nth occurrence of the match.

    If the occurrence is greater than 1, then the database searches for the second occurrence beginning with the first character following the first occurrence of pattern, and so forth.

  • regexp_match_parameter: Specifies a text literal that lets you change the default matching behavior of the function. The behavior of this parameter is the same for this function as for the REGEXP_REPLACE SQL function. See Oracle Database SQL Language Reference for detailed information.

    To filter the search so that it is not case sensitive, specify the RE_CASE_INSENSITIVE format.

3.9.3 Regular Expression-Based Redaction Policies Using Formats

The DBMS_REDACT.ADD_POLICY procedure supports both regexp_pattern and regexp_replace_string parameters.

3.9.3.1 Regular Expression Formats

The regular expression formats represent commonly used expressions, such as the replacement of digits within a credit card number.

Table 3-10 describes the formats that you can use with the regexp_pattern parameter in the DBMS_REDACT.ADD_POLICY procedure.

Table 3-10 Formats for the regexp_pattern Parameter

Format Description

DBMS_REDACT.RE_PATTERN_ANY_DIGIT

Searches for any digit. Replaces the identified pattern with the characters specified by the regexp_replace_string parameter. The DBMS_REDACT.RE_PATTERN_ANY_DIGIT is commonly used with the following values of the regexp_replace_string parameter:

regexp_replace_string => DBMS_REDACT.RE_REDACT_WITH_SINGLE_X

This setting replaces any matched digit with the X character.

The following setting replaces any matched digit with the 1 character.

regexp_replace_string => DBMS_REDACT.RE_REDACT_WITH_SINGLE_1

DBMS_REDACT.RE_PATTERN_CC_L6_T4

Searches for the middle digits of any credit card (other than American Express) that has 6 leading digits and 4 trailing digits. Replaces the identified pattern with the characters specified by the regexp_replace_string parameter.

The appropriate regexp_replace_string setting to use with this format is DBMS_REDACT.RE_REDACT_CC_MIDDLE_DIGITS, which finds any credit card that could have 6 leading and 4 trailing digits left as actual data. It then redacts the middle digits.

DBMS_REDACT.RE_PATTERN_CCN

Matches credit card numbers other than American Express credit card numbers. The appropriate regexp_replace_string setting to use with this format is DBMS_REDACT.RE_REDACT_CCN. The end result is a redaction of all the digits except the last 4.

DBMS_REDACT.RE_PATTERN_AMEX_CCN

Matches American Express credit card numbers. The appropriate regexp_replace_string setting to use with this format is DBMS_REDACT.RE_REDACT_AMEX_CCN. The end result is a redaction of all the digits except the last 5.

DBMS_REDACT.RE_PATTERN_US_PHONE

Searches for any U.S. telephone number. Replaces the identified pattern with the characters specified by the regexp_replace_string parameter

The appropriate regexp_replace_string setting to use with this format is DBMS_REDACT.RE_REDACT_US_PHONE_L7, which finds United States phone numbers and then redacts the last 7 digits.

DBMS_REDACT.RE_PATTERN_EMAIL_ADDRESS

Searches for any email address. Replaces the identified pattern with the characters specified by the regexp_replace_string parameter

The appropriate regexp_replace_string settings that you can use with this format are as follows:

RE_REDACT_EMAIL_NAME, which finds any email address and redacts the email user name

RE_REDACT_EMAIL_DOMAIN, which finds any email address and redacts the email domain

RE_REDACT_EMAIL_ENTIRE, which finds any email address and redacts the entire email address

DBMS_REDACT.RE_PATTERN_IP_ADDRESS

Searches for an IP address. Replaces the identified pattern with the characters specified by the regexp_replace_string parameter.

The appropriate regexp_replace_string setting to use with this format is DBMS_REDACT.RE_REDACT_IP_L3, which replaces the last section of the dotted decimal string representation of an IP address with the characters 999 to indicate that it was redacted.

The table below describes formats that you can use with the regexp_replace_string parameter in the DBMS_REDACT.ADD_POLICY procedure.

Table 3-11 Formats for the regexp_replace_string Parameter

Format Description

DBMS_REDACT.RE_REDACT_WITH_SINGLE_X

Replaces the data with a single X character for each of the actual data characters. For example, the credit card number 5105 1051 0510 5100 could be replaced with XXXX XXXX XXXX XXXX.

DBMS_REDACT.RE_REDACT_WITH_SINGLE_1

Replaces the data with a single 1 digit for each of the actual data digits. For example, the credit card number 5105 1051 0510 5100 could be replaced with 1111 1111 1111 1111.

DBMS_REDACT.RE_REDACT_CC_MIDDLE_DIGITS

Redacts the middle digits in credit card numbers, as specified by setting the regexp_pattern parameter with the RE_PATTERN_CC_L6_T4 format. The redaction replaces each redacted character with an X. For example, the credit card number 5105 1051 0510 5100 could be replaced with 5105 10XX XXXX 5100.

DBMS_REDACT.RE_REDACT_CCN

Redacts the first 12 digits of a credit card number other than an American Express card number. For example, 4012888888881881 is redacted to ************1881.

DBMS_REDACT.RE_REDACT_AMEX_CCN

Redacts the first 10 digits of an American Express number. For example, 378282246310005 is redacted to **********10005.

DBMS_REDACT.RE_REDACT_PHONE_L7

Redacts the last 7 digits of U.S. telephone numbers, as specified by setting the regexp_pattern parameter with the RE_PATTERN_US_PHONE format. The redaction replaces each redacted character with an X. This setting only applies to hyphenated phone numbers, not phone numbers with spaces. For example, the telephone number 415-555-0100 could be replaced with 415-XXX-XXXX.

DBMS_REDACT.RE_REDACT_EMAIL_NAME

Redacts the email name as specified by setting the regexp_pattern parameter with the RE_PATTERN_EMAIL_ADDRESS format. The redaction replaces the email user name with four x characters. For example, the email address psmith@example.com could be replaced with xxxx@example.com.

DBMS_REDACT.RE_REDACT_EMAIL_DOMAIN

Redacts the email domain name as specified by setting the regexp_pattern parameter with the RE_PATTERN_EMAIL_ADDRESS format. The redaction replaces the domain with five x characters. For example, the email address psmith@example.com could be replaced with psmith@xxxxx.com.

DBMS_REDACT.RE_REDACT_IP_L3

Redacts the last three digits of the IP address as specified by setting the regexp_pattern parameter with the RE_PATTERN_IP_ADDRESS format. For example, the IP address 192.0.2.254 could be replaced with 192.0.2.999, which is an invalid IP address.

3.9.3.2 Example: Regular Expression Redaction Policy Using Formats

The DBMS_REDACT.ADD_POLICY procedure can create a regular expression redaction policy that uses formats.

Example 3-7 shows how to use regular expression formats to redact credit card numbers.

Example 3-7 Regular Expression Data Redaction Character Values

BEGIN
 DBMS_REDACT.ADD_POLICY(
   object_schema          => 'mavis', 
   object_name            => 'cust_info', 
   column_name            => 'cc_num',
   policy_name            => 'redact_cust_cc_nums', 
   function_type          => DBMS_REDACT.REGEXP,
   expression             => '1=1',
   regexp_pattern         => DBMS_REDACT.RE_PATTERN_CC_L6_T4,
   regexp_replace_string  => DBMS_REDACT.RE_REDACT_CC_MIDDLE_DIGITS,
   regexp_position        => DBMS_REDACT.RE_BEGINNING,
   regexp_occurrence      => DBMS_REDACT.RE_FIRST,
   regexp_match_parameter => DBMS_REDACT.RE_CASE_INSENSITIVE,
   policy_description     => 'Regular expressions to redact credit card numbers',
   column_description     => 'cc_num contains customer credit card numbers');
END;
/

Query and redacted result:

SELECT cc_num FROM mavis.cust_info;

CC_NUM
-------
401288XXXXXX1881
411111XXXXXX1111
555555XXXXXX1111
511111XXXXXX1118

3.9.4 Custom Regular Expression Redaction Policies

You can customize regular expressions in Data Redaction policies.

3.9.4.1 Settings for Custom Regular Expressions

Oracle Data Redaction provides special settings to configure policies that use regular expressions.

To create custom regular expression redaction policies, you use the following parameters in the DBMS_REDACT.ADD_POLICY procedure:

  • regexp_pattern: This pattern is usually a text literal and can be of any of the data types CHAR, VARCHAR2, NCHAR, or NVARCHAR2. The pattern can contain up to 512 bytes. For further information about writing the regular expression for the regexp_pattern parameter, see the description of the pattern argument of the REGEXP_REPLACE SQL function in Oracle Database SQL Language Reference, because the support that Data Redaction provides for regular expression matching is similar to that of the REGEXP_REPLACE SQL function.

  • regexp_replace_string: This data can be of any of the data types CHAR, VARCHAR2, NCHAR, or NVARCHAR2. The regexp_replace_string can contain up to 500 back references to subexpressions in the form \n, where n is a number from 1 to 9. If you want to include a backslash (\) in the regexp_replace_string setting, then you must precede it with the escape character, which is also a backslash. For example, to literally replace the matched pattern with \2 (rather than replace it with the second matched subexpression of the matched pattern), you enter \\2 in the regexp_replace_string setting. For more information, see Oracle Database SQL Language Reference.

See Also:

General Syntax of the DBMS_REDACT.ADD_POLICY Procedure for information about other DBMS_REDACT.ADD_POLICY parameters

3.9.4.2 Example: Custom Regular Expression Redaction Policy

The DBMS_REDACT.ADD_POLICY procedure regexp* parameters can create a custom regular expression redaction policy.

Example 3-8 shows how to use regular expressions to redact the emp_id column data. In this example, taken together, the regexp_pattern and regexp_replace_string parameters do the following: first, find the pattern of 9 digits. For reference, break them into three groups that contain the first 3, the next 2, and then the last 4 digits. Then, replace the first five digits with XXXXX concatenated with the third group (the last 4 digits) as found in the original pattern.

Example 3-8 Partially Redacted Data Redaction Using Regular Expressions

BEGIN
 DBMS_REDACT.ADD_POLICY(
   object_schema          => 'mavis', 
   object_name            => 'cust_info', 
   column_name            => 'emp_id',
   policy_name            => 'redact_cust_ids', 
   function_type          => DBMS_REDACT.REGEXP,
   expression             => '1=1',
   regexp_pattern         => '(\d\d\d)(\d\d)(\d\d\d\d)',
   regexp_replace_string  => 'XXXXX\3',
   regexp_position        => 1,
   regexp_occurrence      => 0,
   regexp_match_parameter => 'i',
   policy_description     => 'Redacts employee IDs using regular expression',
   column_description     => 'emp_id contains employee ID numbers');
END;
/

Query and redacted result:

SELECT emp_id FROM mavis.cust_info;

EMP_ID
------------
XXXXX1234
XXXXX5678