4 Data Masking
This chapter provides conceptual information about the features that comprise Oracle Data Masking, and procedural information about performing the task sequence including Masking Formats and Masking Definitions. Data masking presupposes that you have created an Application Data Model (ADM) with sensitive columns defined.
Introduction
Enterprises run the risk of breaching sensitive information when copying production data into non-production environments for the purposes of application development, testing, data sharing or analysis. Oracle Data Masking helps reduce this risk by irreversibly replacing the original sensitive data with fictitious data so that production data can be shared safely with non-production users. Accessible through Oracle Enterprise Manager, Data Masking provides end-to-end secure automation for provisioning test databases from production in compliance with regulations.
Data masking (also known as data scrambling and data anonymization) is the process of replacing sensitive information copied from production databases to non-production databases with realistic, but scrubbed, data based on masking rules. Data masking is ideal for virtually any situation when confidential or regulated data needs to be shared with non-production users. These users may include internal users such as application developers, or external business partners such as offshore testing companies, suppliers and customers. These non-production users need to access some of the original data, but do not need to see every column of every table, especially when the information is protected by government regulations.
Data masking enables organizations to generate realistic and fully functional data with similar characteristics as the original data to replace sensitive or confidential information. This contrasts with encryption or Virtual Private Database, which simply hides data, and the original data can be retrieved with the appropriate access or key. With data masking, the original sensitive data cannot be retrieved or accessed.
Names, addresses, phone numbers, and credit card details are some of the examples of data that require protection of the information content from inappropriate visibility. Live production database environments contain valuable and confidential data access to this information is tightly controlled. However, each production system usually has replicated development copies, and the controls on such test environments are less stringent. This greatly increases the risks that the data might be used inappropriately. Data masking can modify sensitive database records so that they remain usable, but do not contain confidential or personally identifiable information. Yet, the masked test data resembles the original in appearance to ensure the integrity of the application.
Related Oracle Security Offerings
Table 4-1 Related Offerings
Oracle Solutions | Description |
---|---|
Oracle Data Masking and Subsetting | Software for creating masked and subsetted copies of production data for use in non-production environments such as testing and development databases. |
Oracle Data Safe (Data Discovery/ Data Masking) | Discover and mask sensitive data with a cloud service that supports Oracle Databases everywhere: in the Oracle Cloud, on-premises, and third-party clouds. |
Oracle Label Security | Implements Multi-Level Security (MLS), enabling rows with differing sensitivity to reside in the same table. Explicitly labels rows with group, compartment, and sensitivity levels then matches them with user labels. |
Oracle Data Redaction | Redacts sensitive data from query results before display through client applications. Enforces redaction at runtime, with low overhead, and according to conditions set in policies. |
Data Masking Components
Data Masking consists of two main components:
-
A masking format represents the definition on how to mask some given data. For example, an IP Address masking format could be a 3 random digits from 0 to 255 followed by a dot (.) fixed string repeated 4 times (without a trailing dot). A masking format can either be one that you create, or one from the list of Oracle-supplied default masking formats.
As a matter of best practice, organizations should create masking formats for all commonly regulated information so that the formats can be applied to the sensitive data regardless of which database the sensitive data resides in. This ensures that all sensitive data is consistently masked across the entire organization.
-
A masking definition defines a data masking operation to be implemented on one or more tables in a database. Masking definitions associate table columns with formats to use for masking the data. They also maintain the relationship between columns that are not formally declared in the database using related columns.
You can create a new masking definition or use an existing definition for a masking operation. To create a masking definition, you specify the column of the table and how data should be masked. If the columns being masked are involved in unique, primary key, or foreign key constraints, data masking generates the values so that the constraints are not violated. Masking ensures uniqueness per character using decimal arithmetic. For example, a numeric string of length 5 generates a maximum of only 99999 unique values. Similarly, a numeric string of length 1 generates a maximum of only 9 unique values.
You can export masking definitions to files and import them on other systems. This is important when the test and production sites reside on different Oracle Management Systems or on entirely different sites.
Recommended Data Masking Workflow
The following figure shows that the production database is cloned to a staging region and then masked there. During the masking process, the staging and test areas are tightly controlled like a production site.
Figure 4-1 Recommended Data Masking Workflow

Description of "Figure 4-1 Recommended Data Masking Workflow"
The workflow diagram illustrates the process of data masking and subsetting within Oracle environments, with the following steps:
- Sensitive Data Discovery: Identifies sensitive data that requires protection.
- Masking Format: Involves creating masking formats, either through predefined Oracle-supplied templates or custom masking rules.
- Masking Definition: Consolidates sensitive columns and their associated masking formats, which define how to mask the sensitive data in these columns.
- Clone Production to Staging: A copy of production data is cloned to a staging environment where masking operations are performed.
- Execute Masking Script: Applies the defined masking rules to the cloned data, ensuring sensitive data is protected.
- Subset Definition: A definition to subset data, limiting data exposure.
- Add Masking Definition: Combines the masking rules with the subsetting definition.
- Execute Masking/Subsetting Script: Executes both masking and subsetting on the data set.
- Sensitive Data Sharing: The masked and subsetted data can then be safely shared without revealing sensitive information.
Data masking is an iterative and evolving process handled by the security administrator and implemented by the database administrator. When you first configure data masking, try out the masking definition on a test system, then add a greater number of columns to the masking definition and test it to make sure it functions correctly and does not break any application constraints. During this process, you should exercise care when removing all embedded references to the real data while maintaining referential integrity.
After data masking is configured to your satisfaction, you can use the existing definition to repeatedly mask after cloning. The masking definition, however, would need to evolve as new schema changes require new data and columns to be masked.
After the masking process is complete, you can distribute the database for wide availability. If you need to ship the database to another third-party site, you are required to use the Data Pump Export utility, and then ship the dump file to the remote site.
You can also perform inline, or at the source, data masking while creating a subset definition.
Masking Formats
A masking definition requires one or more masking formats for any columns included in the masking definition. When adding columns to a masking definition, you can either create masking formats manually or import them from the format library. It is often more efficient to work with masking formats from the format library.
A masking format is made up of one or more format entries, where a format entry is a predefined type of supported masking method such as Fixed String, Random String, Shuffle etc. In essence, it describes how to transform an input value to an output value. The masking format is a combination of one or more of these transformations which chain together to take some input value and output a final masked value.
This section covers the following topics:
Creating a New Masking Format
This section describes how to create new masking formats using Enterprise Manager. To create a masking format:
- From the Targets menu, select Databases, then click the
Security menu and navigate to Data Masking and Subsetting then
Data Masking. You can also access the menu through the database home page
by selecting Security then Data Masking and Subsetting then Data
Masking.
Go to Masking Formats under Data Masking. The Masking Format Library page appears with predefined formats that Oracle Enterprise Manager provides.
- Click Create.
The Create Masking Format page appears, where you can define a masking format.
- Provide a required name for the new masking format, add a description,
select Sensitive Type, and then select a Custom Format Entry type from the
drop-down list.
A section appears that enables you to provide input for the format entry you have selected. For instance, if you select Array List, the subsequent box enables you to enter a list of comma-separated values, such as New York, New Jersey, and New Hampshire.
- Continue adding additional format entries as needed.
- When done, provide an optional user-defined or post-processing
function, then click Create to include this customized masking format in the
library.
The Masking Format page reappears with your newly created format displayed in the Format Library table. You can use this format later to mask a column of the same sensitive type.
Creating a Masking Format from an Existing Format
After you have created at least one format, you can use this format as a template, where you can implement most of the masking format using a different name and changing the entries as needed, rather than needing to create a new format from scratch.
- Navigate to the Masking Formats page and select the format you want to use as a template.
- Click Actions and choose Create Like.
- Enter a name for the new format and select a Sensitive Type.
- Customize the format entries as needed and click Create.
Oracle-supplied Predefined Masking Formats
Enterprise Manager provides several out-of-box predefined formats. All predefined formats and built-in formats are random.
Masking Format Categories and Characteristics
Data masking formats have characteristics. Some common characteristics include combinable, uniqueness, reversible, and deterministic:
- Combinable: A masking format is considered combinable when it can be combined with other basic masking formats or predefined masking formats though the use of conditions.
- Uniqueness: A masking format is characterized as having uniqueness if it ensures uniqueness of the generated masked data. These types of masking formats are useful for masking columns with uniqueness constraints.
- Reversible: A masking format that is characterized as reversible can retrieve original column data from masked data. Data masking usually means permanently replacing the data and ensuring that no one can retrieve the original data. But, sometimes you might want to see the original data.
- Deterministic: A deterministic masking format generates consistent output for a given input across databases and data masking jobs. Deterministic masking helps to maintain data integrity across multiple applications and preserve system integrity in a single sign-on environment.
MASKING FORMAT | DESCRIPTION | SAMPLE MASKED DATA | COMBINABLE | UNIQUENESS |
Visa Credit Card Number | ~10 billion unique Visa credit card numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
USA Phone Number Formatted | ~2.7 billion unique USA phone numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
USA Phone Number | ~2.7 billion unique USA phone numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
UPC Number Formatted | ~100 billion UPC numbers |
|
Yes | Yes. The number of distinct values in the specified range must be greater than or equal to the number of values in the column |
UPC Number | ~100 billion UPC numbers |
|
Yes | Yes. The number of distinct values in the specified range must be greater than or equal to the number of values in the column |
Social Security Number Formatted | ~718 million unique US Social Security Numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
Social Security Number | ~718 million unique US Social Security Numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
Social Insurance Number Formatted | ~1 billion unique Canadian Social Insurance Numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
Social Insurance Number | ~1 billion unique Canadian Social Insurance Numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
National Insurance Number Formatted | Generates unique UK National Insurance Numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
MasterCard Credit Card Number | ~10 billion unique MasterCard credit card number |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
ISBN (Thirteen Digit) Formatted | ~2 billion unique ISBN numbers |
|
Yes | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
ISBN (Thirteen Digit) | ~2 billion unique ISBN numbers |
|
Yes | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
ISBN (Ten Digit) Formatted | ~1 billion unique ISBN numbers |
|
Yes | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
ISBN (Ten Digit) | ~1 billion unique ISBN numbers |
|
Yes | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
Generic Credit Card Number Formatted | ~10 billion unique generic credit card numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
Generic Credit Card Number | ~10 billion unique generic credit card numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
Finnish Social Security Numbers | ~2.4 billion unique Finnish Social Security Numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
Discover Card Credit Card Number | ~10 billion unique Discover Card credit card numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
Auto Mask Format | Masking by scrambling the characters and numbers. This format preserves the input length, position of the characters and numbers, case of the character (upper or lower), and special characters in the input. |
|
Yes | Not applicable |
American Express Credit Card Number | ~10 billion unique American Express credit card numbers |
|
Yes, if the generated masked values passes the post processing function validation | Yes, if the number of distinct values that can be generated by the LMF is greater than the number of distinct values in the column |
Note:
The Deterministic and Reversible format characteristic values for all Masking Formats in the table above are "No".For more information on Masking Format Characteristics, see: Characteristics of Masking Formats
Credit Card Numbers
Out of the box, the format library provides many different formats for credit cards. The credit card numbers generated by these formats pass the standard credit card validation tests by the applications, thereby making them appear like valid credit card numbers.
Some of the masking formats provided include coverage for credit card types such as:
-
Mastercard
-
Visa
-
American Express
-
Discover
-
Generic credit card numbers (can cover all card types)
You may want to use different styles for storing credit card numbers, such as:
-
Pure numbers
-
'Space' for every four digits
-
'Hyphen' ( - ) for every four digits, and so forth
To implement the masked values in a certain format style, you can set the
DM_CC_FORMAT
variable of the DM_FMTLIB
package.
United States Social Security Numbers
Out of the box, you can generate valid U.S. Social Security (SSN) numbers. These SSNs pass the normal application tests of a valid SSN.
You can affect the format style by setting the DM_SSN_FORMAT
variable of the DM_FMTLIB
package. For example, if you set this
variable to '-', the typical social security number would appear as '123-45-6789'.
ISBN Numbers
Using the format library, you can generate either 10-digit or 13-digit ISBN
numbers. These numbers adhere to standard ISBN number validation tests. All of these
ISBN numbers are random in nature. Similar to other format definitions, you can affect
the "style" of the ISBN format by setting values to DM_ISBN_FORMAT
.
UPC Numbers
Using the format library, you can generate valid UPC numbers. They adhere to
standard tests for valid UPC numbers. You can affect the formatting style by setting the
DM_UPC_FORMAT
value of the DM_FMTLIB
package.
Canadian Social Insurance Numbers
Using the format library, you can generate valid Canadian Social Insurance
Numbers (SINs). These numbers adhere to standard tests of Canadian SINs. You can affect
the formatting style by setting the DM_CN_SIN_FORMAT
value of the
DM_FMTLIB
package.
North American Phone Numbers
Out of the box, the format library provides various possible U.S. and Canadian phone numbers. These are valid, realistic looking numbers that can pass standard phone number validation tests employed by applications. You can generate the following types of numbers:
-
Any North American phone numbers
-
Any Canadian phone number
-
Any U.S.A. phone number
UK National Insurance Numbers
Using the format library, you can generate valid unique random UK National Insurance Numbers (NINs). These numbers adhere to standard tests of UK NINs. A typical national insurance number would appear as 'GR 12 56 34 RS'.
Auto Mask
This format scrambles characters and numbers into masked characters and numbers and while retaining the format and length of the data, including special characters; for example, 'ABCD_343-ddg' masked as 'FHDT_657-tte'.
Format Entry Options to Customize Masking Format
Format entry options are as follows:
-
Array List
Accepts a list of values as input and maps each value in the list to a value in the input column. The number of values in the list should be greater than or equal to the number of distinct values in the masked column. The values in the user-provided list are ordered randomly before mapping them to the original column values. For example, if the original column contains values [10,20,30,40,50] and the Array List specified by the user is [99,100,101,102,103], the first masking run could produce the mapping [10,101], [20,103], [30,100], [40,99], [50,102] and a different masking run can produce [10,100], [20,99], [30,101], [40,102], [50,103].
A mapping table is created. The CTAS that creates the mapping table queries from:
-
The original table to fetch the column values being masked and a row number for each column value. The row number is derived from the Oracle-supplied
ROW_NUMBER
function. -
The user-passed list of values — values in the user-passed array list are converted into a table-like record set using the
SQL TABLE
function. A row number is also retrieved corresponding to each value in the record set. The row number is derived from theROWNUM
pseudo column. The values in the record set are randomly ordered usingDBMS_RANDOM.VALUE
function. -
The mapping table CTAS then joins the row numbers in both sub-queries to map the original value (from sub-query in step 1 above) and a value from the user-list (sub-query in step 2 above). Multiple executions of the CTAS will create a mapping table with different original-masked value mappings because of the random ordering of the user list in step 2.
Note:
If masking column is either Date or Timestamp, user can mask the column using Array List format. The Date/Timestamp formats supported are
andyyyy-MM-dd
yyyy-MM-DDTHH:mm:ssTZD
For example: "2015-01-18" and "2015-01-18T03:25:46Z".
-
-
Delete
Deletes a row based on a condition. If the condition matches, then the row is deleted on the target. A mapping table is created. The “
DELETE_VAL
” column in the mapping table is set to 1 for rows that are candidates to be deleted. For example, we are masking theSALARY
column and the masking definition has conditions on theEMPID
column and formats defined as:EMPID < 100 DELETE EMPID < 200 RANDOM NUMBERS [Start Value:1 End Value:100] DEFAULT PRESERVE ORIGINAL DATA
The mapping table will have the
DELETE_VAL
column set to 1 forSALARY
rows withEMPID < 100
.DELETE_VAL
for all other rows is set to 0. The final maskingCTAS SQL
which joins the original table and the mapping table to create the masked table filters out rows withDELETE_VAL
set to 1. Therefore, the rows in the original table that match the join condition are effectively “deleted”. -
Encrypt
The Encrypt masking format encrypts column data using Triple DES (3DES). The format of the column data after encryption is similar to that of the original values. For example, if you mask nine-digit numbers, the encrypted values also have nine digits. Encrypt is a deterministic and the only reversible masking format. It is helpful when businesses need to mask and send their data to a third party for analysis, reporting, or any other business processing purpose. After the processed data is received from the third party, the original data can be recovered (decrypted) using the same seed value that was used to encrypt the data.
You provide a regular expression to mask character or numeric type column. The specified regular expression must match all the original values in the column. If a value does not match the regular expression exactly, the masking format may no longer produce one-to-one mapping. Therefore, to ensure uniqueness, all the values must match the regular expression. The encrypted values also match the specified regular expression. Encrypt supports encryption of strings of fixed widths. It supports a subset of the regular expression language and does not support * or + syntax in regular expressions.
Note:
The maximum value of a user provided regular expression cannot exceed 64 bits.You also provide a seed value that is used to generate a key for encryption and decryption. The seed value has to be provided at the time of submitting a data masking job. It can be any string containing alphanumeric characters.
If your masking definition has a sensitive column using Encrypt, you are shown the decrypt option while submitting a data masking job. Choosing this option, you can decrypt the encrypted column values by providing the same seed used to encrypt the sensitive column.
-
Fixed Number
This format does not use a lookup or a mapping table. It assigns a fixed number value to a string/number column.
The type of column applicable to this entry is a
NUMBER
column or aSTRING
column. For example, if you mask a column that has a social security number, one of the entries can be Fixed Number '900'. This format is combinable. -
Fixed String
This format does not use a lookup or a mapping table. It assigns a fixed string value to a string column. For example, if you mask a column that has a License Plate Number, one of the entries can be Fixed String 'CA'. This format is combinable.
- Null Value
Masks the column with a value of NULL. It does not use a lookup or a mapping table.
-
Preserve Original Data
Preserves the original column value. Used in conditional masking with a combination of other formats where only a subset of values needs to be masked based on a condition.
-
Random Dates
The format creates a mapping table. The mapping table CTAS contains code to generate random dates within a user specified date range. A random date is generated using the following logic:
TO_DATE(start_date','YYYY-DD-MM HH24:MI:SS') + mask_util.genrnd(0, <#of days between the specified date range>)
This format is combinable at the end after data range.
-
Random Decimal Numbers
If used as part of a mixed random string, these have limited usage for generating unique values. This masking format generates unique values within the specified range. For example, a starting value of 5.5 and ending value of 9.99 generates a decimal number ranging from 5.5 to 9.99, both inclusive. This masking format is combinable.
-
Random Digits
This format generates unique values within the specified range. For example, for a random digit with a length of [5,5], an integer between [0, 99999] is randomly generated, left padded with '0's to satisfy the length and uniqueness requirement. This is a complementary type of random number, which will not be padded. When using random digits, the random digit pads to the appropriate length in a string. It does not pad when used for a number column. This format is combinable.
Data masking ensures that the generated values are unique, but if you do not specify enough digits, you could run out of unique values in that range.
-
Random Numbers
If used as part of a mixed random string, these have limited usage for generating unique values. This format generates unique values within the specified range. For example, a starting value of 100 and ending value of 200 generates an integer number ranging from 100 to 200, both inclusive. This format is combinable.
-
Random Strings
This format generates unique values within the specified range. For example, a starting length of 2 and ending length of 6 generates a random string of 2 - 6 characters in length. This format is combinable.
-
Regular Expression
The format uses a lookup table. No mapping table is created. The PL/SQL function that implements the format is invoked directly from the final CTAS which creates the masked table. The lookup table has two columns to store the regular expression and the replacement value specified by the user. The SQL
REGEXP_REPLACE
function is used to implement this format.The function has the signature:
regexp_replace(column_value, regex, replacement_val);
For example, phone numbers in the format
can be masked using a regexnnn.nnn.nnnn
with a replacement value[1-9]{3}[.][0-9]{3}[.][0-9]{4}
The format invokes the regexp_replace for each regexp-replacement value pair. If the phone number column was masked using regular expression:***.***.****
EMPID < 100 Regular Expression Regex: [1-9]{3}[.][0-9]{3}[.][0-9]{4} Replacement Value: 999.444.5555 Regular Expression Regex: [9]{3}[.][4]{3}[.][5]{4} Replacement Value: ***.***.****
Each column value matching the first regular expression is first replaced with
999.444.5555
, this value then matches the second regular expression and is replaced with***.***.****
. The example is not a real world use case. The behavior probably is a side effect of how the format is implemented, the real use case of specifying multiple regular expression formats to mask a column is to handle cases when the data in the column could match multiple regular expressions. For example:EMPID < 100 Regular Expression Regex: [1-9]{3}[.][0-9]{3}[.][0-9]{4} Replacement Value: ***.***.**** Regular Expression Regex: [1-9]{3}[.][0-9]{3}[.][0-9]{3} Replacement Value: ***.***.***
can be used to mask column values that store 10 digit - nnn.nnn.nnnn - or 9 digit - nnn.nnn.nnn - phone numbers.
-
Shuffle
This format does not use a lookup or a mapping table. The final CTAS which creates the mapping table includes a sub query to order the column contents randomly using the DBMS_RANDOM.VALUE function. If shuffle is used with a grouping column, the
PARTITION
clause is used to partition by the grouping column and the column is ordered randomly within each partition. The implementation is similar to that of Array List and Table Column. The random ordering for the shuffle format occurs on the column being “shuffle masked”, whereas in Array List, it is on the user-passed list, and in Table Column, the ordering is on the user-specified column. -
Substitute
The format creates a mapping table. It uses a user specified “substitution” table as a source for masked values. The format uses the Oracle supplied hash based partitioning function ORA_HASH to map a column value to its mask value in a lookup (substitution) table. Processing involves querying the substitution table to get a count of distinct values in the mask column. This count -
n
- is then used as themax_bucket
parameter ofORA_HASH
to hash the original column values inton
buckets. For example, if we are maskingEMPLOYEE.SALARY
and usingSUBST.SUB_COL
column as the substitution column, we first get the count of distinct values inSUB_COL
. The mapping table CTAS then queries:-
The original column,
EMPLOYEE.SALARY
-
The user provided substitution table to fetch all the distinct values in
SUBST.SUB_COL
and also fetches theROWNUM
associated with each row
The CTAS SQL then joins 1 and 2 using
ORA_HASH
and equating its output to theROWNUM
from step 2. TheSELECT
part of the CTAS SQL is listed below.max_bckt
is the count of distinct values in the substitution columnSUBST.SUB_COL
:select s.orig_val, a0.new_val from ( select orig_val from (select "SALARY" orig_val from "TESTU"."EMPLOYEE") group by orig_val) s, (select rownum rn, SUB_COL new_val from (select distinct SUB_COL from TESTU.SUBST order by SUB_COL)) a0 where ora_hash(s.orig_val, max_bckt, seed)+1 = a0.rn
-
-
SQL Expression
The format does not create a mapping table. It allows a user to use a SQL Expression for masking a column. Data masking uses this expression to generate masked values to replace the original values. The expression is invoked directly from the masking CTAS SQL. The SQL Expression can consist of one or more values, operators, and SQL functions that evaluates to a value. It can also contain substitution columns (columns from the same table as the masked column). Some examples of valid expressions:
1. dbms_random.string('u', 8) || '@company.com' 2. %first_name% || '.' || %last_name% || '@company.com'
-
Substring
The format creates a mapping table. The mapping table CTAS invokes the Oracle SUBSTR function on the input column. The format accepts a start position and length as input, extracts that data from the input column using
SUBSTR
, and uses that as a mask value. -
Table Column
The format creates a mapping table. The format maps original column values to column values in a user specified table. The processing is similar to the array list format. The values in the user specified table are randomly ordered using
DBMS_RANDOM.VALUE
before mapping each value to the original column. Unlike the Substitute format, the format is not deterministic since the substitution column is randomly ordered. -
Truncate
The format truncates all rows in a table. It does not create a mapping table. If one of the columns in a table is masked using this format, so no other mask formats can be specified for any of the other columns.
-
User Defined Function
The format creates a mapping table. The return value of the user defined function is used to mask the column. The function is invoked as part of the mapping table CTAS. The function has a fixed signature:
function userdef_func(rowid varchar2, col_name varchar2, orig_val varchar2) returns varchar2;
-
Post Processing Function
Optionally, some formats can also include a post-processing function that takes the output after executing all the format entries, runs the function and produces the final masked value.
The format allows users to use a custom function to process column values after they are masked using standard data masking formats. For example, the
SALARY
column can be masked with a SQL expression first, and a post processing function can be applied on the masked values to add a currency symbol, like ‘$’. The function has a fixed signature:function post_proc_func(rowid varchar2, column_name varchar2, mask_value varchar2) returns varchar2;
The
ROWID
input allows a user to fetch column values from the masked table. The function could use these values to mask the input column value, basically to transform the column further after a standard format is applied on the column. This format creates a mapping table. The post processing function gets invoked as part of the mapping tableCTAS SQL
. The input to the mask_value argument of the function is the masked value of the original column. For example, say we are masking theSALARY
column and the mask definition has conditions on theEMPID
column and formats are defined this way:EMPID < 100 RANDOM NUMBERS [START:100000 END: 10000000] POST PROCESSING FUNCTION ppf EMPID < 200 FIXED NUMBER 100000 DEFAULT PRESERVE ORIGINAL DATA
Note:
Refer to Frequently Asked Questions for recommended customize masking formats using above entry options.Deterministic Masking Using the Substitute Format
You may occasionally need to consistently mask multiple, distinct databases. For instance, if you run HR, payroll, and benefits that have an employee ID concept on three separate databases, the concept may be consistent for all of these databases, in that an employee's ID can be selected to retrieve the employee's HR, payroll, or benefits information. Based on this premise, if you were to mask the employee's ID because it actually contains his/her social security number, you would have to mask this consistently across all three databases.
Deterministic masking provides a solution for this problem. You can use the Substitute format to mask employee ID column(s) in all three databases. The Substitute format uses a table of values from which to substitute the original value with a mask value. As long as this table of values does not change, the mask is deterministic or consistent across the three databases.
Using the Shuffle Format
A shuffle format is available, but it does not keep the original data distribution if the column values are not unique or when conditional masking is used. For example, in an Original Table with two columns, EmpName and Salary, the Salary column might have distinct values like 10, 90, and 20.
Original Table
EmpName | Salary |
A | 10 |
B | 90 |
C | 10 |
D | 10 |
E | 90 |
F | 20 |
If you mask the Salary column with this format, each of the original values is replaced with one of the values from this set. Assume that the shuffle format replaces 10 with 20, 90 with 10, and 20 with 90.
Mapping Table (Non-preservation)
EmpName | Salary |
10 | 20 |
90 | 10 |
20 | 90 |
The result is a shuffled Salary column as shown in the Masked Table, but the data distribution is changed. While the value 10 occurs three times in the Salary column of the Original Table, it occurs only twice in the Masked Table.
Masked Table (Non-preservation)
EmpName | Salary |
A | 20 |
B | 10 |
C | 20 |
D | 20 |
E | 10 |
F | 90 |
If the salary values had been unique, the format would have maintained data distribution.
Using Group Shuffle
Group shuffle enables you to perform a shuffle within discrete units, or groups, where there is a relationship among the members of the group. Consider the case of shuffling the salaries of employees. This table illustrates the group shuffle mechanism, where employees are categorized as managers (M) or workers (W), and salaries are shuffled within job category.
Employee | Job Category | Salary | Shuffled Salary |
Alice | M | 90 | 88 |
Bill | M | 88 | 90 |
Carol | W | 72 | 70 |
Denise | W | 57 | 45 |
Eddie | W | 70 | 57 |
Frank | W | 45 | 72 |
Using Conditional Masking
To demonstrate how conditional masking can handle duplicate values, add another job category, assistant (A), where the employee in this category, George, earns the same as Frank. Assume the following conditions:
-
If job category is M, replace salary with a random number between 1 and 10.
-
If job category is W, set salary to a fixed number (01).
-
Default is to preserve the existing value.
Applying these conditions results in the masked values shown in the following table:
Employee | Job Category | Salary | Conditional Result |
Alice | M | 90 | 5 |
Bill | M | 88 | 7 |
Carol | W | 72 | 01 |
Denise | W | 57 | 01 |
Eddie | W | 70 | 01 |
Frank | W | 45 | 01 |
George | A | 45 | 45 |
Conditional masking works when there are duplicate values provided there are no dependent columns or foreign keys. If either of these is present, a "bleeding condition" results in the first of two duplicate values becoming the value of the second. So, in the example, Frank's salary is not preserved, but becomes 01.
Providing User-defined and Post-processing Functions
If desired, you can provide user-defined and post-processing functions on the Create Masking Format page. A user-defined choice is available in the Custom Format Entry list, and a post-processing function field is available at the bottom of the page.
-
User-defined functions
To provide a user-defined function, select User Defined Function from the drop-down list to access the input fields.
A user-defined function passes in the original value as input, and returns a mask value. The data type and uniqueness of the output values must be compatible with the original output values. Otherwise, a failure occurs when the job runs. Combinable, a user-defined function is a PL/SQL function that can be invoked in a
SELECT
statement. Its signature is returned as:Function udf_func (rowid varchar2, column_name varchar2, original_value varchar2) return varchar2;
-
rowid
is the min (rowid
) of the rows that contain the valueoriginal_value
3rd argument. -
column_name
is the name of the column being masked. -
original_value
is the value being masked.
That is, it accepts the original value as an input string, and returns the mask value.
Both the input and output values are varchar2. For instance, a user-defined function to mask a number could receive 100 as input, the string representation of the number 100, and return 99, the string representation of the number 99. Values are cast appropriately when inserting to the table. If the value is not castable, masking fails.
-
-
Post-processing functions
To provide a post-processing function, enter it in the Post Processing Function field.
A post-processing function has the same signature as a user-defined function, but passes in the mask value the masking engine generates, and returns the mask value that should be used for masking, as shown in the following example:
Function post_proc_udf_func (rowid varchar2, column_name varchar2, mask_value varchar2) return varchar2;
-
rowid
is the min (rowid) of the rows that contain the valuemask_value
. -
column_name
is the name of the column being masked. -
mask_value
is the value being masked.
-
Patterns of Format Definitions
All of the predefined format definitions adhere to these typical patterns:
-
Generate a random number or random digits.
-
Perform post-processing on the above-generated value to ensure that the final result is a valid, realistic value.
For example, a valid credit card number must pass Luhn's check. That is, the last digit of any credit card number is a checksum digit, which is always computed. Also, the first few digits indicate the card type (MasterCard, Amex, Visa, and so forth). Consequently, the format definition of a credit card would be as follows:
-
Generate random and unique 10-digit numbers.
-
Using a post-processing function, transform the values above to a proper credit card number by adding well known card type prefixes and computing the checksum digit.
This format is capable of generating 10 billion unique credit card numbers.
Masking Definitions
A masking definition defines a data masking operation to be implemented on one or more tables in a database. Masking definitions associate table columns with masking formats to use for masking the data.
Prerequisites
Before creating a masking definition, review requirements and advisory information here: Prerequisites
-
Ensure the format you select does not violate check constraints and does not break any applications that use the data.
-
For triggers and PL/SQL packages, data masking recompiles the object.
-
Exercise caution when masking partitioned tables, especially if you are masking the partition key. In this circumstance, the row may move to another partition.
-
Data Masking does not support clustered tables, masking information in object tables, XML tables, and virtual columns. Relational tables are supported for masking.
-
If objects are layered on top of a table such as views, materialized views, and PL/SQL packages, they are recompiled to be valid.
If you plan to mask a test system intended for evaluating performance, try to preserve the production statistics and SQL profiles after masking by adding a pre-masking script to export the SQL profiles and statistics to a temporary table, then restoring after masking completes.
Creating a Masking Definition
-
From the Targets menu, select Databases, then click the Security menu and navigate to Data Masking and Subsetting then Data Masking. Alternatively, you can right-click on your specific database and go to Security then Data Masking and Subsetting then Data Masking.
The Masking Definitions page appears, where you can create new or manage existing masking definitions.
-
Click Create to go to the Create Masking Definition page.
A masking definition includes information regarding table columns and the format for each column. You can choose which columns to mask, leaving the remaining columns intact.
-
Provide required details, including Name, Application Data Models, Associated Database, and Database Named Credentials.
-
Click Next to go to the Sensitive Columns and Masking Formats page, where you can choose which sensitive columns in the ADM you want to mask.
The sensitive columns appear in the first section whereas columns added to Masking Definition appear below in the next section.
-
Select the desired sensitive columns you want to mask.
Either click Define Format and Add to define the format for the column now or you can choose Add Column to add the column to the Masking Definition and define the format later.
Also, Define Format and Add features can save significant time. When you select multiple columns to add that have the same data type, you do not need to define the format for each column one by one. For instance, if you search for Social Security numbers (SSN) and the search yields 100 SSN columns, you could select them all, then click Define Format and Add to import the SSN format for all of them.
Optionally, below are the other masking options available:- Group Masking: If you want to mask selected columns as a group, enable Group Masking. The columns that you want to mask as a group must all be from the same table. Any formats defined while this checkbox is selected will be applied to all the selected columns as a group.
- Select all filtered rows: Enable the Select all filtered
rows check box if you want to mask more than one column together
rather than separately.
After you define the masking format for the grouped columns and return to this page, the Column Group in the table shows an identical number for each entry row in the table for all members of the group. For example, if you have defined your first group containing four columns, each of the four entries in this page will show a number 1 in the Column Group column. This helps you to distinguish which columns belong to which column groups.
-
Do one of the following:
- If you chose Add Column in the previous step and do not
wish to provide the masking format for now:
You will eventually need to define the column's masking format. When you are ready to provide the masking format, go to the Actions menu for the Masking Definition and select Edit to define the format. Read the following instructions to define the formats.
- If you chose Define Format
and
Add in the previous step:
The Define Format and Add page appears, where you can define the format for all selected columns, as explained below:
- Choose the masking format either from the predefined masking formats drop-down or customize your own. Click Import.
- You can provide one or more format entries.
- You can generate Sample Data by clicking Generate.
- When you have finished formatting the column, click Add. The sensitive columns you selected and assigned masking formats to earlier now appear in the next section. Click Next.
- If you chose Add Column in the previous step and do not
wish to provide the masking format for now:
-
Review the Data Masking options. Before clicking Next, you can also provide the pre- and post-masking script. Refer to Selecting Data Masking Advanced Options for more information regarding Pre Mask Scripts and Post Mask Scripts.
-
Review the final page and click Create to return to the Masking Definitions page. The new masking definition will appear at the top of the table.
Selecting Data Masking Advanced Options
Data Masking Options
The data masking options include:
-
Disable redo log generation during masking:
Masking disables redo logging and flashback logging to purge any original unmasked data from logs. However, in certain circumstances when you only want to test masking, roll back changes, and retry with more mask columns, it is easier to uncheck this box and use a flashback database to retrieve the old unmasked data after it has been masked. You can use Enterprise Manager to flashback a database.
Note:
Disabling this option compromises security. You must ensure this option is enabled in the final mask performed on the copy of the production database.
-
Refresh statistics after masking:
If you have already enabled statistics collection and would like to use special options when collecting statistics, such as histograms or different sampling percentages, it is beneficial to turn off this option to disable default statistics collection and run your own statistics collection jobs.
-
Drop temporary tables created during masking:
Masking creates temporary tables that map the original sensitive data values to mask values. In some cases, you may want to preserve this information to track how masking changed your data. Note that doing so compromises security. These tables must be dropped before the database is available for unprivileged users.
-
Decrypt encrypted columns:
This option decrypts columns that were previously masked using Encrypt format. To decrypt a previously encrypted column, the seed value must be the same as the value used to encrypt.
Decrypt only recovers the original value if the original format used for the encryption matches the original value. If the originally encrypted value did not conform to the specified regular expression, when decrypted, the encrypted value cannot reproduce the original value.
-
Use parallel execution when possible:
Oracle Database can make parallel various SQL operations that can significantly improve their performance. Data Masking uses this feature when you select this option. You can enable Oracle Database to automatically determine the degree of parallelism, or you can specify a value. For more information about using parallel execution and the degree of parallelism, see the Oracle Database Data Warehousing Guide.
-
Recompile invalid dependent objects after masking:
The masking process re-creates the table to be masked and as a consequence, all existing dependent objects (packages, procedures, functions, MViews, Views, Triggers) become invalid. You can specify that the masking process recompile these invalid objects after creating the table, by selecting the check box. Otherwise, invalid objects are not recompiled using
utl_comp
procedures at the end of masking.If you choose this option, indicate whether to use serial or parallel execution. You can enable Oracle Database to automatically determine the degree, or you can specify a value. For more information about using parallel execution and the degree of parallelism, see the Oracle Database Data Warehousing Guide.
Random Number Generator
The random number generation options include:
-
Favor Speed:
The
DBMS_RANDOM
package is used for random number generation. -
Favor Security:
The
DBMS_CRYPTO
package is used for random number generation. Additionally, if you use the Substitute format, a seed value is required when you schedule the masking job or database clone job.
Pre Mask and Post Mask Scripts
Pre and post masking scripts are free flow scripts where a user can execute SQL statements and PL/SQL stored program units (procedures, functions, packages, anonymous block) before and after masking as required.
Use the Pre Mask Script text box to specify any user-specified SQL script that must run before masking starts.
Use the Post Mask Script text box to specify any user-specified SQL script that must run after masking completes.
Generating Masking Script
To schedule a script generation job:
- Select the masking definition to generate a script for, then click on the Actions button and select Manage Masking Script then Generate Masking Script.
- Select a Data Masking option:
-
In-Database Masking – to replace sensitive data in-place with masked data on a specified database (usually copied from production).
-
In-Export Masking – to export masked data from the specified source database (usually production) using Oracle Data Pump.
-
You can choose both options; that is, a script to mask the database directly and a script to create a masked dump.
-
- Select an Associated Database and Database Named Credentials from the drop-down list.
- Change the default job name to something meaningful, if desired, and provide an optional job description.
- Specify to start the job immediately or at a later specified date and
time, then click Generate.
A message confirms that the Generate script job has been submitted successfully. Click on the Refresh button to see the latest job status.
Scheduling Masking Job
To set up the data masking job and schedule its execution:
- Select the masking definition for which a script has been generated, then click Actions and select Schedule Masking.
- Select a Data Masking Option radio button to indicate your
preference:
-
In-Database Masking – Replaces sensitive data directly in the specified database (usually a copy from production) with masked data. This option is intended for non-production environments only.
Note:
You must enable the check box indicating that the selected database is not a production database in order to proceed. -
In-Export Masking – Exports masked data from the specified source database (usually production) using Oracle Data Pump. This option is safe for production environments, as it does not modify customer data. However, it creates temporary tables that will be dropped once the masking operation is completed. Your selection will update the checkbox text below the radio buttons and other areas on the page.
-
- From the dropdown menus, select the required Associated Database, Database Named Credentials, and Host Named Credentials.
- Choose a Tablespace for Temporary Objects. There are three options:
- Default Tablespace
- Temporary Tablespace
- Custom Tablespace
Select the appropriate option based on your needs.
- Enter a Job Name or accept the default pre-filled value.
- Specify whether to start the job immediately or at a later date and time. Then, click Next.
- If you selected In-Database Masking in Step 2:
- Enter the script file location and name (pre-filled), and then click Submit.
- If you selected In-Export Masking in Step 2:
- Enter the Export Filename (pre-filled).
- Specify a directory to save the mask dump files to by selecting an option from Type of Export Directory. The available options are a Database Directory, Custom Directory or an External Directory.
- Select whether to export the masked data only or the entire database along with the masked data through Data to Export. Exporting the entire database might significantly increase the overall job execution time.
- Enter the Maximum Export File Size (MB) (pre-filled). Optionally, choose to Enable Export File Compression and/or Enable Export File Encryption according to your needs. If Export File Encryption is enabled, enter and confirm an encryption password.
- Specify the Maximum Number of Threads (pre-filled).
- Generate Data Dump Logs is enabled by default but you can choose to turn it off as necessary.
- Click Submit.
- A message confirms that the job has been scheduled and you can use the refresh button above the table to update the job status.